← All talks

BSidesSF 2026 - Composing the Response: Building an Incident Pipeline from Scratch (Geet Pradhan)

BSidesSF29:1324 viewsPublished 2026-05Watch on YouTube ↗
About this talk
Composing the Response: Building an Incident Pipeline from Scratch Geet Pradhan Most security teams improvise responses, leading to fatigue and slow containment. This talk shows how to build and scale a response pipeline from scratch — connecting detection alerts and response playbooks to actually respond to security alerts. https://bsidessf2026.sched.com/event/b7567c163ef567081134f345c3908974
Show transcript [en]

Uh it is my pleasure to introduce Git Pradan. He is a security engineer and he's going to be talking about composing the response building an incident pipeline from scratch. Please join me in welcoming Git. Thank you. Thank you Steph. So before I get started I want to give a big shout out to Bides all the volunteers all the staff and the sponsors. Thank you again for organizing this and it's it's a great conference as usual. So, I can't see anyone here. I was going to do one of these like show your hands, but can you yell if you have ever responded to an incident either internally externally? >> All right, it's a lot of you. So, you

guys are going to have fun here because I'm going to talk about all the good, the bad, the ugly, and the stupid things that I did. And now, if I had to build a incident response pipeline from scratch, what I would do. So just to be clear, this talk is for folks who are probably employee number one, employee number two, security employee number two, and who have no money in their company. Right? The meme that it says, shut up and take my appreciation, that is exactly what you have. So for the folks who work for companies who have a lot of money, this may be something which is a bit basic, but I'm sure you've gone through this

before. Now, a bit about me. I am the concerto conductor of the beautiful symphony of incident response. So I'm experienced in building internal and external incident response programs. I really like building from 0ero to one and I have spoken at bsides 2025 on the detection pipeline. So my previous talk last year was about how to build the detection pipeline again with no money and if you're employee number one. What I spoke about there was detection pipelines are all about identifying a highfidelity signal from all the noise. And today we're going to talk about the shield, not the Marvel shield, but the shield which converts all these signals into actionable, executable things to do to save yourself.

So the program today firstly I'm going to talk about what really happens in a cyber incident. You know the craziness and the tax that is manual improvisation. Then I want to move on to my version of the five pillars of incident response emphasizing what you actually need to think of. I'm sure you've all seen some version of these five pillars. Then we'll touch a little bit on how to mature this pipeline and uh a little known concept in San Francisco called AI, you know, agents, automation, things like that, and how we can add that. Then lastly, let's let's take a look at the journey of an alert and how it would look in practice. So

picture this, right? You get an alert, maybe a weird login or something like that. What do you do first? You panic. That is what you have to do. Then you know you take a breath, you regain your spirits and start to analyze. You look at the account. Maybe you check something like if you have a workday or an octa for information about the user. Then you find the device. Then you look at the CMDB or whatever or a spreadsheet, you know, if you have it on what device is assigned to them and some more information about the device. Then maybe you look at the IP, try to get some info about the IP. Okay, cool.

Something's weird in this case. Now, now you try to contact the team who owns the app and ask them the who, what, why about whatever is happening. Okay. And should we be isolating? Maybe. Obviously, they're off for their afternoon coffee run. So, that takes a while. Then finally, maybe the rest of whoever is doing the instant response is based in SF while you're in London. Yikes. This is what happens. So the improvisation tax is exactly that. Every lookup adds crucial minutes, seconds, minutes, hours to the instant response. For all the people who wooed right now, you would probably attest to the fact that cyber incident response is all about time. The amount of time you

can save. You have to buy time so that your methodology and mentality can move from reacting to responding. Now the improvised decision-m, right? All these people who are on the afternoon coffee runs, they need you need them for decision-m. Now, if they're out on the coffee runs, it adds time because you have to get them back. And finally, time zones. Time zone restrictions are what most people need to solve for. And that is where it hits. So, attackers attackers automate. That is the way the world is going. And defenders, from my experience, we're we're getting there, right? This is why manual response is not really scalable. So, how do you fix that? We fix that

with my five pillars of instant response. Now, when you're starting out, one of the things that I've learned is there's a lot of talk about, you know, you have to get all your policies, your procedures, your SOPs, everything right. That is true. You have to do that at some point, but you have to get the basics right. And what are the basics? You need good high fidelity detection alerts. You need to find out ownership and who does what. You need to be able to talk and understand how to coordinate this. you need to execute and actually respond. And then the last one which a lot of people tend to miss out, you have to worry about things which concern the

company which is where the governance and compliance falls in. So let's talk about the first pillar. Now I'm not going to go deep into how to set up detection rules because I speak I've have a whole 30 minutes about that. But essentially you need to normalize the chaos. Normally at this stage there'll be lots and lots of ways that people will come to you for their incidents. Maybe you have a SIM, maybe you have Jira, or maybe your CEO sends you an email about something they saw online, right? That that is an intake. Maybe there's customer support. That is insane for one individual to manage. So, how do you how do you solve for that?

From my experience, you need to find one single pane of glass, right, to see all the different ways. Now, I'm not saying every single format of every single ticket has to be the same because that's where you have to get to. But at the very least, you only want to see one place because when you're panicking and you don't have all the time, you don't you don't want to be looking through 30 different tabs like I do all the time every day. Let's move on to the command structure. This is one of the most interesting conversations that I have with everyone because of this. Everyone thinks they are not the incident commander and then at the same time everyone thinks that

they are the incident commander. Now, when you're actually responding, as all the people who would will say, there are multiple roles and multiple hats people need to need to put on. When you're the only security employee or one of a few, you will often have to lean on your sister teams to help out, right? Regardless of this, you need to find a proper roles and responsibility structure of who does what. Now, some of the roles that are commonly seen in incident response is the incident commander, right? the person who actually directs the symphony, um the mechanic or in this case a technical lead, someone who's actually doing the instant response and stopping the bad guys and the comms because people need

to know what is happening. I'm going to be real with you, there is a pretty high chance you will do all three and there's a pretty high chance that maybe you can find a friend in a different team to help you out. Regardless, the number one rule is every incident must have one single commander. So you have your fidelity rules. You know where you get your incident from. You know all the roles and responsibilities and you're ready, right? No, you're not because coordination. Even though you have everything that you think that you're ready for, there's always going to be Slack DMs and there's always going to be 30 email threads. How are you going to capture that with a lot

of coffee? Now, this is where folks must have heard of these different frameworks and the concept of prepare in the instant response pipeline. This is one of the things you need to prepare for, right? Do you talk on Slack? Do you talk on Jura? Do you if you're in office, if one of those companies, do you go over to their desk and shake them? Maybe that maybe that's a way that works. But you have to find a way that works for your company, your team, and you have to create that war room. Now, this is where I'm going to ring the bell of automation, right? You have to automate this war room because this is if you

figure out and you spend time figuring out what and where the war room is. We go back to the improvisation tax, right? You're adding these these precious minutes to actually getting to responding. Now, cyber attackers love that. They love the confusion. They love creating confusion. So, what do you do? You automate the incident warroom. Now this is a really well welldrawn diagram of what should happen right let's say you get an alert in your sim automagically someone gets paged like hey you have to do something a slack channel gets automatically created for everyone to talk now the golden rule here is it doesn't matter if it's Slack right it could be it could be anything it could

be a WhatsApp channel I don't recommend that but if that's what you want to do go for it the golden rule is all the updates all the conversations have to happen in that channel because you You know, let's just say you're the incident commander and you are actually falling sick on a Friday and you have to take off. If someone else is being the incident commander, they need to know all the context and it should be all in the same place. Now, okay, you have the command in place. You know how you're going to talk to people and you know what's happening. Now, what do you do? You actually need to respond. Now, I'm not going to go into what you need to do

to respond. Do you isolate? Do you do this? Because it's different for every single situation. However, you need to move towards containment, containment, containment. So, the core philosophy here is you have to contain the attacker even before the investigation starts. That is the northstar you want to move on to. Now, I know when you're when you're reacting, that's not possible. But each and every minute that you buy and you save yourself in the first three sections, right, you can spend time and actually figuring out what the execution is. So, say for example, you need to, you know, disable an account or quarantine an infected endpoint or revoke any keys or or anything of that sort, you need to be

able to press the red button. Now, do you always press the red button automatically, right? In this case, uh I think this Anakin Skywalker and not Darth Vader, right? Um shut down auto prod automatically. Now, is that is that good? Sometimes it is but you know there are some situations that you don't want to automatically do everything. Guard drills don't kill don't kill prod. Now there has to be safety checks. The more automations you add you need to have steps that make sure that hey if if you know some sort of production database is showing something weird. Your isolate button does not just go and shut it down. Then there will be downstream impact. Right? So how do you

solve for that? It's an if condition. Right? if it's an alert and the asset is not protected. Now, what does protected mean? It's different for every company. It could be a critical server. It could be a critical employee. It could be a database. Anything of that sort. That's that's what I do, right? What I would suggest is if it's if it's not a critical asset, try to move towards auto remediation. Sorry, um autocontainment, right? Auto remediation will come later, I promise. Um, and if it's if it's a protected account, a protected database, have a human in the loop. Now, how do you know if you have a human in the loop, how do you know what

account is what? Context, right? You need to enrich as much as possible. Now, in this case, Alice has suspiciously logged in to an unknown host from an unknown IP. Now, this is a pretty high alert, right? We know that that Alice may be an important person may not be an important person but you have to enrich this ticket. Now we know okay Alice at company has logged in from a IP which is u geotag from Russia and thread intel says it's ransomware but the host is a production database and it's owned by the infrastructure team. So now with your human in the loop you have two options. Do you isolate host or do you disable

the user? What do you guys think we should do first?

No, the answer I was expecting is ask the infra team what this database does because they own the database and this is where you have to lean on your team to give you that context. Now we spoke about these four acts but there is one last one which I think will will make or break it because now that you have figured out and you have a pipeline of hey I know my detection alerts are pretty good. I know how to talk to people and I know how to get them in the same room to communicate and coordinate and I know what actions to take, right? With a caveat. Does that mean I can respond from everything and

I'm fine? Can you respond from everything? Maybe. Are you fine? No. Because there is something known as compliance, right? A lot of you who who did say that you've been a part of an incident response will probably know there's something known as notifications, right? You may have to tell someone that something is going wrong and that is act five, governance and preservation. Now technical response saves the team but compliance and governance response saves the company. Although you think you've responded and you're fine and you you know you've also recovered from from what happened. There will be notifications and this is where you have to talk to your legal experts or your compliance experts. Maybe if

you're the first engineer on your team, you don't have uh the resources in your company to go to. So I I I really really urge that you find out what your company needs to be compliant for and what these breach notification timelines are. Now along with that there's a certain concept known as forensic preservation. Right? For my for my folks who have been involved in an incident, this is something which I you know in when I've been learning about incident response from my previous roles and my current roles, this is something that I I didn't think of straight away, right? To forensically preserve an active incident before any sort of execution. So if you're pressing the red button on

disabling the host, take a snapshot, take a snapshot after, right? Why? Because if you have to recover something in the future, it's useful. But more importantly, when you're investigating and when people are auditing of like, hey, what have you done? I want to see you have the proof and it's and it's forensically sound. Now, these are my five acts of incident response. Right? Now, let's move on to the feedback loop and the maturity of what I think uh people should be doing in instant response. Now, the first one is manual response. Right? You have built this pipeline that we spoke about and you're responding manually. Now, what's the next step? You figure out a way to

structure your triage and and come up with some sort of automated triage, automated enrichment pipeline. Now, cool. You've done that. Now, what do you do? You move on to the scary red button, which is automated containment. Okay, G, I've done that. Now, what do you do? Well, first of all, great job. Then, you move on to things like proactive threat hunting. The north star you have to always think about is us as as analysts, as as responders, as humans, we need to move from actual operators and analysts to decision makers, right? The feedback and also you have to add a feedback loop. Every single time you respond to something, you have a whole instant response process.

Try doing something like a postmortem or like a lessons learned or one of basically just get together with the people who did whatever they had to do and talk about what went right and what went wrong because I guarantee you that particular exercise will save you more than one more detection rule. Now I did promise you AI the real AI sweet spots. Now in my opinion right and I could be wrong AI is most useful for the things like alert normalization getting all the alerts in the same format adding context looking up different databases and pulling in things about the device etc investigation assistant right adding more context of what has happened so you can ask it questions documentation and

also implementing that feedback loop to improve your processes now my opinion for where AI should not be blindly trusted yet is fully autonomous containment. Right? I I previously mentioned there needs to be some sort of safety net for protected accounts, protected users. It's too risky to just leave everything to AI right now without adding a little bit of guardrails. Now, legal and compliance decisions, this is fairly nuanced, right? I think there are cases where you can completely 100% rely on AI. However, I think you need a human in the loop in this case. evidence handling. It needs a strict procedure and it really depends from region to region right? So, I'm sure there's more places that

you can add AI. From my experience, these are the ones that is is an easy win. Now, if you do all the things on the left side that I've that I've spoken about, you will buy yourself those precious minutes that can help you move from the react methodology to the responding methodology. Now, let's go through a a well rehearsed a well rehearsed symphony of instant response and a badly rehearsed symphony of instant response. So, what I like to call a 50-minute incident. Now, cool. You you get an a login alert. I will wake up 5 minutes after that, you know, take my sweet time. I will start to review this and I'll start triaging this, right? It'll take me a little bit

of time because I'll need to look up host, IP, context, things like that. and then I'll confirm, okay, it looks like an actual alert and something's going wrong. Then I'll open up a ticket. I'll get the folks that need to come in and and let them know what's happening. Then it'll take me a few more minutes to identify more logs, trying to trying to get some more context about what's happened, when it's happened, when did the user last login, get their baseline, and add some more thread intel. Now again, it'll take me another 5 10 minutes to actually come up to a point where I can block the IP and disable user. Now keep in mind this is still

very aggressive. 10 minutes to make a decision that I need to block someone or something. It is it is still very tough to get to but it's taking me 10 minutes right now. Finally at the 15 minute mark maybe the incident commander who someone else maybe the application owner finally logs on. Why? Because me as a security engineer security analyst I've already done the initial containment. Now the rest of the response may be led by our sister team. Now, manual processes lose time and every manual check that I've been doing here, there's five, there's 10, there's 15 minutes. Now, what if this was automatically automated? So, in this case, the pipeline buys time. An an alert comes up, right? In

the next one or two minutes, a ticket is automatically created. A Slack channel's already spun up, right? The on call engineer from assisted teams have already been paged. And in the Jira ticket, there's some context. Now all the ip identity all the enrichment that we've spoken about has already been added in this case on minute two. In a couple more minutes because of our if condition this particular account right is not a protected account. So the user is automatically disabled and because of our automatically forensic imaging processes or forensic snapshot processes at minute 6 or minute 7 the host is already isolated and the and the snapshot has been captured. Now when I who takes normally 5 to 10 minutes to

get up from my my bed to my office which is my living room um by the time I come on all this context has already been added right and I just have to review the scope. Now what does this tell you? This tells you that the pipeline is to buy you time. And I know I keep ringing this drum but cyber incident response the most important thing is you have to buy yourself time to actually defend against the attackers, right? Now in actuality this is an example of an alert right an alert's coming onto your sim now the notification automat automatically the the jura ticket has been created slack message and also pager duty someone's been paged now the

coordination again automatically an incident has been created in the slack channel you have all the folks in the right place to make these decisions and also the context has been added right things about is it a managed device what is the OS user role etc and the execution because it's a not very important account right things have already been added now before I come online as the on call engineer a lot of the pipeline and a lot of the actions have already been taken and the risk has been mitigated in the way because something's been contained now what I'm trying to tell you here is the more effort you spend on plumbing the pipes of this pipeline, the more

time you can spend on actually thinking about the response. Right? One thing I've learned is in a cyber cyber instant response, you're always your back's always against the wall. There's always some sort of impact that's happening and there's 5, 10, 15 people breathing down your neck. Now, what tends to happen is you move into panic mode and you start reacting. You miss things and you you you cannot carefully respond to what's in front of you. you react to what you think is happening. So every single thing you do when you're building your pipeline from scratch and it doesn't need to be uh it doesn't need to be expensive. You just have to find a way to buy yourself time. So if there

are three things that I want you to take from this is plumbing the pipes. Make sure that telemetry is good and you know what's happening. Normalize alerts before you start automating because you have to get into the same pipeline. Context is more important than the alerts. Automate the enrichment as much as possible even before a human has to come into the loop and make any decisions. And number three, contain, contain, contain. Contain the attacker before anything. Containment beats documentation. Documentation is very important, but I would much rather contain a spread of a breach than fill out an entire Jira ticket, right? Stop the attack first and then start autoisolating. Now, a lot of you may not have ever done

instant response before and you don't know how to start. So, here are some free resources, right, that can help you get started to create a demo environment and and start to get a feel on how actually in practice an incident can be done. So, at the end again, I'm going to beat the drum of time. Everything that you do in an incident response is all about saving time. Thank you.

Thank you very much, Ke. All right, we have time for some questions here. So, uh, if you have questions, I would encourage you to put them into Slido. You can access Slido on your device by going to besidesf.org/qna. If you give me a second to get pulled up here, I will see if we have any already. There we are. Okay. Uh, we don't have any yet. So, I will take a question from the audience. Someone wants to raise their hand. Uh, I have a white shirt back left or cream. Yes. >> You hinted at auto remediation. Would love to hear more. >> Uh, I'll repeat the question. You hinted at auto remediation. We'd love to hear

more. >> Yeah. Um, do you have anything specific about that or just in general? >> In general. >> Yeah. So, I think there's a difference between respond and recover, right? Response, in my opinion, is is getting the bad guy out or stopping stopping the bleeding. Now, remediation essentially is coming to a point where you're back to normal. Now, let's say if it's a system and and you're being, right? Auto remediation is finding a way to to have a load balancer, for example, to get back into normal operations, right? The difficult part in auto remediation is the process, right? If you're if you're a founding engineer or a founding employee, it's very difficult to even remediate, let

alone auto remediate, right? You need to know what the normal was. And often times the normal before the incident is not the new normal because a lot of things have changed. Maybe maybe you have you have to figure out what your IM looks like because you really messed up setting it up the first time. So a long-winded answer to say I also don't know. You know there there's there's multiple different uh products out there which will tell you hey we'll help you get back to normal without clicking a button. I don't know the complete answer to that but this is what I think from my perspective. Yeah. >> Thank you. Thanks. >> All right. I have questions on Slido.

How is AI used in investigations? Do you analyze data in your seam or pull data from multiple systems to feed it to AI for analysis? >> Yeah, that's a good question. I I'm assuming someone in the crowd asked it. I think there's multiple things that you can do, right? If you have a SIM which has AI capabilities, you can actually ask information and chat to it in a way that you can understand what's happening. So, for example, context, right? If you're investigating, let's say the alert from Alice, it's a suspicious login from a from a place you don't know, you can try to find the IP. You can also look for, let's say, minus 30 days, any other IoC's that you've

seen in the past. And on the other side of the question, I think it was do you just feed in something to an AI model and try to understand what's happening? Yes, you can also do that, right? You can look through I don't know and think of a virus total for example, right? You can look at IP, you can try to connect um other thread intel sources through AI as well. And it really depends on how you want to use it. My experience has been using it directly in the sim where it really tells me what is happening and also what's happened in the past. So that's really helped me come to a point of when

I'm investigating, right, is it a one-time or is it persistent? So that that's where it's helped me quite a bit. >> Thank you. At this time, what do you think is better for automating incident response, sore playbooks or AI based automation? >> Um, that's a great question. I think they do similar but different things, right? I think sore playbooks are great for finding the pipeline, right? I think if you want to do one thing multiple times without any context changing, I think a sore playbook is really good for that. Um, AI, what was the AI socks? I think it was >> uh AI based automation. >> Yeah. So AI based automation, you know, I think the the good thing about AI is

that it generate context. So if any sort of playbooks require changes of context or it requires you to learn and adapt, I would say maybe an AI AI based uh sore or AI based automation works there. But a sore in my in my experience works best when it's repeatable task multiple times or repeatable uh flow multiple times. >> Thank you. Um, we're not supposed to do vendor pitches here, but uh, what in your experience, what are the best tools or software for AI forensics and incident response management? >> Uh, there's a fourth floor upstairs. Um, no. Um, I don't want to do vendor pitches. I think there's lots of vendors who do a good job at different things.

Uh, the ones that I pointed out in my presentation, I can I can go back. Um, they're free. They have open source so you can try them. Um, I think I I don't know which vendor does what best. Uh, so I want to I want to be vendor agnostic right now. >> All right, we have one more question here. Oh, shoot. I marked the wrong one. >> Give me a second. >> If you're in the crowd, do you want to just shout it? >> On evidence handling, what is the recommended pipeline and who are the necessary stakeholders? uh if you're in the crowd, if you're the only security engineer, um the recommended audience is yourself and uh

if you have a legal partner in your team or in your company, um your legal partner. If you have a external cyber insurance vendor or a breach coach or a third party incident response team, they are also a very important um stakeholder. If you have uh I know managed sock for example, right, they are also a stakeholder. It it just really depends on you know if you're a more mature company you probably have a way to take those snapshots. Maybe you're I don't know infrastructure team which manages the infrastructure does that but if you're the first security engineer a lot of that is something you will have to figure out. >> Thank you very much.

>> Thank you so much. Uh please give one more round of applause for his wonderful work. >> Thank you so much. Thanks guys.

[ feedback ]