
How I Learned to Stop Worrying and Build a Modern Detection & Response Program by Allyn Stott

BSides Toronto | 29:35 | 229 views | Published 2023-11
About this talk
Presented on Oct 21 2023 at BSides Toronto 2023. You haven’t slept in days. Pager alerts at all hours. Constant firefights. How do you get out of this mess? This talk gives away all the secrets you’ll need to go from reactive chaos to building and running a finely tuned detection & response program (and finally get some sleep). Gone are the days of buying the ol’ EDR/IDS/NGAV combo, throwing some engineers on an on-call rotation, and calling it your incident response team. You need a robust and comprehensive detection and response program to fight modern day attackers. But there’s a lot of challenges in the way: alert fatigue, tools are expensive, hiring talent is impossibly difficult, and your current team is overworked from constant firefights. How do you successfully build a modern detection and response program, all while riding the rocket of never ending incidents and unforgiving on-call schedules? This talk addresses the lack of a framework, which has led to ineffective, outdated, and after-thought detection and response programs. At the end of this talk, you will walk away with a better understanding of all the capabilities a modern program should have and a framework to build or improve your own.

hey y'all thanks for coming to my talk so for the next 25 minutes we're going to use a framework I put together to build a modern detection and response program but first hey I'm Allyn I'm a senior staff engineer at Airbnb on the technology leadership team where I work on fun things like detection and response I am also a dad I live in Austin Texas my slides aren't up there where'd they go they were up there before yay all right uh and I live there in Texas with my wife and two-year-old son Liam here we are in a giant adult-sized ball pit where I almost lost him many times um and this talk is inspired by Liam you

see I'm a worrier uh and being a parent and a worrier is super fun um I remember when Liam was just a few hours old barely a little lump and he he was sleeping in the little hospital cot next to uh I was sleeping in the hospital cot next to him he's in the little bassinet and I'm listening to him breathe when I notice that his breathing sounds a little different so I start to worry and I get up and look at him and he's purple and if you've ever seen a newborn child they're purple but he's more purple um but I know what to do I have a plan I paid attention in my online parenting

class I took during COVID so I flip him over in my hand and I'm patting him on the back and he spits up and he starts crying but he's breathing again he's fine and Liam gifted me something really important that day and that was perspective on my worrying and I realized something worrying can be a superpower and there's this research paper called The Surprising Upsides of Worry and in this paper they argue that um that uh well I'll read the quote worry illuminates the importance of taking action to prevent an undesirable outcome and it keeps the situation at the front of one's mind to ensure that appropriate action is taken and when I read that I said hey

that's what I do on blue team um worriers bring a unique skill to detection and response we're constantly thinking about what could go wrong but we need to have a plan or we'll keep worrying and here's where I see a lot of programs fail when I started my infosec career uh I started on the red team where I watched blue teams fail a lot and when I switched over to blue team because I thought I could do it so much better I found that myself and my peers while we were great at getting the technical work done we had no strategy so after making lots of mistakes I built this framework uh that finally worked for me and hopefully

it'll work for you building a detection and response program and this way you can learn from my mistakes and make better more informed mistakes than I did so let's talk about what I mean when I say a modern detection and response program versus a legacy one a legacy program is reactive it focuses on alerts that indicate that something bad has already happened a legacy program focuses all of its strategy on the technology you know you're doing this when you describe your program by listing all the tools you own and the vendors you do business with instead of thinking about what capabilities you actually have a legacy program has lots of manual heavy tasks sure you probably bought some automation tooling but if

your teams are still doing a lot of the day-to-day tasks manually you're still operating in this legacy model and finally a legacy program operates completely siloed and disjointed from the rest of the organization I'm often very guilty of this you want to move quickly but it also puts you completely out of touch with your organization's technology it inhibits your team's ability to work side by side with other teams and you end up investing time and money into solving problems that would be better solved in collaboration with other teams on the other hand a modern program is proactive and that doesn't just mean you do a thing called threat hunting it means that your philosophy for detection

incorporates the idea that you want to detect a threat as early in an attack as possible and there are many signals you can correlate before that something-bad-has-happened alert needs to fire and instead of being so tool focused a modern program builds strategy that is business focused and by that I mean we empower our teams with more than just the requirements but also the context of what they're working on and how it empowers the business a modern program prioritizes automation that means that your first thought when you get a new tool or a new process is to think how will I automate this instead of creating the manual playbooks that you sure hope to automate

in the future you won't and finally a modern program is connected to the business it centralizes its functions workflows and data because you can't succeed trying to do this alone you won't scale so to start building our modern program we're going to use the process of organizational design so I read this book so you didn't have to and this book lays out a step-by-step process for designing your organization now they have seven steps but I know my audience so I simplified it to four steps because most of you haven't had enough coffee for seven steps so these are the four phases of the framework we're going to work through today so let's jump in and first we're going to assess and

analyze our current state and we start by asking where are we and what do we have and maybe you don't have much maybe it's you and some antivirus or maybe you have a very large program that reports detailed metrics and we're going to assess and analyze from three viewpoints and first up is our vision and mission and instead of reading what I'm sure is a riveting vision and mission statement I want you to ask what is unique about detection and response at your organization what unique problems do the culture the technology and the people at your organization pose for detection and response so what makes it tricky what makes it interesting and then what are

people working on and where are they spending most of their time and then we assess from the viewpoint of our people so we understand what skill sets we have before we start building and I'll point you to the NICE Cybersecurity Workforce Framework uh comes from NIST and it categorizes and describes different work uh different cybersecurity skills in NICE there's these things called work roles and there's a lot of them a lot um but before you know I'll make it a little easier so I grouped these into general detection and response roles and I'd expect there's lots of overlap in these um but it's a good starting point without too much customization and then to assess I was a

senior manager at the time I created a self-evaluation survey for my team and asked them to rate themselves on each of the various work groups so for the forensic analyst work group each member of my team would rate themselves from a small baby amount all the way up to parallel parking during rush hour and then we use all that data to create a heat map and that visualizes the experience and skill sets of the team so now you know how to make process and technology decisions you know where to prioritize mentorship and where to target training and hiring next we look at our technology for these I'd like you to consider that technical capabilities are not product

categories and what do I mean by that um so you might have a thing called EDR uh and I might have a thing called EDR uh but depending on the vendor the tool the operating systems our environments uh how we're actually using them our capabilities might be really different and second uh maybe it's just me I think product categories are confusing uh if I asked five people in this room what XDR was I'd probably get six different answers all right so let's have some fun and design and develop our program and when I design and develop I like to think about it like we're telling a story because you might leave this room and not remember anything about me

except that I had a cute kid um and it's because stories are memorable our brains are wired to remember stories um and so having a mission a vision statement those are cool things but when you start to talk about your program like you're telling a story that's when it can really start to resonate so with that in mind let's start designing and developing and in this phase we're going to create these two deliverables and first we're going to ask ourselves what processes do we need for this program because processes tell the story of what we do and to better enable this storytelling we're going to build out this view um I have bad news if you're like

an engineer and really particular like myself this won't be an exact diagram um yeah this is a visual representation it's a story and I'll start from an ideal beginning in my story so the first process to talk about is threat modeling where we build profiles of threat actors get intrusion sets maybe it's just understanding about what type of threats we care about the most and the story I'm telling here is that our detection and response program should be fueled by threat intel where that intel both internal and external is collected analyzed and disseminated throughout our program whether that's in the form of threat briefings IOCs hunt packages and the story I'm telling here is that I didn't want to reinvent the

wheel we still need the classics event monitoring triage analysis incident response and the story I'm telling here is that we're proactive our intel often from incidents is used for threat hunting maybe starting with simple automated IOC searching and then moving into creative data analysis to uncover undetected threats and the story I'm telling here is that we continuously test and improve our program I call these things micro purple tests and these are tests that simulate threat techniques so we can validate that our detections and responses actually work you know you write some code you write some tests right you write some tests sure you do uh so you write some detections and then you also write a test that

validates that hey this detection actually would detect the type of activity I'm trying to and that we have responses associated with that the story is that continuous improvement should be central to the program using it to capture and prioritize the results of our micro purple tests so we're only working on the things that are most important um insert a rant about agile here including work that might come from threat hunting like gaps in our application logs the story I'm telling here is that visibility without context has never been good enough we want our data to be integrated with our threat intel with the context of our systems networks applications cloud platforms we want observability so we

can engineer high-fidelity detection rules and when our detections fire our operational processes can pick up the event and start the triage and response processes the story I'm telling here is that this program's performance should be measured and that data is used to communicate how we're succeeding improving changing and investing and finally we circle right back to the beginning where our incident data flows back into our threat intel collection so we can continue to inform the program with new threat hunts threat briefings and detections and the great thing about a framework is that your story could be totally different and I would love to chat with you about it after and if we zoom out we can see that our

reporting and metrics can also inform our red teams with new scenarios our security awareness teams with metrics from phishing events so we can better educate our users and our partner security teams that operate and build the controls that hopefully prevent more incidents in the future cool now we've got some processes let's build out our architecture and we started with the process view because I think it's easier to think about what we need to do and then come up with the technical capabilities we need to execute on these so if we go back to our process view and we focus on threat intel and think about the types of data we'd like to be collecting about threat actors

intrusions or just raw intel from the app formerly known as Twitter but also actionable intel like early warning signals maybe dark web postings related to your organization threats to your brand and reputation credentials for sale and then we have to think about how we're going to collect it disseminate it and integrate it across all of our tooling for intel-driven detection then if we switch over to our classics I think about all the automated capabilities and controls that I'll need to do triage and analysis as well as the ones that I'll need to do response and then for my more engineering-focused and proactive processes I think about not just collecting the logs but normalizing enriching aggregating correlating and

getting true observability out of my logs thinking about how I need analytics to detect threat behaviors malware and to automate testing of all the MITRE ATT&CK techniques in all my different environments and then you can put all those capabilities together and create your architecture view and I've organized these in the groups that tend to be more closely related and uh most of my stories start with this idea of taking all of your threat intel and using it to create intel-driven detection so that our threat behavior analytics our malware analytics our micro purple tests can all be informed by that intel and then taking all that data and turning it into something our teams can use to engineer detections that

generate alerts for our automated analysis and then trigger our rapid responses across all those various controls so now you have your long-term architecture view for your modern program and I'm sure that sounded great uh beautifully drawn processes driven by technical capabilities a fully staffed team but uh what about the lack of a budget lack of tools hiring freeze so let's talk about it and first let's talk about making the most of the people that you have especially thinking about it uh in the context of being a manager um when I transitioned from being an engineer to a manager my perspective really changed I had to take a step back and think about the bigger picture and that's where the

idea of building these different views came from and back in an engineering role today I realize that these program viewpoints are important but almost more important for the people that are working and building these things um they tell people what the most important thing for them to be working on is this is especially important if you have operational teams it's easy to feel pulled in many different directions if you're trying to build a new program that has a lot of historical debt noisy alerts and manual tasks two approaches I have found work really well the first is to declare bankruptcy you won't solve yourself and that probably won't fly with leadership so option two is to temporarily

outsource it um time box it hire a third-party SOC throw bodies at the problem and that way your team can focus on building something that works and as you're prioritizing those technical capabilities sometimes the build versus say buy argument comes up and uh here I've really liked using the buying knowing you'll build approach um so buying solutions I always say if you can get 65% of the way buying um buying with your capability view in mind and then as you implement your team can focus on building those program specifics because you'll buy lots of tools you'll still have to build lots of things and of course while you build you've still got your incidents and

operations and those are always the priority but especially at the beginning of implementation push back let the less critical things fall on the floor being really busy gives the appearance that the program is detecting just fine um but as we'll discuss next changing your reporting can open your leadership's eyes to reality and hopefully get them more on board with this endeavor to shift to a modern program and just like there were legacy and modern programs I think some of you probably work at organizations that have quite modern programs but use legacy methods of evaluating and reporting and it's preventing you from getting the support and funding you really need to succeed so when I say legacy I mean

you're still thinking reactive by assuming that time to detect time to respond and time to contain are all there is to say about an incident uh when I say legacy it's a report to leadership with the number of events that happened this month in comparison to last month without any context uh as if that provided any meaning to anybody it's when you report on what you're seeing without any context of what you can't see and overall legacy reporting doesn't tell the business the value that you and your program bring all they know is that some bad stuff happened there were more bad things this month than last month and you are very busy while a modern program provides context

around the detections in place describing how threats are detected and what the impact to the business is our reporting should focus on what threats we are seeing can see and are seeking funding to see it provides context to the numbers what are the associated threats what environments could be impacted and where is there visibility and it quantifies it at least in a narrative like hey remember that ransomware incident it was a close call but it could have meant we weren't doing business for weeks and that's on a very low estimate so let's talk about some ways to evaluate and report and to answer what we can detect today I mentioned this idea of micro purple testing and

that is running tests that validate whether you can or can't detect something and here's an example of what running all the techniques across MITRE ATT&CK and scoring them as pass or fail might look like in this hypothetical example the story is that we've got great coverage of discovery and C2 techniques but an attacker might be able to move laterally through the network without being detected and that might be bad if your business has it's a nice way of saying it poor controls between environment boundaries I'm sure none of you work anywhere that has a flat network next we look at the results of running all those techniques across all our different environments or what I'm

calling landscapes we can get a different view of where tactics might be less likely to be discovered this is always a little difficult because MITRE has really well written out techniques for endpoint less great in other areas so you have to get creative here our story is that this organization has pretty good endpoint coverage but almost no visibility in their containers which could be a problem if that's where the business apps run and then you can sum it all up we can provide our overall visibility into threats by looking at the results from our micro purple tests looking at the detections we have in place today but haven't yet validated and weighting those by the priority and prevalence of our

environment these three reports tell leadership what are the different types of threats which ones can we see and where and overall how we're doing and then of course you could trend this over time and find where investments and hiring might enable faster delivery and closely tied to our observability report are our metrics for these let's move away from just giving the alert numbers I cringe every time I say it because yeah I've done it a lot um we know what we can see and what threats we are seeing the most and where which ones pose the biggest threat to the business from what we can see what are the trends and impact and what

preventative control investments could reduce this risk and impact so instead of giving a lot of meaningless numbers derived from your events what are the top threats the top landscapes at risk and top incident trends that need to be immediately addressed and so with all those then you can finally present your road map you know what you can do you know what you can't do today you know what you can and you can't detect you know what type of threats are impacting your business and you know how to make it better so you present your road map that closes those gaps and then you make your asks here's what we need from you fund it hire for

it so here we are instead of hiring based on the number of alerts or investigations you can make data-driven decisions based on roles and skill set instead of threat hunting because it sounds cool your process view is a story that answers what your program does how it provides value and how you measure success instead of buying based on what Gartner says you need you have a vision for your architecture with a prioritized timeline to accomplish it and clear asks to leadership instead of telling your bosses yeah we might detect it you have metrics that describe your threat coverage and hopefully now you've learned how to stop worrying and build a modern detection and response program
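The three coverage reports described in the talk (pass or fail by tactic, pass or fail by landscape, and an overall weighted visibility number) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the test results, the landscape names, the weights, and the function names `pass_rate_by` and `weighted_visibility` are all hypothetical and do not come from the talk, and the sketch leaves out the detections that are in place but not yet validated.

```python
from collections import defaultdict

# Hypothetical micro purple test results: (tactic, landscape, detected?).
# The values are made up to mirror the story told in the talk.
RESULTS = [
    ("discovery", "endpoint", True),
    ("discovery", "cloud", True),
    ("command-and-control", "endpoint", True),
    ("command-and-control", "cloud", True),
    ("lateral-movement", "endpoint", False),
    ("lateral-movement", "containers", False),
    ("execution", "endpoint", True),
    ("execution", "containers", False),
]

# Hypothetical weights for how much each landscape matters to the business.
LANDSCAPE_WEIGHTS = {"endpoint": 1.0, "cloud": 2.0, "containers": 3.0}


def pass_rate_by(results, index):
    """Fraction of passing tests grouped by tactic (index 0) or landscape (index 1)."""
    passed = defaultdict(int)
    total = defaultdict(int)
    for row in results:
        key = row[index]
        total[key] += 1
        passed[key] += int(row[2])
    return {key: passed[key] / total[key] for key in total}


def weighted_visibility(results, weights):
    """Share of tests detected, with each test weighted by its landscape's priority."""
    weight_sum = sum(weights[landscape] for _, landscape, _ in results)
    detected_sum = sum(weights[landscape] for _, landscape, ok in results if ok)
    return detected_sum / weight_sum


if __name__ == "__main__":
    print("by tactic:", pass_rate_by(RESULTS, 0))
    print("by landscape:", pass_rate_by(RESULTS, 1))
    print("overall:", weighted_visibility(RESULTS, LANDSCAPE_WEIGHTS))
```

With this made-up data the per-tactic view shows full coverage of discovery and command-and-control but none for lateral movement, the per-landscape view shows decent endpoint coverage and no container coverage, and the weighted number drops because containers carry the highest weight. Recomputing the same numbers every reporting period is one way to trend them over time.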

thanks so much for having

me and then real quick here's my email and all my social handles are on my website I write an infrequent newsletter called Meoward uh has an adorable cat the people love security info is just okay um I have stickers as well if you come see me um if you have any questions do feel free to shoot me an email or come find me during lunch and thanks again for having

me yeah yeah we have uh time for questions so uh if anyone has anything uh thank you Allyn that was a great talk by the way uh being a risk analyst uh what resonated with me is your um talk about coverage across the landscape which is something that I think gets forgotten a lot and then the coverage across your capabilities and focusing on capabilities rather than products which is one of my pet peeves as well so um does anybody else have any questions for Allyn

Max yeah uh I think part how do you move from the old metrics to these new ones and how do you pull out that data from the old one is that part of the question yeah I think there's also a part question in there which is like we've been presenting these metrics for a long time and now we're gonna give you new ones and that's scary and that's often why we get stuck presenting the old metrics uh I know because I always inherit old metrics and I'm like okay how do we get these numbers um I really like to do like the last three months and dive into that data and be like here's what we gave you for the

last three months here's some actual information out of it do the analysis and figure out what type of threats you're seeing highlight the ones especially that are important or interesting to the business um narratives are also good to pull out of those as well you can present like really beautiful graphs and slides but sometimes actually just pulling out some narratives go a lot further um and then usually when you switch to the new metrics it's always good to like go back in time and like postwork those and that's the hard awful part because you do the metrics when they're due at the end of every month or whatever uh and so you have to actually

work ahead and go through those so definitely having like a transition of looking at your old data and putting those into the new metrics and then pulling out the most important things like the top threats the top risks and then where your landscape is at yeah any other questions I think I got time for one

more great question yeah so that process view how do people fit into that are they working in each one separately are they working across all of them I've tried a lot of different things in this area one thing I've learned about people is that they don't like to be siloed they don't like to just get pigeonholed into doing one thing I see that somewhat occurring a lot more again in our industry I felt like we broke out of the like big SOC model where oh you have tier one tier two tier three and you only work as tier one then you move to tier two and you never see the other things I really like this new idea we're

playing around with we're doing this thing called pods and so we take some of those like grouping of areas so threat intel and threat hunting detection and response um micro purple testing and automation and putting people in those pods for a quarter three months four months five months letting them have some really good time to sink into that area and then rotating to the next one I really think that people's exposure to all of those processes even if they're not things that they're like I'm going to do this only and forever um having their exposure to those areas can make them better at all the other areas so if you're doing triage and response and you're like only ever doing

triage and response and you're like I sure wish those automation guys would build us the automation to do this when you go over to the automation pod now you're there and now you're thinking I've done triage and response I know what would make my life better and then rotating around so yeah I think I like the idea of pods and people moving throughout but having enough time there to build some excellence cool I'm out of time thanks y'all