
Perfect. So I have never been on on-call duty myself, but I have been told how tiring it can be, especially if you have constant alerts that wake you up in the middle of the night and you have to handle issues while you actually want to sleep. There's only one thing that can significantly improve exactly that, your on-call duty, and that is a strategy that will give you fewer alerts while having more impact. Our next speaker spent some thought on how to actually achieve that, and he's going to tell you about it. Please welcome Allyn. [Applause] Hey y'all, thanks for coming to my talk. This is How I Learned to Stop
Worrying and Build a Modern Detection and Response Program. Today we're going to use a framework that I built to build a modern detection and response program. You'll come away from this talk with a better understanding of what a detection and response program should look like, how all the tools, capabilities, and processes should fit together, and how to analyze, build, implement, and evaluate so your program provides immense business value. But first: hi, I'm Allyn. I'm a senior staff engineer at Airbnb on our technology leadership team, and I work on fun things like threat detection and incident response. I'm also a dad, and I live in Austin, Texas with my wife and two-year-old son, Liam.
So I do incident response and I have a toddler, so I don't sleep too often. But here we are in a giant adult-sized ball pit, where I almost lost him many times. He may still be there. This talk is, in a way, inspired by Liam. You see, I'm a worrier, and being a parent and a worrier is a very fun combination. When Liam was just a few hours old, barely a little lump, I was sleeping in the little hospital cot next to him while he was in the bassinet. I'm listening to him breathe, and I notice that his breathing sounds a little different, and I get up to look at him
and he is purple. Now, I've been a dad for about 30 minutes at this point and I know very little, and they do look kind of purple when they first arrive, but he seems more purple, I suppose. As I take a closer look, I realize that he is struggling to breathe and his face is turning darker and darker. But I know what to do. I have a plan. I paid attention during my online parenting class during COVID, and so I flip him over in my hand and I pat, pat, pat, pat, and he spits up and he starts crying, but he's breathing again. He's fine. Liam gifted me something really important that day, and
that was perspective on my worrying. I realized something: worrying can be a superpower. There's a research paper out of the University of California called The Surprising Upsides of Worry, and in this paper they argue that while worrying, especially at extreme levels, is obviously very negative to your well-being, there's a really cool upside to worry. Here's a quote: worry illuminates the importance of taking action to prevent an undesirable outcome and keeps the situation at the front of one's mind to ensure appropriate action is taken. When I read that, I said, that sounds like what I do on blue team. Worrying can bring a unique skill to detection and response. We're constantly
thinking about what could go wrong, but we need to have a plan, and here's where I think a lot of programs fail. I started my infosec career on the red team, where I got to watch blue teams fail a lot, and then I switched over to blue team because I thought I could do that so much better. I started working in detection and response, and I found that my peers and I, while we were really great at accomplishing the technical work, often had no idea what our overall strategy was or what a program was. We just did our tasks and projects ad hoc, without really a bigger picture in
mind, and I think this is one of the reasons that we're constantly failing. We are taking lots of actions. Is anybody here on an operational blue team? Yeah, a couple of you, and my guess is that most of you are busy. That's been the story of my operational career: we're really busy, but we're really making it up as we go, or we copy what we did at our last organization because it sort of worked there. So after making lots of mistakes, I took all those lessons that I learned and I built this framework, so hopefully y'all can learn from my mistakes and make better, more informed
mistakes than I did. Before we go into the framework, I want to talk about what I mean when I say a modern versus a legacy detection and response program. In a legacy program, the focus is on alerts. You're reactive, and those alerts are centered around the idea that something bad has already happened. A legacy program focuses all of its strategy on the tools, on the technology. You know you're doing this when you describe your program by talking about all the tools that you own and all the vendors you do business with, instead of thinking about what capabilities you actually have. A legacy program has a lot of manual-heavy tasks.
You may have invested in automation tooling, but you're still doing the day-to-day tasks manually. And a legacy program operates completely siloed from the rest of the organization. This puts your teams out of touch with the technology that your company uses, and it also inhibits their ability to work side by side with other folks across the company, which is not great when you're thinking about what threats you could be detecting on the tools that they're deploying or building. On the other hand, a modern program is proactive, and that doesn't just mean you do a thing called threat hunting, even though it sounds cool.
It means that your philosophy for detection incorporates the idea that you'd want to detect a threat as early in an attack as possible, and that there are many signals you can correlate before the something-really-bad-has-happened alert needs to fire. Instead of being so tool focused, a modern program builds a strategy that is business focused. Gross. But by that I mean we empower our teams with more than just the requirements, but also the context of how what they're working on empowers the rest of the business. A modern program prioritizes automation, so when we're building our processes, purchasing new vendor tools, and integrating things, we're thinking, how will I automate this, instead of making those
manual playbooks and hoping you'll automate them later. You won't. And finally, a modern program is connected to the business and centralizes functions, workflows, and data. You can't succeed by doing this alone. I tried. Some of you are thinking, sure, sounds great, but I've got challenges in my way, and I know them really well. It's alert fatigue that is grinding down any sense of hope of moving forward. It's the lack of a budget to buy the expensive tools. It's hiring or retaining talent, especially because these roles can lead to burnout really easily. And it seems like you'll never get out of firefighting mode. The incidents just keep coming.
So hopefully I'm here to help. To get past these, we're going to use this process of organizational design. I read the book so you didn't have to. I won't speak ill of it, but it wasn't riveting. In this book they lay out a step-by-step approach to designing your organization, so we're going to apply that to building our program. Now, they had seven steps, and the talk after mine is lunch, so I simplified to four steps, because I can hear your stomachs rumbling from here. These are the four phases of the framework we're going to work through today. We're going to assess and analyze our current state. We're going to design and develop
our ideal program; this is where we'll spend most of our time. Then, when it comes time to implement, I'll talk about some things that I've found help in overcoming those operational challenges. And then finally we'll evaluate and report. So here we go. First, we're going to assess and analyze, and to start we're going to ask the question: where are we, and what do we have? Do you have anything? Maybe you come from a large program that reports lots of metrics that go all the way up to your board of directors or to whoever's in charge, or maybe it's just you and some antivirus, hopefully. So we want to understand our starting
point, and the first piece of advice I give folks when they start out on this is to stop doing and start learning. Take a breath. Sure, things are on fire, but they've been on fire for a while, so it's a slow burn. Before we fix, we need to learn some things, and we're going to assess and analyze our current state by learning from these three viewpoints. First up is our vision and mission. While your organization may have a riveting vision and mission statement, I want you to instead ask what's unique about detection and response at your organization. What unique problems do the people, the culture, and the tech pose for
detection and response? Maybe you have a harder time with NTLM than I do. Then, what are people spending their time doing, especially on the team? And what could they be doing? Maybe the better question is, what should they actually be doing that they're not, because they're doing these other things that aren't as useful? And then, who do you ask? Audience
participation. All is quiet in Berlin. Yes? What's that? Yeah, that's true. I like to start with the people that are going to do the work, so you could ask the folks on the team and folks outside the team. Who are some people you might want to ask? Stakeholders, like who? CISO, leadership. You're probably going to ask them for some money, so that's a good place to go. What's
that? Yes, one of my favorites to ask: legal. I want them to be my friends, always. HR too. And then a silly question is, how do we ask? Nicely, yes, that is the exact answer. I don't usually use questionnaires, but these are questions people have to think about: what is unique about threat detection at this org, what do people want out of it, what is the team doing and what is it not doing? So questionnaires would be a good way to let people think about it and give a response. But regardless of how you do it, the takeaways are: what are people doing, because this tells you what is the
implied vision and mission statement, which could be different than what's written down, and usually is. And then, what does the organization need from threat detection and response? Next we assess from the viewpoint of the people on the team, so we understand what skill sets we have before we start building. I'll point you to this resource: NICE has this cybersecurity workforce framework that categorizes different cybersecurity skills. There are a lot of them, but I gave you a jump start and grouped them into these general threat detection and response groupings. I'd expect you'd have responders that are also detection and response engineers, but this is a general way to start without too much customization. And then to assess, I
created (I was a manager at the time, a friendly one too, so I didn't have to do this anonymously) a self-evaluation survey and asked my team to rate themselves for each of the different work groups. So for forensic analyst, I asked each of them to rate themselves from having no experience, to a small baby amount, all the way up to parallel parking during rush hour. I then used all that data to create a heat map to visualize the experience and skill sets of the team. So now you have an estimate of where you have strengths and weaknesses skill-wise and where you have gaps. You're asking this not just to
know about the current state of the team, but also to help you evaluate technology decisions, the tools you're building, the tools you're buying, where you need to prioritize mentorship, and where you can target both training and hiring, if you can. Then you can take a look at the technology, and for this I'd like you to consider that technical capabilities are not product categories. What do I mean by that? Who here has EDR running on their work laptop? I know you've looked. It's like most of you, yes, and so, operational folks. So I might have a thing called EDR, and your blue teams may have something called EDR,
but depending on the vendor, the tool, the operating systems, our environment, how we're actually using it, how we can actually use it, and what license subscription we purchased, our capabilities might be really different. Second, maybe it's because I'm slow, but product categories are confusing to me. If I asked five of you what XDR was, I'd probably get six different answers. So as we're analyzing and designing our technical architecture, you could focus on capabilities and their use cases instead of product categories. Instead of saying, well, we have EDR, you could document things like: I have fileless malware detection, I have the ability to isolate hosts on the network, I can view the process tree. You get the
idea. But as you're thinking about these capabilities, think about things that are outside of your own team's ownership, like what you use during incidents for messaging, and what you're using for ticketing and alerting. All right, great. We stopped, we learned, you did great. Now we're going to have some fun and design and develop our program. When I design and develop, I like to think about it like I'm telling a story, because you might leave this talk and get lunch, and all you remember is that the last speaker had some cute photos of his kid up on the screen and told a story about him choking. That's because our
brains are wired to remember stories. So when we're building our program, I really like to think about it like I'm telling a story. I'm putting together a narrative, because that's going to help it stick with the people that are going to build it, and also with the rest of the organization. With that in mind, as we design and develop, we're going to create these two views. First, we're going to ask ourselves what processes our program needs, because processes tell the story of what we do. In this process view, we're going to build a visual representation of all the processes within our program. I have bad news for you if you are an engineer like me
and are really specific about how things are: this is not going to be an exact diagram, so deal with it. I mentally cannot, but here we go. I'll start from an ideal beginning. Our first process is threat modeling, and this is where we can do things as complex as building profiles of the threat actors and gathering intrusion sets to understand what threats we care about the most. The story I'm telling here is that this threat detection and response program is fueled by threat intel, where intel, both internal and external, is collected, analyzed, and disseminated throughout the program, whether that's in the form of threat briefings, IOCs, or hunt packages. And the story I'm telling here is that we don't want to reinvent the
wheel. We still need the classics. I didn't make this up in the dark. We still need event monitoring, triage, analysis, and incident response. And the story here is that we're proactive: our intel, often from incidents, is used for threat hunting, maybe starting with simple automated IOC searching and then moving into more creative data analysis. And the story I'm telling here is that we continuously test and improve our program with something called micro purple tests. These are tests that simulate threat techniques so you can validate that your detections actually work and that your responses work. The analogy I like to use here is that you write some code and then you write some tests, right? Right? Liar. You don't write
tests for every bit of code you write. Unbelievable. But just like that, if you write a detection, you should have a test to validate that the detection actually works: simulating some threat actor behavior and then validating that, yes, an alert would fire, but also that when that alert fires, an analyst is going to get that alert and know what to do, or some automated things will have happened already. Otherwise you could write the best alert in the world and nobody will know what to do with it: closed as false positive. Then a lot of the work comes into our continuous improvement, the idea that we're going to take the results of those micro purple tests and
prioritize the new detection capabilities and the new response capabilities. Those often come from threat hunting. A lot of times you don't find the bad guy when you go threat hunting, but you find you have massive gaps in your app logs. And the story I'm telling here is that visibility without context has never been really good enough. We want our data to be integrated with our threat intel. We want to have the context of our systems, our networks, our applications, and cloud platforms. We want observability. Then we can engineer detections that use that to find the bad guys, and when our detections fire, that's when our classical processes can pick up
the event and start the triage and analysis process. And the story I'm telling here is that this program's performance can and should be measured, and that the data is used to communicate how we're succeeding, improving, changing, and investing, so that we can circle right back to the beginning, where our incident data flows back into our threat intel collection, so that we can continue to inform the program with new threat hunts, new threat briefings, new detections, and new ways to get visibility. The nice thing about a framework is that you don't have to do what I did. You can use this as inspiration. If you have other ideas, we
should talk about it, after I've had a couple bites of food or I might be grumpy. And if you zoom out, you can see that our reporting and metrics can also inform our red teams with new scenarios, our security awareness teams with metrics from phishing events so we can better educate our users, and our partner security teams, the guys that are building the preventions that are supposed to help stop incidents. A lot of times we forget to let them know: hey, we're still having incidents in that area, so maybe fix that control, or help us understand why it's not working. Great, so now we have some processes. Let's look at building our architecture. We're not going to list
product categories; we care about capabilities. We started with the process view because I think it's easier to think about what we need to do and then think about what capabilities we need to execute on those. I'll just point you to some quick resources that you can use when you want to think about what capabilities you might need. MITRE ATT&CK: you're familiar with it. One way I've used it is to walk through the different threat techniques and think, how would I detect this? What technical capabilities would I need to be able to detect this type of threat activity? MITRE D3FEND was released not too long ago, and it's a catalog of
defensive techniques to help answer what specific capabilities I would need to defend against MITRE ATT&CK. It's a little bit prescriptive, so not always applicable, but it does have really great analysis techniques, talks about those in really great detail, and has good definitions of them. Tines have their SOC automation matrix. It lists all the common activities a SOC would want to automate. It includes the obvious things like alert enrichment and response automation, but also includes things like automatically documenting things and change management, because I'm sure you're all just as good at writing documentation as you are at writing tests. And then Snowflake have their detection series on Medium, and that
connects the idea of building detections to software engineering. I think there are a lot of lessons that they've already learned over there that we should bring over to us. So then we go back to the process view and use it to inform our capability architecture view. If we focus on threat intel, we think about the types of data we'd like to be collecting about threat actors and intrusions, or just raw intel from the app formerly known as Twitter, but also actionable intel like early warning signals, maybe dark web postings that relate to your organization, threats to your brand and reputation, and credentials for sale. Then we have to think about how
we'll collect it, disseminate it, and integrate it across all of our tooling so that we can have intel-driven detection. If we switch over to the classics, we think about all the automated capabilities we'll need to do triage and analysis, as well as the capabilities we'll need to do response. And for the more engineering-focused processes, it's not just collecting the logs, but normalizing, enriching, aggregating, and correlating so that we can get true observability from our logs, then thinking about the type of analytics we'll need to detect threats, both behaviors and malware, across our endpoints, networks, all the things that you have, and then how to automate testing across all the different MITRE
ATT&CK techniques and all the different environments. Then you can put all those capabilities together and create our architecture view. I've organized the capabilities in groups that tend to be more closely related. Just as I discussed before, the headline for the story I tell is often that threat intel capabilities should enable us to have intel-driven detection, so that our threat behavior analytics, our malware analytics, and our micro purple tests can be directly informed by that intel, so that you can take all that data and turn it into something that teams can use to write and engineer detections that generate alerts for automated analysis, so that you can trigger your rapid responses across all your various
controls. So then you have your long-term architecture, and all that seems pretty nice and fancy and whatnot, but maybe it doesn't sound too possible. These are really beautifully crafted Keynote diagrams, right? But they require a lot of things. They require funding. They require staffing of teams. So what happens when you lack a budget for those tools, and you have a hiring freeze, or you can't retain anybody? Let's talk about it. First I'll talk about making the most of the people that you do have. When I transitioned from being an engineer to a manager, my perspective changed a lot. I had to take a step back
and think about the bigger picture, and that's where the idea of building these different viewpoints came from. I'm back in an engineering role again, and I realize that these viewpoints are vital for everyone, but especially for the people that are building and operating the program, because these viewpoints tell the story of what's the most important thing for the team to be working on. If you're on an operational team, that especially needs to be true, because you're always busy and there are always things coming up, so being able to automatically prioritize can be really helpful. If you're trying to build a new team or a new program, but you have a lot of historical debt, noisy alerts, and manual tasks, I have found two approaches
that work really well. The first is to declare bankruptcy. Like, number one USA move right there. You're not going to solve yourself out of this one. That probably won't fly with leadership, so outsource it. Time-box it, but maybe bring in a third-party SOC or somebody, and throw bodies at the problem, so that your team can focus on building the new thing that's going to work while you still maintain the requirements to handle alerts and incidents in the meantime. Then, as you prioritize the technical capabilities you need, the inevitable question of build versus buy often comes up. We have it lots of times. As an engineer, I really
like to build, but usually what I recommend, especially for a program just starting out, is to find that 65% solution, because you can't ignore your basic principles that you want to be proactive and that you're prioritizing automation, so there's going to be a lot of build that comes with a buy anyway. Buying gets you part of the way there, and then you can build those program specifics. Of course, while you build you'll still have incidents and you'll still have your operations, and those are always going to be the priority. But especially at the beginning of implementation, you should say no. Let the less critical things fall on the floor, because we often, especially in
operational teams, give the appearance of being busy, and we are really busy, but that sometimes leads to the thought that the program is working just fine: everybody's really busy, so we must be detecting everything just fine, this program's great. But as we'll discuss next, changing how and what you report can really open the eyes of your leadership and the rest of the org and get them on board with this endeavor to shift to a modern program. So let's jump in. Just like there were legacy and modern programs, I think a lot of you probably work in very modern detection and response programs but are still falling into legacy methods of evaluating and reporting, and it's preventing
you from getting the support or funding that you need to really succeed. When I say legacy, it means you're still thinking reactively by assuming that time to detect, time to respond, and time to contain are all there is to say about an incident. When I say legacy, it's a report to leadership with the number of events that happened last month in comparison to this month, without any context or meaning. I have been guilty of this. It's when you report on what you're seeing without any context of what you're not seeing. Overall, legacy reporting doesn't tell the business the value that your program brings. All they know is that some bad stuff happened, there were
more bad things this month than last month, and you were very busy. A modern program, by contrast, provides context around the detections in place, describing how threats are being detected and what the impact to the business is. Our reporting should focus on what we're seeing, what we can see, and what we're seeking funding to see. It provides context to the numbers: what are the associated threats, what environments can be impacted, and where is there visibility? And then it quantifies these, at least in the narrative, to say what the business risk is, like: hey, remember that ransomware incident we had? That might have meant we weren't doing business for weeks, maybe months, and those are low estimates. So I want to talk about these
four quick ways to evaluate and report. I talk a lot about observability, and I think these three questions really simplify it. What can we detect today: what threats can we see, and what can't we see? What's our landscape coverage: do I have the same threat visibility in my cloud platform that I have on my laptops? And then, taking both of these together, what's my overall visibility into threats? To answer what we can detect today... oh, it got light in here. Hello, everyone. I mentioned the idea of micro purple testing, and that is running tests that validate whether we can or can't detect something. That was like a check to see: are you awake? Everyone alive?
We all good? So then here's an example of what running all the different techniques across MITRE ATT&CK and scoring them as pass or fail might look like. In this hypothetical example, we can see that we have really strong coverage in our discovery techniques, but an attacker could move laterally without being detected. And I know none of you have flat networks. None of you, right? Right. That could be bad for business, because you might have poor controls between those environment boundaries. Then you can look at the results of running those across all your different environments, or what I'm calling landscapes, and get a different view of the tactics less likely to be
discovered in those different environments. In this story, this organization has really good endpoint coverage, but there's little to no visibility into the containers, and that might be bad, because maybe that's where the business apps run. Then you can sum it up and provide the overall visibility by looking at the results of our micro purple tests, and at the detections we have in place today for which we maybe haven't written the tests yet, and then weighting those by the priority and prevalence of our environments. These reports tell your leadership what the different types of threats are, which ones we can see and where, and overall how we're doing.
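A minimal sketch of how the pass/fail scoring described above could be rolled up, assuming each micro purple test result is recorded as a tactic, a technique ID, a landscape, and whether the detection fired. The sample results and the landscape weights are hypothetical illustrations, not data from the talk, and `pass_rate` and `overall_visibility` are names invented for this example.

```python
from collections import defaultdict

# Hypothetical micro purple test results:
# (ATT&CK tactic, technique ID, landscape, did the detection fire?)
RESULTS = [
    ("discovery", "T1087", "endpoint", True),
    ("discovery", "T1082", "endpoint", True),
    ("discovery", "T1082", "cloud", True),
    ("lateral-movement", "T1021", "endpoint", False),
    ("lateral-movement", "T1021", "container", False),
    ("execution", "T1059", "endpoint", True),
    ("execution", "T1059", "container", False),
    ("execution", "T1059", "cloud", True),
]

# Hypothetical weights for the priority and prevalence of each landscape.
LANDSCAPE_WEIGHTS = {"endpoint": 0.3, "cloud": 0.3, "container": 0.4}


def pass_rate(results, key_index):
    """Fraction of tests that passed, grouped by the chosen column."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, run]
    for row in results:
        key = row[key_index]
        totals[key][0] += int(row[3])
        totals[key][1] += 1
    return {key: passed / run for key, (passed, run) in totals.items()}


def overall_visibility(results, weights):
    """Weighted average of per-landscape pass rates."""
    by_landscape = pass_rate(results, 2)
    total_weight = sum(weights[name] for name in by_landscape)
    weighted = sum(rate * weights[name] for name, rate in by_landscape.items())
    return weighted / total_weight


by_tactic = pass_rate(RESULTS, 0)        # coverage per ATT&CK tactic
by_landscape = pass_rate(RESULTS, 2)     # coverage per landscape
overall = overall_visibility(RESULTS, LANDSCAPE_WEIGHTS)

print(by_tactic)
print(by_landscape)
print(f"overall visibility: {overall:.1%}")
```

With these made-up numbers, discovery passes every test, lateral movement passes none, and containers have no coverage at all, which pulls the weighted overall visibility down to roughly 52 percent. Detections that exist but have no test yet are left out of this sketch; how to count them is a judgment call for each program.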
And then of course you could trend this over time to show where investments or hiring could enable faster delivery. If we look at our metrics, please move away from giving alert numbers, and instead say: we know what we can see and where; what threats are we seeing the most, which ones pose the biggest threats to the business, and from what we can see, what are the trends and impact? And then, what preventative controls could reduce that risk? So instead of giving, you know, you had 10,000 alerts this month and last month you had 10, meaningless numbers derived from your events: what are the top threats,
what are your top landscapes at risk, and what are the top incident trends that need to be addressed immediately? Speaking of incidents, you can make really beautiful PowerPoints for leadership, and they'll say those were great, I love those, but a lot of times the thing that they walk away with the most is the story of how things went. So tell it from the perspective of an incident that you had, and talk about it from your processes: we had a really great response to it; the analyst pulled the malware and we pulled out all the different IOCs from it, and then we searched across the org, and then we
correlated those. That could be a great part of the story. And then we went to prevention and tried to block the IP address, and it turns out we can't do it in that environment. So you tell the story from your processes, you tell it from your technology view, and you can tell it from your people and roles too, and a lot of times that sticks way better than pretty graphs. With all of those, you can finally present your road map. You know what you can and can't detect, you know what types of threats are impacting the business, and you know how you can make it better. So then you make your asks. You have the
data to support what you need to do to reduce the risk to the business. You've laid out the priorities: here's the road map to close those gaps, and here's what we need from you to fund it and hire for it. Going forward, you can report on the progress of the projects that close those gaps through that investment. So here we are. Instead of hiring based on the number of alerts or investigations, you can make data-driven decisions based on roles and skill sets. Instead of threat hunting because it sounds cool, your process view is a story that tells what your program does, how it provides value, and how you measure success. Instead of buying
because Gartner said you should, you have a vision of your architecture and a prioritized timeline to accomplish it, with clear asks to leadership. And instead of telling your boss, I think we'll detect it, maybe, I don't know, you have metrics that describe your threat coverage and your performance from previous incidents. So hopefully now you've learned how to stop worrying and build a modern detection and response program. Thanks so much for having me. [Applause] And then real quick, here's my email and my social handles. I write a very infrequent (I have a toddler, so excuses, excuses) newsletter called Meoward. It has an adorable cat that people love, and the
security info is just okay. So yeah, thanks. Thank you so much. Great.