
The Fault in Our Metrics: Rethinking How We Measure Detection & Response

BSides SLC · 2024 · 24:21 · 26 views · Published 2024-09 · Watch on YouTube ↗
Transcript [en]

Hey y'all, thanks for coming to my talk. I'll apologize up front: I'm a speaker who's losing his voice, so I sound a little bit like Elmo and Cookie Monster in a fight. I've worked in detection and response for the last decade and I've made a lot of mistakes, especially when it comes to metrics. This is the talk I wish I had seen. Today you'll get two things: a new maturity model to help describe and measure detection and response capabilities, and a framework to help you build much better metrics.

My story with metrics starts on a Monday morning. I'm only a few months into a new job and I get a message from my boss: the board of directors meeting is coming up and he's looking for updated program metrics. You can tell by my response that I'm new to senior leadership; I don't ask questions and I'm eager to please. So I send the message to my new team and ask, "Hey, what have we presented in the past?" And what's the response? That's right. Oh no. Bad news: the last guy made those up. Good news: I'm going to do so much better. How many of you have had this happen, where you inherit somebody else's metrics mess? This is often our starting point: metrics that haven't been well thought out, and, maybe worse, fudged to avoid questions or more work.

So I did what you would do: I Googled it, and then I ended up just copying the metrics from my last job. That's led to me using lots of bad metrics. But so what? Why do we care about metrics? Well, one reason might be that metrics help drive improvements. Karl Pearson, a late-1800s, early-1900s guy widely viewed as the founder of modern statistics, has this quote that will always come up if you start Googling metrics: "That which is measured improves." At first that sounds like just a great plug for metrics, but there's an implied warning in that message: what if you're measuring the wrong thing?

There's a paper written by two guys at MIT, Hauser and Katz, called "Metrics: You Are What You Measure." They talk about how, as you pay more attention to metrics, you start to make decisions and take actions to improve those metrics. The metrics you chose are improving, and over time you become what you measure. You'll become what you measure. Metrics also help us communicate what we do and why people should care. Edward Tufte, who by the way teaches maybe one of the best courses on presenting data, says metrics reveal data; they're a tool that enables us to present the greatest number of ideas in the shortest amount of time, with the least ink, in the smallest space.

And why? Well, let's be honest: because we need budget and we need headcount, and metrics are usually the tool we use to communicate that. Metrics help us show our value and demonstrate a return on investment. But why are security metrics hard? I gave a version of this talk before and somebody said, "Because we're trying to prove a negative." That's partially true. But in my experience, metrics are hard because I'm a security person and I don't care that much about metrics. Here's a much less famous quote: "Metrics are an annoying PowerPoint I need to update every month." That one's from me.

A bit about me: I'm a senior staff engineer at Airbnb (and yes, I've lost my voice), and I work on fun things like enterprise security, threat detection, and incident response, and I really love my job. I live in Austin, Texas with my wife and three-year-old son Liam, who probably got me sick in the throat here, and I really love being a dad and a husband. One thing I'm really good at, as a husband, as a dad, and as a security engineer, is making mistakes. This is the point of the talk where I'm supposed to gain some credibility with all of you and tell you about my 15 years of experience, but in the last 15 years the reality is that I've just made a lot of mistakes. Let me tell you about three of them.

The first terrible mistake I've made is losing sight of the goal. This year marks my ten-year anniversary of being on call, and for those of us who spend our days triaging alerts and responding to fires, it can be really easy to lose sight of the goal. So we end up describing those frontline operations with metrics like this one. Yep, here's a metric that shows the number of security alerts per month. You've seen this metric; you probably have this metric today. If we take a closer look, we see that in the past year March and April had the most alerts; my boss will ask me a question about that. And if you keep looking at it, it looks like alerts are generally trending down. Did we do that? Or maybe we just turned off some logging in February. Alert count has become the heartbeat metric for security operations. Instead of rooting back to our goal of detecting threats and responding quickly and effectively, we've reduced ourselves to cries for help. I've come to call this metric "the operational burden we've inflicted on ourselves." Another title might be "we're doing things, it's crazy out there": fear-driven, scaring leadership with a bunch of alerts. Sometimes we try to make it a bit better and break it down by true and false positives. I've been proud to do this, but really this only shows how much I've lost sight of the goal.

This graph makes a lot of assumptions, like a direct correlation between reducing false positives and reducing operational load. And you might be thinking, wait, doesn't it? This graph also assumes that fewer false positives means higher-quality alert analysis. Or is it the opposite: do more alerts mean you have better visibility into your environment? Because I live in the operations world, I find it's really easy to lose sight of the goal, and I don't even know where to start when I create metrics. So to help you start thinking about your own metrics, I thought about all the different measurable activities in detection and response that can help us make decisions and see if we're improving.

And there's an acronym so you'll remember it: SAVER. The first category of work is Streamline, and this is where our ops metrics live; it's usually focused on efficiency, accuracy, and automation. Awareness is where we take our threat intel and turn it into our lists of top threats and trends. Vigilance is where we describe our visibility and detection coverage for known threats. Exploration is for the results of our threat hunts and proactive investigations. And Readiness is the measurement that shows whether we're prepared for the next big incident.
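To make the categories concrete, here's a minimal sketch (mine, not from the talk's tooling) of SAVER as a tagging scheme for a metrics catalog; the Metric class and the example entries are hypothetical:

```python
# A minimal sketch of tagging metrics with SAVER categories.
# The Metric class and example metrics are hypothetical illustrations.
from dataclasses import dataclass
from enum import Enum

class Saver(Enum):
    STREAMLINE = "efficiency, accuracy, and automation"
    AWARENESS = "top threats and trends from threat intel"
    VIGILANCE = "visibility and detection coverage for known threats"
    EXPLORATION = "results of threat hunts and proactive investigations"
    READINESS = "preparedness for the next big incident"

@dataclass
class Metric:
    name: str
    question: str    # the question this metric answers
    category: Saver  # ties the metric back to an outcome

metrics = [
    Metric("time spent on false positives",
           "Are false positives eating too much analyst time?",
           Saver.STREAMLINE),
    Metric("top-threat detection coverage",
           "Can we see the threats that matter most to us?",
           Saver.VIGILANCE),
]

for m in metrics:
    print(f"{m.name} -> {m.category.name}: {m.question}")
```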

When you're thinking about your own metrics, think about which SAVER category the metric falls under; this can help you tie it back to your goal or your outcome. And to figure out what category a metric should fall under, we can ask: what question does this metric answer? So what question were we trying to answer with this metric? Maybe it was "are our false positives taking up too much of our time?" or "do we have enough time to investigate our true positives?" But how do we control this metric? How do we reduce false positives? Alert tuning. How's that going for y'all? About as good as it is for me. This is a Streamline metric, and Streamline metrics usually answer questions about efficiency, accuracy, and automation.

I have two big problems with this metric. First, it doesn't tell me where I'm spending most of my time, and second, the only control I have is tuning, or turning off, alerts. So how can we make it better? Here's a graph of time spent on false positives. I've completely removed true positives for now, because for now I'm saying I'll spend as much time as I need on my true positives. Instead of tracking how many false positives there are, I'm tracking how much time is being spent on them: how much time you spend working an alert manually. That could be as simple as measuring from the time an alert is assigned to the time it's marked as a false positive. Now, if your team is anything like mine, we have this bad habit where, when we're working the alert queue, what do we do? We select all the alerts and assign them to ourselves. And why do we do that? What metric makes us do that? Time to assign: maybe the silliest metric we could have possibly invented. So if we stop measuring it or prioritizing it, the time-spent metric suddenly gets a lot more accurate, because people won't be motivated to just get their time-to-assign down, as if that's what matters.
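As a rough sketch of what that measurement could look like, here's the time-spent-on-false-positives calculation, assuming your alert queue records assignment and close timestamps plus a disposition; the field names here are hypothetical:

```python
# Sum the elapsed time from alert assignment to false-positive close.
# Field names (assigned_at, closed_at, disposition) are hypothetical;
# map them to whatever your case management system actually records.
from datetime import datetime

alerts = [
    {"assigned_at": datetime(2024, 3, 4, 9, 15),
     "closed_at": datetime(2024, 3, 4, 9, 40), "disposition": "false_positive"},
    {"assigned_at": datetime(2024, 3, 4, 10, 0),
     "closed_at": datetime(2024, 3, 4, 11, 30), "disposition": "true_positive"},
    {"assigned_at": datetime(2024, 3, 5, 14, 5),
     "closed_at": datetime(2024, 3, 5, 14, 50), "disposition": "false_positive"},
]

fp_minutes = sum(
    (a["closed_at"] - a["assigned_at"]).total_seconds() / 60
    for a in alerts
    if a["disposition"] == "false_positive"
)
print(f"time spent on false positives: {fp_minutes:.0f} minutes")  # 70
```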

So how do we control this metric? What can we do to improve it? Well, we've been talking about it a lot today: automation. As we get more automation tools, the number of events may not even equate to how much time we're spending on our false positives. And as you automate, carry the time you were spending manually over to the automated column. This lets you do something really cool: it lets you speak to the amount of human hours your automation efforts are saving. Now you're not incentivized to just tune alerts; you're incentivized to find where the most manual time is being spent, so you can move that to automation.
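Here's a hedged sketch of that "human hours saved" idea: measure each alert type's average manual handling time before automating it, then credit that time to every alert the automation now handles end-to-end. All the numbers below are illustrative:

```python
# Carry measured manual handling time over to the automated path and
# report the human hours saved. All values here are made-up examples.
avg_manual_minutes = {      # measured before automation
    "impossible_travel": 12,
    "phishing_report": 20,
}
automated_last_month = {    # alerts handled end-to-end by automation
    "impossible_travel": 240,
    "phishing_report": 95,
}

hours_saved = sum(
    avg_manual_minutes[alert_type] * count / 60
    for alert_type, count in automated_last_month.items()
)
print(f"human hours saved by automation last month: {hours_saved:.0f}")  # ~80
```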

Mistake number two: thinking proxy metrics are bad. Or, more simply, over-engineering to create this awesome metric at an insane cost, when a much less perfect but correlated metric would have been good enough. Here's a great example. Eight years ago, my team and I determined that we wanted to see our MITRE ATT&CK coverage so we could better determine what types of activity we could and couldn't see, and this was before MITRE ATT&CK coverage was the cool thing to do. We determined that we'd have to write tests across the entire framework, and once we got going, we figured, well, one test per technique won't tell us much, so we'll need a lot of those. And we've also got Windows and Mac and Linux, so we'll need tests for all of those. After years of developing tests and investing in tooling, we finally had the data we needed to visualize our ATT&CK detection coverage. Side note: I saw a tweet the other day that said we need to do a better job of mocking vendors that claim 100% MITRE ATT&CK coverage. There are many reasons for that, but the first being I've seen the carnage that 100% coverage brings. Hint: it's alert fatigue like you wouldn't believe. Anyway, we spent years gathering all this data, and it's cool, but at the end of the day all we really wanted to know was: where do we prioritize our detection building?

So do this instead: rather than trying to measure your detection coverage across the entire ATT&CK matrix, start by finding the five threats you care about the most. Don't overthink it. Look at your external threat intel; think about what type of industry you're in and what type of environment you have. Then look at your incident trends: what types of incidents are recurring? Then link those back to your organization's security risks. What would be a really bad day for your company if data were exfiltrated? What data would make the chief privacy officer cry the most? Once you've got your top five, prioritize your detection coverage there. I like to workshop these as a team: we'll take each of those top threats, break them out in groups, and then use ATT&CK to derive the different techniques and sub-techniques associated with that threat. As you write your tests and detections, you'll slowly end up building yourself a prioritized MITRE ATT&CK coverage map, but without all the alert fatigue and years of building a costly metric.
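Here's a minimal sketch of what that prioritized coverage map might look like in practice; the threats, technique IDs, and status fields below are examples I picked for illustration, not a recommendation:

```python
# Build a prioritized ATT&CK coverage view from your top threats:
# for each associated technique, track whether a detection exists and
# whether it has a passing test. All entries are illustrative.
top_threats = {
    "credential phishing": ["T1566", "T1078"],
    "infostealer malware": ["T1555", "T1539"],
}
coverage = {
    "T1566": {"detected": True, "tested": True},
    "T1078": {"detected": True, "tested": False},
    "T1555": {"detected": False, "tested": False},
}

for threat, techniques in top_threats.items():
    for t in techniques:
        status = coverage.get(t, {"detected": False, "tested": False})
        print(f"{threat:22} {t}: detected={status['detected']} "
              f"tested={status['tested']}")
```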

Mistake number three: asking why instead of how. My natural inclination is to ask why. Why didn't we detect this malware sooner? Why are we still missing these firewall logs? And as a dad, I have a lot of why questions. Why did we bring the car seat when we only took one taxi ride the whole trip? Why do we need four suitcases? Why didn't we bring the stroller? Why can't Liam walk by himself? But in all of these examples, why is not helping. So instead I've learned to move straight to the how and start figuring out what actually needs to be done. Often, answering how lets you identify the underlying problem much faster and with a much more positive perspective, especially from your spouse, I mean, your co-worker. How can I carry Liam, a car seat, and two suitcases through the airport? How can we detect these threats sooner? How can we respond faster? When I interviewed with my current VP, she asked me: how do we build a modern detection and response program? How do we get there? Simple question, not a simple answer. How do we describe where we are today and where we're going? It made me think about maturity models. My first exposure to maturity models was the Hunting Maturity Model (HMM), which was really helpful in describing the maturity of threat hunting and what we needed to do to get to the next maturity level. Maturity models help us answer these questions. Where are we now: what tools and processes do we have, what's the current situation, what are our challenges? Where are we going: what should the future look like, where do we want to be in a couple of years? How do we get there: what are our objectives, and how are we going to achieve them?

So I created this Threat Detection and Response Maturity Model. The TDR Maturity Model builds off the Hunting Maturity Model and expands it across all the different areas of detection and response. There's a lot to it, but at the end I'll provide a link to the full maturity model for you to use. Here are its pillars. The first is observability: it's the foundation we build our detection and response capabilities on. It's having the tools and logs that give us visibility into our entities and user activity, and enriching it so we can contextualize that data and search it quickly. The second pillar is proactive threat detection, where we focus on collecting threat intel so we can prioritize the detections we build and buy and the hunts we perform. The third is rapid response, where we prepare by having complete playbooks, enrichments, and automations, so we can move from triage to analysis with the forensic capabilities we need to respond as quickly and effectively as possible. We can use these pillars and these 14 capabilities to describe and measure where we are today and where we want to go next. The first question we want to ask is: where are we today? For each capability in the framework, you'll rate the maturity across four different areas, and you'll rate each of them from Initial all the way up to Leading.

Within the framework there's a lot of specific detail for each capability, but here are some general ones for now. For example, if we rate our detection engineering capability, we think about the processes we have: do we have a process for creating a detection that, for example, looks for first-time occurrences? Do we have a process that defines how we determine thresholds? Then we rate our tools: are detections centralized and managed from a single location? Then documentation, or, what's been the case for most of my career, the lack thereof. And finally testing: how do we validate the logic we're using to determine first-time occurrences? How do we know it's working? Once you've rated all your capabilities, you can calculate your current state and where you plan to focus on improving. Here's an example of how you could use the model to visualize your current program's maturity, and you can show a comparison with where you plan to be by the end of the year, based on the projects and initiatives you've prioritized.
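As a sketch of what "calculate your current state" could mean, here's one way to score a capability by averaging its four areas. The talk only names the Initial and Leading ends of the scale, so the intermediate level names and the numeric mapping below are my own simplification; the full model linked at the end has the real rubric:

```python
# Score one capability by rating each area on a 1-5 maturity scale.
# Only "initial" and "leading" come from the talk; the middle levels
# are assumed for illustration.
LEVELS = {"initial": 1, "developing": 2, "defined": 3, "managed": 4, "leading": 5}

detection_engineering = {
    "process": "defined",        # e.g., first-time-occurrence detections
    "tools": "managed",          # detections centralized in one place
    "documentation": "initial",  # the lack thereof
    "testing": "developing",     # validating the detection logic works
}

score = sum(LEVELS[rating] for rating in detection_engineering.values()) / 4
print(f"detection engineering maturity: {score:.2f} / 5")  # 2.50
```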

So now, with the maturity model, you have a way to describe where you are and how you're going to get to your target maturity. But as we do the work, we'll need metrics to show results. Are we getting better? Are we still on track? Do we need to adjust our strategy? That's where the SAVER framework comes in again. For each metric you create, you'll put it into this structure. We want to avoid mistake number one, losing sight of the goal: so what question does this metric answer, and what's the outcome we're looking to achieve? Then use the SAVER categories, Streamline, Awareness, Vigilance, Exploration, and Readiness, to tie us back to our outcome. Then make sure the metric is something we can control today; think about the levers that control it. And if you do have control of a metric, what risks could this measurement reward? I was talking to a buddy of mine about metrics, and he brought up the time-to-analyze metric. It was a really big pain point in the SOC he was working in: overall, analysis was taking a lot longer than they expected. So they brought it up with the team: "Hey, the time to analyze is really high and we need to figure out ways to bring it down." So guess what? You won't believe it: it went down. And then guess what else went down: quality of analysis. And guess what went up: true positives missed. So when you introduce a new metric, think about the potentially risky behavior it could be rewarding. It might not be a bad metric; you just might want to think about the companion metrics that need to go along with it, because remember, we are what we measure. Then there's metric expiration: when is this metric no longer needed? When our only lever was alert tuning, it made more sense to track the number of false positives; now that we have automation tooling, maybe it's less important to track alert counts, or at least we can remove them from our leadership decks. The next three fields are data requirements, effort, and cost: how much data the metric requires, how much new effort we'll need to improve it, and how much time it costs to collect.
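Pulling those fields together, here's a sketch of a per-metric "card" you could keep in a catalog; the field names follow the structure described above, while the example values are hypothetical:

```python
# One metric expressed in the structure from the talk.
# Example values are hypothetical.
from dataclasses import dataclass

@dataclass
class MetricCard:
    question: str           # what question does this metric answer?
    outcome: str            # the outcome we're looking to achieve
    category: str           # SAVER category tying it to that outcome
    levers: list[str]       # what we can actually control today
    risks_rewarded: str     # risky behavior this measurement could reward
    companions: list[str]   # companion metrics that guard against it
    expiration: str         # when this metric is no longer needed
    data_requirements: str  # how much data it requires
    effort: str             # new effort needed to improve it
    cost: str               # time it costs to collect

card = MetricCard(
    question="How much analyst time goes to false positives?",
    outcome="Free up analyst time for true positives",
    category="streamline",
    levers=["alert tuning", "automation"],
    risks_rewarded="rushing analysis to close alerts faster",
    companions=["quality of analysis", "true positives missed"],
    expiration="when automation makes manual FP time negligible",
    data_requirements="alert queue timestamps and dispositions",
    effort="instrument assign/close events",
    cost="low; pulled from case management exports",
)
print(card.category, "->", card.question)
```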

Remember mistake number two, thinking proxy metrics are bad: testing 100% across the MITRE ATT&CK framework is cool, but you might not need to. And anytime I talk about metrics, I always get asked at the end: so how do I change the bad metrics I'm already presenting today? And I get it. Change is hard, leadership doesn't like surprises, and they often have expectations that you'll be updating last month's slide deck. But I have one tip that's worked really well for me. Here I've convinced my friend Dexter, and he is still my friend, much to the delight of my toddler, to get into near-freezing water. When Dexter entered the water, his first reaction was shock: his heart rate spiked, his stress hormones spiked, and when he hit the water he gasped and had to work to not hyperventilate. But then, suddenly: clarity. It's the same when you change your metrics. It's not going to be fun immediately, and some people will go into a state of shock, especially when those bad metrics have been around for a while and they've gotten used to them. But my tip is to embrace it. Push through the change, because you too will soon have clarity.

So now you have some tools to help you rethink how you measure detection and response. Instead of making wild guesses about whether you're improving, you have the TDR Maturity Model to measure your capabilities. Instead of using volume counts, fear tactics, and tired emojis, you can use SAVER to get to the core of a metric, ask better questions, and map that to something you can control today. And instead of focusing on 100% MITRE ATT&CK coverage, you can focus on the threats that matter most to the business right now and work toward detection coverage that has real impact. So hopefully this talk is your wake-up call: take the cold plunge and rethink your detection and response metrics. Thank you.

And real quick, here's my Linktree. It's got my Twitter and LinkedIn handles, plus there's a copy of the slide deck on there and the complete TDR Maturity Model. I also write a very infrequent newsletter (remember, I have a toddler) called Meoward; it has an adorable cat that people love, and the security info is half decent. We're out of time, but I'll be sticking around in the back for a little bit and would be happy to chat. I have some cute cat stickers, too. Thank you very much.