
Death by Metrics: How Numbers Can Kill Your Security

BSides Dublin · 23:52 · Published 2025-10
Transcript [en]

Hi everyone. We're here to talk about something very interesting, and so fascinating: metrics. But the bad side of metrics. I'm Valentine — VM, for the SIEM, that's what I wrote on this slide. This is my Pokémon edition of metrics: gotta grab them all. But first we've got to align on one definition. What are metrics? Metrics are measurements used to track performance, progress, or trends. We're okay with that. If somebody tells you there's a 75% false positive rate, that's a metric. So that's our definition. But before I go into this, I want to know a little bit about who you all are. So, a little show of hands. We're

going to do this quick, because we only have 30 minutes. So raise your hand if you're in a leadership position. [Music] Okay. Technical, IC role? Compliance, risk management, GRC, C-level? Oh — this is for you. You're an auditor? Okay. You're just here to wait for lunch? And everyone: you're all of the above, or none of the above. Hey, good. Just wanted to get to know the audience a little bit. So, who am I? Primarily I'm a fiction author. I published an epic fantasy series some years ago, and I now have a cybercrime mystery thriller on the way. I'm also a staff security engineer at GitLab for the security operations

team. So we do incident response, detection engineering, trust and safety, whatever. I was also crowned the metrics queen at GitLab — not because I do excellent metrics, but because I talk about them too much. I'm also the founder of the women in cybersecurity community in the Netherlands — we fully did not pick the name for the acronym, I swear. And I'm the organizer of WICCON, which is a conference in the Netherlands with about 400 to 500 attendees that puts women in the spotlight, but everybody's welcome to join. And I love music. I sing. I love theater, hiking, Dungeons and Dragons, and Pokémon as well — this is a Pokémon-themed presentation. But yeah,

that's a little bit about me. I had to put this in for my employer: these views are mine, totally not related to anything I do at work. So why should you care about metrics? Because, I mean, you're all here — you've got to know why you're here, if you're not just waiting for lunch. Metrics shape decisions, because without metrics you don't have performance indicators, risk indicators, KPIs. And without good performance indicators — actual metrics that show you need budget — well, you don't get any money, and then your department doesn't exist, and then you don't exist in the company, and there's no security, and you don't have anyone in your life, because they're all security

people. So that's why you should care about metrics. Without metrics, you wouldn't exist. So what is a good metric? A good metric is, first of all, measurable, because you need to actually make a number out of it — it's much easier to compare numbers than to compare, you know, vibe-based stuff. And it has to be actionable: you have to be able to do something with the metric. If you get a metric and you're like, "What do I do with this information?", it's probably not a good one. It has to be relevant for your strategy or mission or something like that. If somebody is giving you a metric and you're like, "Why are you even telling me this?", maybe it's

not relevant. And it has to provide enough insight that you're able to draw conclusions from it. If you get metrics and you're like, "What is this supposed to mean?", well, maybe it's not giving you the right insights. And metrics help you predict outcomes, assess risk, give you performance indicators, and all that. So, an example of a metric that is very famous in security operations — in security in general — is the mean time to detect. The mean time to detect shows how fast threats are being identified, and it's a time-based metric, so it's measurable, very easy. It's actionable, because with a good or bad time to detect

you can make decisions on what you're going to improve in your detections or your SIEM or something like that. It's relevant for your mission, because if you're a secops team or in incident response, you want to detect threats, so mean time to detect might be something good to analyze. And it gives you insights, because it reveals how quickly your team is able to detect threats. So it helps us track response readiness and all that. It's a very famous metric in the industry. So what is a bad metric? We're here to talk about bad metrics. Bad metrics — well, you can't do anything with them or about them. For

example, if I tell you I had 200 incidents last month, you're like, "What do I do with that?" Incidents kind of happen to us; we can't really do anything about that. Or it doesn't mean anything to you: if you're in an incident response team and I say I collected 50 potatoes last week, it doesn't mean anything to you, doesn't tell you anything. Or you're not measuring or reading it right. And for this presentation, because we only have 30 minutes, we're going to focus on the last point, which is measuring and reading metrics right. So we're going to focus on wrong maths and bad interpretations of metrics, and I'm going to throw a few

terms here, but to illustrate them I've generated Pokémon for you. So that's your opponent team, the team of metric failures. We've got the data-related failures, which are selection bias and data skews; misleading scales, which sit at the presentation layer of the metric; and then false correlations and confounding variables, which are the interpretation layer of the metric. So, selection bias: data is collected in a way that favors what has a high probability of being collected. Does this image mean anything to anyone? Would anyone like to explain what this image is? I'll give you a little bit of a spotlight. Go ahead. >> You look at the planes that came back and ignore the ones that didn't come back. >> Exactly.

>> Yeah. So survivorship bias is a subset of selection bias. Basically, they analyzed all the planes that came back and were like, "Oh, we're going to reinforce where they had bullet holes" — but these planes came back. They ignored the planes that, you know, went down, which is probably what they should have looked at: reinforce the other parts instead. So that is a form of selection bias. Let's go back to the mean time to detect. Take all alerts or incidents, determine the time to detect, and divide by the number. What are we missing? Well, everything we didn't detect. So tracking MTTD is good, but it ignores everything you failed to detect. It celebrates your successes, but it hides

your failures. Now, data skews. A data skew is when the distribution of the data in your data set causes your metrics to show something for the majority of cases rather than the most impactful ones. You have a lot of data that you're not stratifying correctly — not cleaning and sorting your data correctly. I'll give you an example. If I tell you we patched 130-plus vulnerabilities this month, that's great. But then I show you the distribution of severities for those vulnerabilities, and you see that I keep patching my S3s and S4s — medium and low severity — but my S1s remain unpatched. So I can be like, "Oh, I patched 130-plus

vulnerabilities this month" — but they were all low severity, so the critical issues actually remain unfixed. Data skews can make your security posture look much better than it is. By the way, pro tip: if you want to lie to your executives, you show them this kind of metric. Then they're like, "Oh yeah, we patched 130 vulnerabilities" — but they all had low scores. It's rewarding quantity over criticality, and that can kill your security program. Then we go to misleading scales, and this is something that is used not only in security but also in the news, politics, and everywhere. So I really love this one.
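The severity-skew check she describes can be sketched in a few lines — the counts below are invented for illustration, assuming S1 = critical down to S4 = low:

```python
from collections import Counter

# Hypothetical month of patching data: severity label of each patched
# vulnerability, plus what is still open. All numbers are made up.
patched = ["S4"] * 80 + ["S3"] * 48 + ["S2"] * 2
still_open = ["S1"] * 5 + ["S2"] * 3

headline = f"We patched {len(patched)} vulnerabilities this month"
breakdown = Counter(patched)
open_criticals = sum(1 for sev in still_open if sev == "S1")

print(headline)                         # looks great on a slide: 130 patched
print(dict(sorted(breakdown.items())))  # but almost all of them are S3/S4
print(f"S1 still unpatched: {open_criticals}")  # the part the headline hides
```

The point of stratifying is exactly this: the headline count and the severity breakdown come from the same data, but only the breakdown exposes the unpatched criticals.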

Um, you're basically distorting your chart so that it tells the different story you want to tell. For example: we see a significant uptick in security alerts. Yeah, there is a significant uptick. But does someone want to say what's wrong with the chart? >> It starts at 900. >> Exactly. It starts at 900. That's just a difference of 20, and out of 900 that is, you know, almost nothing. And people actually present data like this. Then I tell you I reduced phishing clicks by 50% from March to April. But then I give you the actual raw counts, and we went from six to three. You're like, it is 50%, that is correct. But on

this scale it's not statistically significant. So: using percentages when the sample size is small, skewing the y-axis to tell a different story — these are all examples of misleading scales, and they create a false sense of progress or risk, or a false sense of security, which can lead to teams prioritizing the wrong threats. Then we go to the last ones: false correlations and confounding variables. I put them together because a false correlation is making two data sets look related when they're not, and a confounding variable is when you have two data sets that look related but there's actually something else going on — which is your confounder. I hope the image is rendering correctly, but that's my

favorite one to illustrate it. Note: cats are not so heavy that they cause holes in the ground; they just like the shape of them. So let me tell you a story. We created an awesome security awareness training, we did a phishing simulation exercise, and the click results were significantly lower than last time — so our training must be super effective. But actually, there might be something else going on. You could have confounders: the quality of your second simulation might have been so low that everybody saw through it and knew it was a phishing exercise. It could be word of mouth — employees started talking to each other, saying, "Oh, there's this phishing email," and sharing

that information. Or you just had better email filters and a lot of people simply didn't receive it. So this can lead to overconfidence in your resilience, or just overconfidence in your data in general. So, a bit of a recap — we went through a lot of concepts in a short amount of time. How numbers can actually kill your security, how bad metrics or bad interpretations can kill your security: celebrating successes while hiding failures, a false sense of progress or risk, prioritizing the wrong threats, overconfidence in performance, rewarding quantity over criticality, and making your general security posture look better than it is. So how do you win at metrics? We've got to create good ones. So, I'll give you a

step-by-step guide to really think about the metrics you're creating. First of all, think about why you even want this metric. Is it relevant to your mission? Is it relevant to the company? Does this metric answer the right question — are you actually measuring the right things? What are you even going to measure? Is it going to be vibe-based or number-based? Do you know your data — do you know what's inside it? Your data collection step is very important as well: you don't want to suffer from survivorship bias or selection bias. Watch out for skews and all that. Set thresholds and targets. A metric without a target or a threshold is

not really going to be useful to you: you don't want to go above something, or you want to hit something. How will you use this metric? Once you have your metric and it's beautiful and it's in place and you have nice little dashboards — how do you actually use it? That's also very important to ask. Have a plan to action it. And then test it. Iterate. Don't assume that the first time you publish this dashboard, or whatever it is, it's correct or it's going to give you the right thing. Iterate on it and see if it's still relevant in three months. So, we're going to propose a metric with

our mission statement. As a Pokémon trainer: gotta catch them all. We're going to propose the monthly count of Pokémon caught. Why do you want this metric? To measure how many Pokémon we caught. Does this metric answer the right question? If my mission statement is to catch them all, then just by measuring raw counts of Pokémon caught, I don't know if I have caught them all. A better metric would be the Pokémon caught ratio: the total number of Pokémon you caught divided by the total number of Pokémon in the world. It gives a numerical measure of progress, it tracks our progress toward the mission, and you set a target: we want to catch 75% by the end of the year.
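A minimal sketch of that proposed metric — assuming a world of 151 Pokémon and an invented catch count, both purely illustrative:

```python
TOTAL_POKEMON = 151   # assumed "world" size for the example
TARGET = 0.75         # mission target: catch 75% by end of year

def caught_ratio(caught: int, total: int = TOTAL_POKEMON) -> float:
    """Pokémon caught ratio: number caught / total in the world."""
    return caught / total

monthly_caught = 100  # the raw count alone can't answer "did we catch them all?"
ratio = caught_ratio(monthly_caught)

print(f"Progress: {ratio:.0%} (target {TARGET:.0%})")
if ratio < TARGET:
    print("Below target: invest in better Poké Balls")
```

Unlike the raw monthly count, the ratio is tied to the mission (a denominator) and to a threshold (a target), so missing it triggers a concrete action.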

And if we don't meet our target, we're going to invest in better Poké Balls. So, death by metrics — a little recap. How do we survive? Watch out for bad metrics, because they're dangerous: they're going to lie about the state of your security. So avoid bad metrics. Ask yourself if your numbers are telling the truth, or if you're just telling a story tailored for your executives so they think everything is fine, fire emoji and all. Follow the step-by-step guide when you create new metrics. Use your critical thinking skills and ask questions. When a vendor comes to you and says, "We detect with a 99% success rate," ask them: what's

your data set? How did you measure this? Always ask those questions — they hate them, by the way, and most of the time they can't answer, because you get salespeople. No offense, salespeople, love you guys. Evaluate your data sets with the concepts you learned here, and craft them all. Make pretty dashboards that make executives happy with your metrics. Any questions? This is where you can find me. And this is the latest cybercrime mystery thriller — shameless plug.
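One way to guard against the small-sample percentage trap from the misleading-scales section is to refuse to report a percentage when the baseline is tiny. This is a sketch, not anything shown in the talk; the cutoff of 30 is an arbitrary rule of thumb, not a statistical test:

```python
def describe_change(before: int, after: int, min_sample: int = 30) -> str:
    """Describe a month-over-month change.

    Reports a percentage only when the baseline count is large enough for
    a percentage to be meaningful; otherwise it falls back to raw counts,
    so a drop from 6 phishing clicks to 3 is not dressed up as "-50%".
    """
    if before == 0:
        return f"{before} -> {after} (no baseline)"
    pct = (after - before) / before * 100
    if before < min_sample:
        return f"{before} -> {after} (sample too small to report {pct:+.0f}%)"
    return f"{pct:+.1f}% ({before} -> {after})"

print(describe_change(6, 3))       # phishing clicks: raw counts, no "-50%"
print(describe_change(1000, 980))  # alerts: -2.0%, not a dramatic swing
```

The same helper also deflates the "significant uptick" chart: 900 to 920 alerts reads as roughly +2%, not a cliff.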

>> Um, how do you manage leaders? We've all kind of been in a situation where we join a new employer and we get the monthly report, the weekly report, the daily report, and all of a sudden you're looking at metrics that are fairly heavily skewed. How do you manage leaders into adopting newer metrics that, maybe untested, will provide better data?

>> That's the greatest question of all: how do you convince them that your metrics are good? I'd say show them. Walk them through the conclusions — how you got to the point where, you know, the metrics were wrong. Get them to understand what's in it for them as well: you can say that with better metrics you're going to have a more accurate representation of the program, and it goes on from there. For me, at least, that's what has worked — just walking them through how you got to the conclusions and why it's not good, or not as good as they thought it was.

>> You were speaking about what you detected — what about what you didn't detect?

>> So the problem is the unknown unknowns: you can never know what you didn't catch. But when you present the mean time to detect, for example, it's one data point among many others. Making the mean time to detect better should not be the goal; it's a data point that helps us see the state of our security. And when you present it to executives, you say: this is not everything we're trying to address, it's just part of the picture, and then you have other metrics. So you don't only have mean time to detect — you have time to respond and time to contain and all that, which also show the performance of your team. So, always specify that it's one data point among many others.
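The mean-time-to-detect computation referred to here — average the detection delay over the incidents you did detect — can be sketched as below. The timestamps are invented, and the selection bias is built in: anything never detected contributes nothing to the average.

```python
from datetime import datetime, timedelta

# (incident start, moment it was detected) — illustrative data only.
incidents = [
    (datetime(2025, 3, 1, 9, 0),  datetime(2025, 3, 1, 9, 30)),
    (datetime(2025, 3, 4, 14, 0), datetime(2025, 3, 4, 18, 0)),
    (datetime(2025, 3, 9, 2, 0),  datetime(2025, 3, 9, 2, 45)),
]

# MTTD = sum of detection delays / number of *detected* incidents.
deltas = [detected - started for started, detected in incidents]
mttd = sum(deltas, timedelta()) / len(deltas)

print(f"MTTD over {len(deltas)} detected incidents: {mttd}")
```

Which is why the talk treats MTTD as one data point among many: the denominator only ever counts successes.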

>> So you want to show many metrics.

>> Yeah. If you're able to show the data and also the health of your systems, it's more effective. You had a question?

>> Could red teaming or security validation help show how realistic those numbers are? Because then you know, for example, how many attacks you actually created, how many you detected, and how long it took you.

>> Absolutely. It's never going to be exhaustive, though, so we have to live with that, unfortunately. But that's why it's also important, especially if you're a SOC team, to invest in the ones I mentioned before — time to respond, time to contain — the things you can fully control. The time to detect is a data point, but you will never be able to detect all the threats, so don't invest all your resources in making that one better.

>> In your opinion, if we were all to go back to our own environments and change something, what's one metric we should focus on?

>> Dwell time, let's say — even though it only covers the things you actually detected. Dwell time is the time an actor was actually in your systems, but I also use it for things like a vulnerability — just the exposure time of something. It can give you an indication: when you actually detect a threat, how long has it been in there? So it tells you a little about whether you detect quickly, and even in the containment phase it can show you broken processes within your company. If your time to detect is short but your containment time is big, then something is wrong. So it can identify multiple things. It will not show you anything you didn't detect, but it can sort of tell you: okay, we're really bad at detecting threats that have been in there for a long time.

>> So, I love dashboards, and it is hard to get leaders to look at them. I think one of the big challenges is educating them on what good looks like. You talked about time, where surely lower is better; but for number of catches, higher is better. >> Yeah, otherwise they're not going to look at the dashboard. >> Any suggestions on how you define what good looks like — what success is? >> So, I mean, I don't know the best

answer to give you, but the way we do it is we have good/better/best levels, and it's all explained in the dashboard itself. So the executives have all the information in one place — it basically says on the dashboard what good looks like. But the first thing you said, getting leaders to look at the dashboards — that is one of the hardest parts of this job as well. Unless they like dashboards; some really like them and they're always in there. Other questions?

>> In your experience, how reliable is the mean time to detect, especially when you're explaining it to the business and all that, given the incidents you haven't detected? What's your perspective?

>> So, mean time to detect alone — I don't like it. It's better to split it, for example by source, or by the systems you have in your environment. Splitting by source is really important, because then you can see if there's a source of alerting that is more faulty than the others. That's more actionable than just knowing my time to detect is five seconds or something. So alone, not good — stratify it.

>> Just wondering, have you looked much into the overlap between security metrics and development metrics, like DORA metrics?

>> Not really. Do you have examples of development metrics being better?

>> So the DORA metrics would be things like — similar to mean time to detect — mean time to restore, or frequency of deployment, that sort of thing.

>> Yeah. The mean time to fix is one we have too, but it's an interesting point, finding common areas where we could align. Other questions, people? Fire away.

>> What about people gaming the metrics?

>> If you find out about it, that's also an indication that your metric might not be the best, because if it's easily gameable, you might just want to get rid of it. I'll give you an example: number of MRs — merge requests — as a performance metric for people. If you notice that people suddenly start making tiny MRs for everything, then you have an inkling that, okay, they're gaming the system — so we just got rid of it. And no punishment for those people; I mean, the system was there to be gamed.

>> More of a call to action. >> That is true. Yeah. Any other questions? Cool. Guess we're done. [Applause]