
GT - Who Watches the Watchers? Metrics for Security Strategy - Michael Roytman

BSides Las Vegas · 55:29 · Published 2016-12
About this talk
GT - Who Watches the Watchers? Metrics for Security Strategy - Michael Roytman. Ground Truth, BSidesLV 2015 - Tuscany Hotel - August 05, 2015
Transcript [en]

So this talk is about security metrics, but in reality I want to answer two questions. The first is: what are good metrics, and what are bad metrics? We often come up with metrics and then say "this one helps me do something" or "this one is a good measure of risk," but we don't have formal definitions of what good metrics are. The truth is, lots of people have spent a lot of time thinking about this. None of these ideas are really mine, but it's important to synthesize them and apply them in security, and I don't think we have, in many senses, done so. The second question I'm going to answer is:

what is your next good startup idea, and how do you benchmark that startup idea in security specifically? (Is the microphone good? Is this okay? Cool.) My name is Michael Roytman. I am the senior data scientist at Kenna Security, formerly Risk I/O, formerly as in yesterday, and I have access to two large swaths of data. The first swath is vulnerability scans: asset and vulnerability data, fairly live, not very stale, from about a thousand customers. That stuff updates depending on how the customer sets it, monthly, weekly, daily, and in the aggregate I'm looking at about 200 million vulnerabilities across about a million assets. So that's the first data

set I have to work with. The second data set is a repository of breaches, and during the talk I'll keep specifying what I mean by "breach." I certainly don't mean you lost a lot of money; I mean the technical sense, as in something was exploited successfully. This data set is about one and a half billion breach events over the past two years. So, you know, POODLE has been exploited something like six hundred thousand times over the past year. Has any data been stolen with it? Sure, I don't know where or when. This data comes from the SANS Internet Storm Center, the AlienVault Open Threat Exchange, Dell SecureWorks, Verisign iDefense, all sorts of threat feed providers, and

they usually provide you with threat information like "this CVE is being actively exploited." What I have are the successful exploits observed against IDS systems, where I then take a look at whether an open vulnerability existed on that system without any mitigating control, so the exploitation was successful and actually hit the target. Just keep those two data sets in mind; that's all I'll be using during this talk.
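(To make that pairing concrete, here is a minimal sketch of the kind of join being described. The record shapes and values are entirely hypothetical; the talk doesn't specify the real feeds or schemas.)

```python
from datetime import datetime

# Hypothetical IDS events: an exploit signature fired against an asset.
ids_events = [
    {"asset": "10.0.0.5", "cve": "CVE-2014-0160", "time": datetime(2015, 3, 1, 14, 0)},
]

# Hypothetical scan data: (asset, CVE) pairs that were open with no
# mitigating control at the relevant time.
open_vulns = {
    ("10.0.0.5", "CVE-2014-0160"),
}

# A "successful exploitation" here is an IDS exploit event whose target
# asset had a matching open, unmitigated vulnerability.
successful = [e for e in ids_events if (e["asset"], e["cve"]) in open_vulns]
print(len(successful), "successful exploitation events")
```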

Let's start. Metrics are useful because they allow us to design better services or better products. I think Uber is the best example, both of a system that uses good metrics and of the emergence of good metrics, which is something security is faced with in a very real way right now. Five years ago Uber was not possible, because we didn't have location data about users. Think about ten years ago, when you would hail a cab: cabs were really efficient. Cab companies had excellent operational metrics: how many times a day a driver stopped, how many times they fueled up, the average cost per carry, how many customers on average they picked up per hour. All of those metrics could tell you what a good cab driver was and could help you run a cab company efficiently. A dispatcher would sit at a terminal and use those metrics to run a really good operation, and lots of cab companies made throngs

of dollars. Now, the reason Uber was able to undercut their costs is that a new data set emerged: the user's location. No longer did a cab driver have to drive around looking for customers to pick up; they knew where they were. So the metrics that defined the space also shifted. All of a sudden you didn't have to talk about just the cab driver when discussing what a good cabbie was, or what the system looked like as a whole. Now you could talk about user satisfaction: how quickly did the user get from point A to point B? Those same metrics about the cab driver became more detailed and more nuanced. You could now

ask, did the driver take the most optimal route? Because we have the data set of all the points and all the routes they should have taken. This allowed Uber to just be more efficient than their competition. It also allowed them to generate new metrics and automate a new system, the system of picking you up: you don't need a human driving around looking for people to pick up, because we know where people are. It's a really simplistic analogy, but it's a really good analogy for security. Over the years we've come up with a whole bunch of

metrics about how to run an efficient SOC or an efficient vulnerability management practice, but the data sets that underlie them have shifted in the past five to ten years. We all of a sudden have data about exploitations in bulk. We have IDS logs in bulk. We have data about which vulnerabilities are being exploited, and when. We no longer have to rely on a CVSS score to define how important a vulnerability is, and when that happens, when a new data set emerges, what we can automate and what we can take humans away from changes. All of a sudden I think vulnerability management can be automated, and I think intrusion detection has been becoming more and more

automated. As I discuss these metrics and come up with some good criteria for them, you'll see that this is really all about automation. I think attackers have gotten pretty good at automation, and I think that's why we're falling behind. Automation is a nice word, but it is only useful insofar as it describes the speed of your practice, the speed of your security operations. A bunch of humans can be quick, but they can't be as quick as an automated system. I think the attackers have been onto this for a while. My data indicates that attackers have been using automated, weaponized exploits in bulk

for a number of years, really successfully: not just trying Metasploit modules but being successful with them. And they're able to shift their strategy much more quickly than organizations are able to shift theirs. They can decide that an old vulnerability is one they should go after, because they've seen successful exploitations of it in the past month, and shift all of their resources into automating that. They can create SLAs for their customers, saying we know this will be successful with this probability. By doing so, they don't need a team of a hundred engineers to come up with new exploits; they can just

use old ones in an automated fashion and be very successful. So let's look at how they do that. This is the data set for 2014-2015: all of the vulnerabilities that were successfully exploited. The size of each bubble indicates the number of successful exploitations; the largest is around 150 million. Two things are really apparent. The first is that only a small fraction of these vulnerabilities accounts for almost the totality of successful exploitation. Those big bubbles you see, there are maybe 20 of them, and they are stable, weaponized exploits being fired off across the internet and hitting successfully for any number of reasons: maybe because the

systems are outdated and can't be updated, maybe because attackers know blocks of the internet with those vulnerabilities where those exploits are successful. The second thing that becomes really apparent, I don't know if you can see, is the years of those vulnerabilities. Other than POODLE, I think, all of them are really old, and they're really old because attackers have figured out what works and automated it. They're not chasing after a Patch Tuesday vulnerability and thinking, "oh man, I should write a custom script to exploit this one bank I know of in America." They're thinking: a vulnerability from 2002 has come back with a three percent success rate; I'm going to write a script

to attack the entire internet with it and see how I do. That way I can guarantee that the exploits I'm selling off to people are successful; that way I can measure my program and calculate an ROI in terms of credit cards. And this trend is persistent if you break it down to any smaller scale: it's a power-law distribution, it's fractal in nature. Any subset of the data that I look at shows this. Sometimes the vulnerabilities shift, which ones are most attacked, but some are constantly there. The trend is the same, though: they're pretty old vulnerabilities, and a small percentage of the vulnerabilities is responsible for a large percentage of the breaches.
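(That concentration claim is easy to quantify once you have the event stream. A minimal sketch, with fabricated counts purely for illustration:)

```python
from collections import Counter

# Hypothetical stream of successful-exploitation events, one CVE per event.
events = (["CVE-2002-0012"] * 900 + ["CVE-2008-4250"] * 600 +
          ["CVE-2014-3566"] * 300 + ["CVE-2013-2423"] * 150 +
          ["CVE-2001-0876"] * 40 + ["CVE-2015-1234"] * 10)

counts = Counter(events).most_common()
total = sum(n for _, n in counts)

# Share of all exploitation events covered by the top-k vulnerabilities.
top_k = 3
covered = sum(n for _, n in counts[:top_k])
print(f"top {top_k} CVEs account for {covered / total:.0%} of events")
```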

So if attackers are getting so good at automating things (you can't fire off 200 million targeted attacks in Q1 of 2014; those are for sure automated), then we need to get better too. What does that mean for us? Currently in vulnerability management we have automated vulnerability discovery. What I mean by that is nobody's scraping through their Windows OS by hand; everybody's firing off scanners, and we have good scanners that we can fire off in a semi-automated fashion. Business processes get in the way, and sometimes we're not scheduling those scans to be repeatable, so we're halfway there in terms of how often we scan. But

our threat intelligence is incredibly manual. I work with a whole bunch of threat intelligence feeds, and they're super useful in telling you, "active campaign on CVE-2013-this-this-this." But what does that mean? I'm getting an XML report or a PDF, and I think that's how most people consume stuff about vulnerabilities. In fact, the entire hype cycle of vulnerabilities we've seen emerge over the past year is an indication of how manual threat intelligence is, because we need somebody to write an article about Heartbleed to realize that this is a new threat vector that's actually a threat. No: we should have that integrated into our scanning technology. We should know that

in an automated fashion, so that engineers can respond in real time, not read Bloomberg.com to figure out that it's a viable vulnerability to go after. And our scoring of those vulnerabilities, which is a consequence of threat intelligence being manual, is also really manual. NIST and MITRE do a really good job of coming up with CVSS scores that are descriptive of vulnerabilities, but they do it in an incredibly manual fashion. They evaluate each vulnerability when it comes out and decide: what's the exploitability of this thing, what's the confidentiality impact? I don't want to say they're wrong, but those are point-in-time

assessments; they are not real-time, automated ways to score a vulnerability, because it would be a huge undertaking for them to a) buy that threat intelligence and b) interpret it and correlate it to those vulnerabilities' scores. And our remediation, as a result of all this, is incredibly manual. The number of vulnerability scans that come back with hundreds of thousands of CVSS 10 vulnerabilities is almost all of them, and figuring out which of those vulnerabilities you should remediate first is up to the people making those decisions. I don't think there's a consensus about how to do that. You could say, this is on a critical asset; you could

say, I can fix a thousand vulnerabilities if I apply this patch, so I should do that. But are you really reducing the risk to your enterprise when you do that? How do you measure that? How do you know the threat intelligence is confirming your decision? So why are we so manual? Here I'm going to introduce a recurring theme of this entire talk: metrics and automation are essentially the same thing. There are two requirements for automation, and I'll go into more detail: the first requirement is data, the second is metrics about that data. Once we have those two, we have the secret ingredients to generate

automation. So what kind of data do we need? We need base rates for exploitation, because if attackers are increasing the speed at which they attack things, we need to be able to measure that increase in speed and know what it is they're attacking. We need better data about exploit availability. Metasploit and Shodan are pretty good, and you can buy something like Symantec's threat feed to figure out what other things have active exploits around them, but that's just the tip of the iceberg. In reality there are a ton of exploits out there, and the way we ingest them is we might go to a blog post or a forum and figure out that this CVE now has an exploit available that some black-hat hackers are using.
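(One minimal way such a base rate could be defined, as a sketch: successful exploitation events for a CVE divided by the open instances of that CVE across the monitored population. The counts below are hypothetical.)

```python
# Hypothetical counts over a fixed observation window.
exploit_events = {"CVE-2014-3566": 600_000, "CVE-2014-0160": 12_000}
open_instances = {"CVE-2014-3566": 2_000_000, "CVE-2014-0160": 900_000}

for cve, hits in exploit_events.items():
    # Base rate: how often an open instance of this CVE actually got hit.
    rate = hits / open_instances[cve]
    print(f"{cve}: base rate {rate:.2%} over the window")
```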

The steps it takes to integrate anything that is a URL into your actual operation are way too slow; it needs to be automated, and writing scripts to web-scrape is, I think, not automation. Then there's data about vulnerability trends. We all read threat reports from a vast variety of sources, and people in security metrics and SIRA and in these discussions often try to figure out what the ground truth is among all those reports: which vulnerabilities are actually trending up, and what should we be caring about? But we don't do that in any kind of automated fashion. We read a bunch of PDF reports, we talk

about it, and then maybe practitioners come away with some decisions. We also need better data about breaches. The DBIR is awesome, I love the DBIR, I'm its biggest fan, but when a breach occurs and we record that somebody lost two hundred million dollars, we have no idea how. That data is sparse to begin with, but I think it needs to be a priority, because that's the best way, not the only way, but the best way, that we can integrate loss data into a vulnerability management practice. So in essence we need this better data because we need better metrics. And here's point

number two of this overall thesis: we sometimes make bad decisions, and that's because we have bad metrics. Whether or not you enumerate those metrics, metrics are decision support, and good metrics are just objective functions for automation. What I mean by "metrics are decision support" is that metrics are a way to convert data into a form we can make decisions with, and we construct metrics about a whole variety of things. In reality, when you make a seemingly arbitrary choice between two vulnerabilities to patch on a Tuesday, or two ten-year-old vulnerabilities to patch, you still have some baseline metric or objective function that you're using to make that decision.
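(Here is what "a metric as an objective function" could look like once made explicit, as a sketch. The fields and weights are hypothetical illustrations, not a recommendation from the talk.)

```python
# A hypothetical scoring rule that ranks vulnerabilities for remediation.
vulns = [
    {"cve": "CVE-2014-0160", "has_exploit": True,  "base_rate": 0.003, "critical_asset": True},
    {"cve": "CVE-2015-9999", "has_exploit": False, "base_rate": 0.0,   "critical_asset": True},
    {"cve": "CVE-2008-4250", "has_exploit": True,  "base_rate": 0.030, "critical_asset": False},
]

def objective(v):
    # Higher = remediate sooner. Writing the rule down is what makes it automatable.
    return 2.0 * v["has_exploit"] + 100.0 * v["base_rate"] + 1.0 * v["critical_asset"]

for v in sorted(vulns, key=objective, reverse=True):
    print(f"{v['cve']}: score {objective(v):.2f}")
```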

And if you stop and think about how you made that choice, that choice is a metric you can probably automate and use in a vulnerability management practice. I'll evaluate some of those later in the talk, but I want this talk, or at least the Q&A, to be super interactive: come up with the ways you make those prioritization decisions, and we'll see if those metrics are good or bad based on a set of criteria I'll present. These statements are vacuous but nice. What actually makes a metric good? It is certainly not just the absence of bad, but let's talk about

what makes a metric bad first. When Heartbleed got released, I started tracking exploitations of it, and I would see maybe two or three exploitations per day. Those dots you see are three-hour time buckets, and the gap in June and July is actually just missing data, not a real gap. So at the beginning the rate of exploitation is really low: people are being successful in exploiting Heartbleed once or twice, maybe three times per hour. Once we get to the Reverse Heartbleed releases, we start seeing tens, dozens, hundreds, eventually thousands of successful exploitations per hour on Heartbleed.
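(Those three-hour buckets are simple to produce from raw event timestamps. A minimal sketch with hypothetical timestamps:)

```python
from collections import Counter
from datetime import datetime

# Hypothetical exploitation timestamps for one CVE.
hits = [datetime(2014, 4, 8, 1, 10), datetime(2014, 4, 8, 2, 50),
        datetime(2014, 4, 8, 9, 5), datetime(2014, 4, 8, 9, 45)]

def bucket(ts: datetime, hours: int = 3) -> datetime:
    # Floor a timestamp to the start of its three-hour window.
    return ts.replace(minute=0, second=0, microsecond=0,
                      hour=(ts.hour // hours) * hours)

series = Counter(bucket(t) for t in hits)
for window, count in sorted(series.items()):
    print(window.isoformat(), count)
```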

By that time we were already fascinated with Shellshock, and Heartbleed had kind of gone the way of the media news cycle, but Shellshock was seeing those same really low numbers of exploitation. And if we zoom out, we can see that when POODLE came out, that was not the case: firing off a successful exploit for POODLE was super easy, and people started doing it en masse, in an automated fashion, in bulk. This is not a story about those vulnerabilities; this is a story about CVSS. If you use CVSS as a metric to decide which of those vulnerabilities to remediate, POODLE is a 4.3, Heartbleed is a 5, Shellshock is a 10, and that has no basis in the reality of actual exploitation base

rates. But those are the things we know about. More interesting are the things we don't hear about. All of these other vulnerabilities cluster around Heartbleed: I took a look at the breakdown of the CVSS factors for Heartbleed and found things that match it, so information disclosure, CVSS score 5, with the caveat that I picked only vulnerabilities that have more successful exploitations than Heartbleed does. You can see that a 2013 Java Runtime vulnerability is exploited pretty often, in the same capacity as Heartbleed, and there's a whole bunch of other ones, but there's a 2001 Windows NT / Windows 2000

vulnerability that is exploited millions of times no matter when you look, and no media hype cycle has picked that up. It wouldn't have in 2001, because we just didn't have this data set available; it's impossible to construct a metric around capturing the successful exploitation of something if you don't have that data. But looking at the data, we can certainly disprove some metrics, and CVSS is one of them. (Oh, that went backwards, not forwards.) Yes, absolutely. So, what those dots are, what a successful exploitation is, is one of two things. Either an exploit signature was detected on an IDS system,

and, since I have access to the IDS logs and to the vulnerability scans, at the exact same time, on the system the exploit was targeted at, I know there existed an open vulnerability that was unremediated, had no mitigating control in front of it, and matched that exploit signature. It's entirely possible that nothing happened; it's entirely possible that they compromised not much, or there were three more CVEs they had to fire off that they couldn't. But to train a base rate of whether this vulnerability is one you should care about, this is useful information. The second way successful exploitations make it into the data set is indicators of compromise

found by research teams. To say, you know, this malware only gets in via this CVE, and we found this payload on this machine, so we can estimate it happened at this hour based on IDS records. These come in aggregate from managed service providers and from the AlienVault OTX. Does that make sense? Is that a fair definition of exploit? Because it's not by any means the commonly accepted definition of a breach; I just think it should be. Yeah, but the question is: is this exploit going to be successful, should I close this vulnerability or not? In that case, if the base rate of exploitation is really high and you know that POODLE

plus two other things gets you into a whole lot of trouble, you should probably close it if there's a lot of activity on it. It's more useful in the negative sense, which is to say: this vulnerability that we really thought mattered only has this many exploitations across a hundred thousand other enterprises, so why are we prioritizing it so high, why are we spending time remediating it? Of course, this is a probabilistic approach to vulnerability management; there's also the impact-based one. But from where I stand, I know nothing about the impact of exploiting a particular machine. That's something all the practitioners back home know, and it should feed into this kind of system. So you

think about, first, is this thing going to get exploited, and second, if it does, what happens? You can slice away a whole lot of vulnerabilities that way. So, back to this: CVSS is a metric that doesn't capture exploitation rates, but CVSS itself is not the problem; CVSS for prioritization is the problem, because CVSS is meant to describe a vulnerability, not to guide remediation. The descriptions it offers of a vulnerability, at the point in time the vulnerability is released, are fairly useful. To say this is an information disclosure vulnerability is a useful attribute; to say we know there's a confirmed exploit is a useful

facet of a vulnerability. However, that might change over time. For example, a new exploit might come out that did not factor into the system when the CVE was scored, all of a sudden changing how important that vulnerability is, or changing the way you remediate vulnerabilities. To illustrate this, let's look at CVSS as a predictor of breach volume. What I mean is: if I use CVSS, can I reliably say that something that scored highly is indicative of something that's exploited often, and something that scored low is indicative of something that's not exploited very often? The vulnerabilities that were hit only once or twice in the past year

averaged around 7.37 on CVSS, and the vulnerabilities that were exploited a hundred to a million times in the past year averaged 7.98. What does that tell us? The difference being so small is partially due to two things. First, CVSS is not actually a continuous scale. It goes from 0 to 10, but when you look at all the permutations of the formula that are possible, there are only something like 16 or 18 different outcomes, and some of them overlap.
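(That claim is checkable. Here is a sketch that enumerates every CVSS v2 base vector using the factor weights and base equations from the public v2 specification, and counts the distinct scores they collapse onto; Python's round is a close stand-in for the spec's one-decimal rounding.)

```python
from itertools import product

# CVSS v2 base-score factor weights, from the public v2 specification.
AV  = [0.395, 0.646, 1.0]    # AccessVector: local / adjacent / network
AC  = [0.35, 0.61, 0.71]     # AccessComplexity: high / medium / low
AU  = [0.45, 0.56, 0.704]    # Authentication: multiple / single / none
CIA = [0.0, 0.275, 0.660]    # Conf/Integ/Avail impact: none / partial / complete

scores = set()
for av, ac, au, c, i, a in product(AV, AC, AU, CIA, CIA, CIA):
    impact = 10.41 * (1 - (1 - c) * (1 - i) * (1 - a))
    exploitability = 20 * av * ac * au
    f_impact = 0.0 if impact == 0 else 1.176
    base = round((0.6 * impact + 0.4 * exploitability - 1.5) * f_impact, 1)
    scores.add(base)

# 729 input vectors collapse onto a much smaller set of distinct scores.
print(len(scores), "distinct base scores out of", 3 ** 6, "vectors")
```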

Reading the CVSS version 2 changelog, there's a really good tidbit in an email exchange. I think they were looking at the distribution after they'd scored all the vulnerabilities, and one of the researchers writes to the other saying: it looks like 9.2 has a lot of data points around it, I'm going to scale it by 0.7 and move 9.41 out a little bit further, expand the way the distribution looks, because it's nicer that way. That's what's causing this, that's what's causing everything to fall in the same place. There's maybe some slight indication of exploitation in the score, but it's really an ordinal scale: it has fixed values, and the placement of those fixed values is arbitrarily determined. What actually happened is they tried to fit a nice distribution to a data

set they had, which was from 2002 I think, and since then the data has expanded wildly, out of the control of those little scaling parameters. That's what we're seeing here. Moreover, the fact that there's only a slight difference between things that are exploited a lot and not a lot means it's probably binary: at the top end of the scale these things are somewhat indicative of high exploitation, at the bottom they're not, but we don't have the kind of granularity that allows us to make predictions based on CVSS. Yet that's what we're doing when we say we should only fix the eights, nines, and tens: we're

making a prediction, saying this is the important vulnerability that will end up in a breach. Yes? Well, that's partially true, right: it's both an impact and an exploitability metric. So, an impact metric not being indicative of exploitation rates, what does that mean for us? Yes, it is severity, but it is also not how often exploitation is occurring, rather how exploitable it is and whether exploits exist. So that calculation being point-in-time is what's causing these problems, I think, and the way it's modeled also kind of obfuscates the impact metrics. This is what I was saying earlier: the fact that it is information disclosure is a

useful facet of CVSS; the score is not. That is, the score is scaled by the times it is exploited, or whether an exploit exists at all, and if we don't have information about that, the score isn't scaled. What that means is that whether or not we have reliable data impacts the score, whether or not these exploitations are actually happening impacts the score, but we don't have that data factored into the actual equation. Those facets, those descriptions of the vulnerability itself, availability, confidentiality, impact, are really useful descriptions, and they can be used in decision making. Yeah, so, I gave a talk with Dave Severski, who is in

the audience, about this at RSA. He has a method for doing so called VulnPryer, which I would advocate, but I would also say that sometimes you don't need to do that. If you can identify the facet, the point of data, that matters for your decision making, there's no need to take an old score and rescore it. If you can say, things that are not business critical don't matter to me, or things that I know don't have confirmed exploits don't matter to me, there might not be a reason to assign a score to

those things. This will become clearer when I talk about Metasploit in particular. My point is, if the scoring metric itself is skewed, and you're augmenting that scoring metric to make it fit your decision-making process better, why not just use your own data and your own model? Part of the answer is that you already have those scores assigned to every vulnerability; that's why you'd want to skew it, to change the CVSS score to be more indicative of your practice. But I think an important step in deciding how to do that is to evaluate those pieces of data by themselves. For example, if you evaluate just "does this vulnerability have an exploit or

not," and think about whether that's a useful metric for your practice, that alone can slice out a huge number of vulnerabilities if that's your decision rule. Or "I don't care about information disclosure vulnerabilities" can slice out a large portion as well, without needing to change the score and see how it competes with the original formula. Does that make sense? We will get there. Okay, so, I think Walt alluded to this. The reason this is important (wait, did that work? let's see; that's fitting) is that attackers are changing their tactics daily. So while the CVSS score tries to describe the vulnerability as

a score by itself, and you use this in prioritization decision making, there are impact metrics and exploitability metrics, but the truth is that what attackers are going after changes every single week. (Is this still spinning? Excellent.) What you're seeing here is a week-over-week snapshot of that breach data. At the top are things that weren't in the data set last week but are entering it this week; it's logarithmically scaled. At the bottom are things that were in the data set last week but are not this week. On the right-hand side are things being exploited millions of times, and on the left-hand side are things being exploited once or

twice. The supposition here is that the things being exploited in the hundreds of thousands or tens of thousands of times are automated attacks, and the other ones are targeted. This data set is from around 70,000 organizations, and in this graphic you're looking at, I think, 600,000 successful exploitations. What this really tries to drive home is that if you are trying to describe the probability of exploitation of a particular vulnerability: when it comes to the really targeted vulnerabilities on the left-hand side, attackers frequently change what they're going after. They don't automate those, they don't write massive deployments of those; they'll fire off a couple and then go after new ones.
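(The enter/exit view described here is a set difference between two weekly snapshots. A minimal sketch with hypothetical weekly counts:)

```python
# Which CVEs entered or exited the observed-exploitation set between weeks.
last_week = {"CVE-2008-4250": 150_000, "CVE-2014-3566": 90_000, "CVE-2013-2423": 4}
this_week = {"CVE-2008-4250": 160_000, "CVE-2014-3566": 85_000, "CVE-2012-0158": 2}

entered = set(this_week) - set(last_week)   # top of the chart
exited  = set(last_week) - set(this_week)   # bottom of the chart
stable  = set(last_week) & set(this_week)   # consistently exploited

print("entered:", entered)
print("exited:", exited)
print("stable:", {c: this_week[c] for c in stable})
```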

When it comes to some other ones, they're consistently on the right-hand side and consistently in the picture: they don't enter or exit week over week, and they're constantly being exploited. But they are not necessarily described that way, because maybe the exploit didn't come out when the CVE was scored, maybe your scanner does ranks out of five, or maybe it's an information disclosure vulnerability, and yet we see many successful exploitations of that vulnerability. So what defines a good metric is essentially just good data, but there's more to it than that. Let's do a quick thought exercise: which of these systems is more secure, and why? Okay, is anybody voting for two? They're the exact same asset, just their value is different, and they have the same control deployed

on it. So yes, there are two answers, and I agree with the first one: it is certainly system one that is more secure, because it's less of a target, right? The same defenses are implemented, but there's less value at risk. But that's a philosophical... yeah, absolutely, I'm saying all things held equal in terms of vulnerabilities and controls, everything about those assets.

Fair enough. So the two possible answers are: one, which is the one I agree with because I think of things in terms of risk, and that's a philosophical choice, which is exactly what he was going to say. Saying that both are equal is also a completely valid answer, because that describes the security of the assets, right? There's more value at risk on asset two, but it's not any more or less secure than asset one. And you're right, it's probably a semantic distinction, but I think it's actually a philosophical choice of how you make those asset computations: whether you want to incorporate that value at risk, or whether you think what matters is the controls, the actual security

of the assets. And this distinction, maybe semantic, maybe philosophical, is also the distinction between the types of metrics I want to talk about. There are type one metrics, which exclude the real-life threat environment; that is to say, those two assets are equally secure, and the occurrence rate of actual exploitations, or the value at risk, is controlled. And there are type two metrics, which describe the interaction with the threat environment. To say that asset one was more secure was to use a type two metric implicitly: it was to say that the value at risk there is lower and hence it is more secure. To say that they are equally secure is just

to exclude the real-life threat environment. I'm not making a value statement about either of these types of metrics; both are completely valid and absolutely necessary in a security practice. Let's illustrate that. A type one metric is something like this: there are these phishing firms that will send out a phishing email to your entire corporate office and see how many people bite, and you get a return rate of, say, 23% clicking the link. "So twenty-three percent of my employees would fall for a phishing attack." That's not true: that was a controlled occurrence rate and a crafted message, and it is really useful in

determining which people in the organization might be susceptible to that kind of risk, but it's not the real-life threat environment. Type one metrics are useful for getting a base rate of how your system will interact in the abstract. Type two metrics are useful for saying, I am measuring something about the world: I'm measuring the number of loss events, or the number of exploitations that have happened on those vulnerabilities. Another example of a type one metric is the CVSS score: in the abstract, it says this vulnerability poses this much risk. A type two metric would be something like: what percentage of my vulnerabilities have a Metasploit

module out in the wild? That's the real-life threat environment those vulnerabilities interact with. Combining those two metrics, the impact the CVSS score describes and the real-life threat environment in which those vulnerabilities exist, is what gets you to good metrics. I don't think you can run a good practice without both. In the first half, and I think this was Walt's contention, and your contention as well, I was talking only about interaction-with-the-threat-environment metrics; they're missing something essential, the controlled occurrence-rate descriptions of the environment, and when we add those on we can do a lot better. There are

actually some well-accepted definitions of what makes a good metric, and I expect there will be some debate about these as well. There was a really good conference maybe a year ago in Germany about security metrics in particular, with an add-on about metrics in general, and I read through all those papers and found out what people use to describe good and bad metrics; this is all of that literature review together. There are a few things that describe a good metric. The first is that it needs to be bounded. You can't make decisions on unbounded metrics, because there are always things that could fall above the bound,

and then you can't make prioritization or comparative decisions about them. The second is that it needs to scale metrically; that is to say, we need to know the difference between a score of 9 and a score of 10 and be able to articulate that difference. As a result of the way in which CVSS maps its scores, it is not metrically scaled. I think it is possible to create something that is metrically scaled, possible to claim we've constructed metrically scaled scoring systems; a really simple example of a metrically scaled scoring system is a binary one, because 0 is the absence of the thing and 1 is that it exists. Objective, valid, and reliable are mathematical definitions for those

metrics: objective is to say that with the same inputs you get the same output; valid is to say that different inputs map to different outputs; and reliable is to say that if the underlying data set shifts, the function that determines the output of that metric stays consistent. The sixth is a little tricky: context-specific. That's my beef with CVSS: when you use it for prioritization, it's not a metric for prioritization, it's not contextually constructed to make prioritization-based decisions, because metrics are decision support, and yet people are using it that way. When we say CVSS is a good metric, it might be a good metric for

describing a vulnerability or describing its impact, if we mind those first six things. The seventh point I added myself: it needs to be computed automatically. It'll become apparent in an example why automatic computation is necessary, but I think it's necessary because you want to be able to make decisions with those metrics, and in the world of security, if your decisions aren't automated, you're a little too slow. Yes, absolutely. You should definitely sync up with Dave Severski afterwards, because he has a framework for doing so, and you could also make CVSS metrically scaled if you just

assign meaning to those differences. If you say that a 9.2 and a 9.43 in CVSS mean these specific things for my organization, then you've essentially made it metrically scaled. The best way to illustrate these criteria is to go through a couple of examples, and I also want this to be interactive. So: mean time to incident discovery is a metric a lot of people use. Do people think this is a good or bad metric? Forget these criteria, forget getting into the nitty-gritty of it: is that a useful metric for a security practice? Why not?

absolutely

Is your organization getting better because the mean time is decreasing, or are there fewer threats or fewer vulnerabilities out there? Is it easier to detect them, are the tools better? So I think it measures something, but I don't think it measures your team over time. Or you fired two people, so fewer people are doing detection and they're slower. I think you can have a much more context-specific metric, though over time a lot of metrics do become more useful. Mean time to incident discovery, in my opinion, is a bad metric because it's not bounded: it could be infinity days, and it often is something like 273 days until you've discovered something. And when you think about

mean time to incident discovery, that's not very useful, because it's possible you just weren't looking for a particular vulnerability. It's like the unemployment rate: people who aren't engaging in the job market aren't counted in it, maybe because they're past their unemployment benefits, and no one was looking for them. So you could have a mean time to incident discovery of ten days, having found vulnerabilities on the first day a hundred times and on the twentieth day a hundred times. You can also have a mean time to incident discovery of ten days where you're pretty good on average, but there's one vulnerability that took you a year to find, and I think that

happens often; the range of those things is really large. I disagree: I think if you're using that metric to determine how many people you should hire, you're making the assumption that your time will get reduced. Absolutely. So a problem with using that metric to decide how much automation you need is that you can't compute it automatically for every incident response, because if you could compute it automatically, you would have discovered the vulnerability to begin with, right? If there's an automated way to say this is when the incident occurred and this is when the

incident was discovered, then you would already have detected the incident. Okay, so something like a Bit9 or Carbon Black type of solution, where you find it and walk backwards to where it happened? Sure, I suppose that makes sense, but that means every time, you're finding the way in after the fact. If you could walk back to it, why wouldn't you close that hole, so the next time this incident occurs you'd detect it automatically, right? Because you found the way it got in. But if something happens a new way, you'd still have to do that again. So that brings up another issue:

right, if you can do that, walk back and discover the thing that caused it, then the next time that exact same incident occurs, and you're trying to use the metric over time, you can automatically say this bad thing is happening, I've detected it in zero time. That is the point of systems like that: to figure out what was causing it and create a rule around it so we no longer have to worry about it. Yeah, I suppose that makes sense. Is that doable in an automated fashion, every time? Does that make sense? So, what's a more useful metric for incident discovery? I don't work in incident

discovery at all; I'm evaluating these metrics in a very abstract way.


Do you mean the difference between, like, an IP discovery incident, that IP got in and this was the attack vector? Yeah, when I conceptualized this I was definitely talking about the exploitation, the attack vector, not what I think both of you are describing, which is: this is the malicious attacker, what's the time to discover the attacker? So I definitely agree with the context delineation you made, but I still don't think the metric is in any way more bounded than it was before. What does it mean to be good in that other context versus your context? Does it mean that you're just

decreasing over time? Yeah, but at a point in time, to say 250 days is the mean time to incident discovery, what does that say about an organization? I don't know. Maybe some incidents take years to discover, but on average, at the top end of the quantiles, we're doing pretty well. So I completely agree with your distinction: different metrics are necessary to measure both of those. The point of this is just to illustrate that mean time to incident discovery is not bounded in terms of the metric

itself: it can grow infinitely, and by growing infinitely it's a bad metric. Maybe a really good way to do this is to say, of the incidents we've captured in the past 60 days, are we getting faster or slower? Then that metric is all of a sudden a count of the incidents pertaining to exploitation in the past 60 days: how many have we captured, and is that trending over time? Yeah.
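(A minimal sketch of that bounded, windowed alternative, with hypothetical (occurred, discovered) incident pairs, reporting an upper quantile rather than the mean since the mean hides the long tail:)

```python
from datetime import datetime, timedelta

now = datetime(2015, 8, 5)
incidents = [
    (datetime(2015, 6, 20), datetime(2015, 6, 22)),
    (datetime(2015, 7, 1),  datetime(2015, 7, 15)),
    (datetime(2015, 7, 30), datetime(2015, 8, 1)),
]

# Restrict to incidents discovered within the 60-day window.
window = [i for i in incidents if now - i[1] <= timedelta(days=60)]
delays = sorted((found - occurred).days for occurred, found in window)

count = len(delays)
p90 = delays[min(count - 1, int(0.9 * count))]
print(f"{count} incidents captured in window; 90th percentile discovery: {p90} days")
```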

good it'd be way better that's the Exxon valid I completely agree with you yeah I don't think it's a mallet valid metric because multiple inputs into that metric can produce the same output that's not descriptive of the actual system state which is what you're saying the median or the top 10 so I think actually like something like the uppermost quantile is a better description because we want to be good in the cases where we capture them quickly and there's a whole bunch of reasons why it might take a year to capture something there's only a few as to why we're good or looking at both the top at the bottom quantile useful metrics so away from incident discovery

and caveat I don't know much about anything except willner abilities this is an important context for this let's talk about some other metrics the point is to show that this framework and I think we've done that in this discussion is to show that thinking about these seven things maybe there's more things maybe there's less things allows us to figure to have a discussion about whether the metrics make sense how can we make them better and that applies to all sorts of metrics not just security metrics certainly vulnerability metrics this one I know something about what do people think about scanning coverage is a metric for vulnerability management in the context of vulnerability management does anybody actually use scanning

coverage as a metric? Yeah, absolutely.

Absolutely. So in the abstract, vulnerability scanning coverage doesn't mean a thing, but if you have an asset inventory and you say, of these thousand assets... yeah, of course, but I think there are sections of businesses that do have an accurate asset inventory.

so that's fair back to this framework then I think it matches all seven of these what is missing so you think this metric is in context specific because the underlying data that it's trying to describe is shifting or we don't have a good description

it's bounded by your assumption right ok I think that's fair um so does that mean the what do we know at all how this metric my tribute to our knowledge on top of what we do it may ever that is or may have a profession if it helps us learn right some decisions how we need to be behind important maybe it's not about homes on the standing at all let's put our effort of you getting bettering the quarry particular ways makes about staffing decisions so meantime light in prostatic decisions wasted other metrics mom looking at behind your remediation and caption I get your starting to look on the distribution and helping them discover which probabilities where it is

we're never found a person so while you can look for print respectively veterans say we have my some properties be pleasant be I don't think it it forms whether or not remember exactly so not a response to that but I think the way that I mean these are necessary and sufficient criteria for a good metric I want to disagree with the fact that the you know you said mean time to incident discovery of vulnerability discovery can be useful to start a conversation about whatever I think something like median bounded time to incident discovery can be just as useful in starting a more robust conversation if it matches these necessary and sufficient criteria so I think you're

correct in saying that there are other criteria for these metrics, and I think context specificity is one of the most important ones; that's my whole beef with CVSS in this discussion. But if a metric fails these tests, we ought to rethink that metric or reject it. So this is a really good example: vulnerability scanning coverage maybe passes, in my opinion, but it fails because it's not context-specific in another sense. Yeah, so do we think a temporal component to evaluating a metric is useful? I think Russ's point is that metrics are useful in the observer-object system, not by themselves, but I think some of

what we're trying to do in these discussions is define how they interact with the observer system right you guys are the observers of these metrics in the real world and we're trying to define how they're interacting so I think if we can come up with criteria as to what makes it more likely to be successful that's useful so I have like a minute no minutes let me jump ahead so one of the interesting things what are the most interesting things here is that metrics about so bring it back to CBS s there's a lot of data contained in there and there are metrics within it that are useful by themselves that are valid by themselves and pass all of these tests

For example, the binary choice of "does something have an exploit or not" passes all of these tests and is context-specific to prioritization, because when you try to prioritize vulnerabilities, that's part of what you're trying to describe: what is the probability that I'll get hit on that vulnerability? Unlike the ordinal scale of CVSS, it's bounded and it's metrically scaled. These are metrics that use information the way it's supposed to be used, like Russ said: what is the context, how does it interact with the system? I think there are some metrics, like "does a Metasploit module exist for a vulnerability," or "what is the percentage of

Metasploit-exploitable vulnerabilities in my environment," that do those things, that are useful not just because they are valid metrics but because they're contextually specific to the task at hand. I'm going to post these slides afterwards; we're out of time. There's some data in there about how these metrics perform and how they capture those data sets of successful exploitations: a Metasploit module being present for a vulnerability is a much better indicator of being in that data set, of the probability of exploitation of an open, live vulnerability in those vulnerability scans, than a CVSS score by itself is. You can leaf through those; there are some nice graphs.
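(A sketch of what that comparison could look like in code: which binary signal better predicts membership in the successful-exploitation data set, a Metasploit module existing or a high CVSS score. The records below are hypothetical, purely to show the shape of the computation, not the talk's actual data.)

```python
# (has_metasploit_module, cvss_score, was_actually_exploited)
vulns = [
    (True, 7.5, True), (True, 5.0, True), (False, 10.0, False),
    (False, 9.3, False), (True, 6.8, False), (False, 4.3, True),
]

def precision(predicate):
    # Of the vulnerabilities a signal flags, what fraction were exploited?
    flagged = [v for v in vulns if predicate(v)]
    hits = sum(1 for v in flagged if v[2])
    return hits / len(flagged) if flagged else 0.0

print("P(exploited | Metasploit module):", precision(lambda v: v[0]))
print("P(exploited | CVSS >= 9):        ", precision(lambda v: v[1] >= 9))
```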

And I guess the whole point of the talk is this, and it's what I meant at the beginning about your good startup idea: if we can come up with criteria for what justifies a good metric, once we can agree on a good framework for what defines a good metric, because our data sets have shifted and they generate new metrics, we can use those metrics to create automation systems, to create products that will make our jobs much easier. Unless we have a standard set of definitions, a standard framework for deciding whether a metric is good or bad, we're going to keep having multiple vendors spinning out different metrics that are not necessarily tested by the community. So I'm really glad there was a

discussion here it's an open challenge to the community to come up with a feasible framework for evaluating these metrics fair to say mine is insufficient it requires more context but let's come up with that context and then you can look at a vendor and say this is a useful dashboard this is not a useful dashboard this is something I should automate in my practice this is something that requires human effort you