Sara Anstey - Educating Your Guesses: How To Quantify Risk and Uncertainty

BSides Knoxville · 37:39 · 178 views · Published 2023-05
About this talk
At its core, cybersecurity is all about risk. We need to understand, report, and mitigate our risk. However, the methods the industry has adopted for analyzing risk lead to inaccurate assessments, invalid math, and ultimately bad decision-making and spending. I will show you why, and how to fix it. Asking for budget and justifying spend in cybersecurity departments can be difficult because of limited data and the high uncertainty of future events. This talk will dive into quantitative risk analysis as it relates to cybersecurity: how to model uncertain events and understand financial risk. Attendees will see a firsthand demonstration of how quantitative modeling can be used to communicate risk and understand ROI, and will walk away with the tools needed to present cyber risk as a dollar amount that other business decision-makers at their company can easily understand.
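The transcript below works through two calculations in prose: expected loss (likelihood times impact) for "risk A" and "risk B", and a Monte Carlo simulation that takes a range as input and draws from a lognormal distribution. The following is a minimal Python sketch of both, offered as an illustration rather than the speaker's actual model. The helper names, the 30% annual likelihood, the $1 million to $10 million 90% interval on impact, and the fixed seed are hypothetical; treating the interval's endpoints as the lognormal's 5th and 95th percentiles is one common convention, not something the talk specifies.

```python
import math
import random
from statistics import NormalDist, mean

def expected_loss(likelihood: float, impact: float) -> float:
    """Expected loss: probability of the event times its dollar impact."""
    return likelihood * impact

# Risk A / risk B from the talk: a qualitative matrix rates A "medium" and
# B "high", but expected loss ranks them the other way around.
risk_a = expected_loss(0.50, 9_000_000)  # 4,500,000
risk_b = expected_loss(0.60, 2_000_000)  # 1,200,000

# ln(X) of a lognormal X is normally distributed, so a 90% interval on X
# spans mu +/- z * sigma in log space, with z = the standard normal's 95th percentile.
Z_90 = NormalDist().inv_cdf(0.95)  # about 1.645

def lognormal_from_90ci(lower: float, upper: float) -> tuple[float, float]:
    """Hypothetical helper: (mu, sigma) of a lognormal whose 5th and 95th
    percentiles are `lower` and `upper` (an assumed convention)."""
    mu = (math.log(lower) + math.log(upper)) / 2
    sigma = (math.log(upper) - math.log(lower)) / (2 * Z_90)
    return mu, sigma

def simulate_annual_losses(likelihood: float, lower: float, upper: float,
                           trials: int = 10_000, seed: int = 42) -> list[float]:
    """Monte Carlo sketch: in each trial the event occurs with probability
    `likelihood`; if it does, its loss is drawn from the fitted lognormal
    (never negative, long right tail). Otherwise the loss is zero."""
    rng = random.Random(seed)
    mu, sigma = lognormal_from_90ci(lower, upper)
    return [rng.lognormvariate(mu, sigma) if rng.random() < likelihood else 0.0
            for _ in range(trials)]

losses = simulate_annual_losses(0.30, 1_000_000, 10_000_000)
print(f"Risk A expected loss: ${risk_a:,.0f}")
print(f"Risk B expected loss: ${risk_b:,.0f}")
print(f"Simulated mean annual loss: ${mean(losses):,.0f}")
print(f"P(annual loss > $5M): {sum(x > 5_000_000 for x in losses) / len(losses):.1%}")
```

Widening or narrowing the 90% interval, as the calibration exercise in the talk encourages, changes `sigma` and therefore how heavy the simulated tail is; the output is a distribution of dollar losses rather than a single low/medium/high label.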
Transcript

For our next talk here, we have Sara Anstey (is that right? Anstey, yeah) presenting "Educating Your Guesses: How To Quantify Risk and Uncertainty."

Cool. My name is Sara, and I work for Novacoast as their director of data analytics. I probably have a relatively different background than a lot of people at this conference or in the room, because my background and training is not actually in cybersecurity or IT or anything like that. I'm actually more of a statistician and data scientist by trade, who just so happened to get into cybersecurity about six years ago now, and I've been trying to take a lot of the statistical methodologies and ways we do things in the math and data world and apply them to some of the problems and challenges we have in cyber and IT.

Today I'm going to keep it pretty casual, but I'm going to be talking about risk and uncertainty and how we can model and understand them. I'll go over some of the ways it's currently or historically been done in the industry, what I think is wrong with them as a statistician, and then different models we can use to actually quantify risk.

Before we really hop into any of that, we need to talk about what risk is, and really define it before we try to quantify it. Risk, in my opinion, is anything that's unknown. Whether it's a risk in cybersecurity or even a risk in life, it's any time you don't know what the outcome of something is going to be. Think about taking a risk in real life: why is it a risk? It's probably because there's some uncertainty around it; you don't exactly know how things are going to end up. It's the same in cybersecurity. A lot of the time, the risks we're looking at are the risk of getting breached, the risk of ransomware, or a vulnerability being exploited. All of those are things where we don't
know if they're going to happen to us, and if they do, we don't know when or to what severity. So it's really all about uncertainty and the unknown, which raises a question: if risk inherently means we don't know what's going to happen, how can we possibly measure it and quantify it? It seems counterintuitive that you could do that if it's inherently unknown.

Before I get into the methodologies I think we can use, and how we can use statistics and mathematics to do it, I want to talk a little bit about the ways it's currently and historically been done in cybersecurity. Everyone is probably pretty familiar with this: it's your typical risk matrix. Rating something as low, medium, or high, or on a one-to-three or one-to-five scale, is pretty common in cybersecurity. You can see we're looking at things like impact versus likelihood and rating each from low to high; I also see a lot of one-to-three scores. This is what we call a qualitative method for understanding uncertainty or risk. Say someone comes to you and asks, "Hey, what's the risk of this thing happening?" and you say "medium." That's a qualitative answer; it's not quantitative in nature. These methods are used a lot, and the reasoning I most often hear for rating things on these kinds of scales is that because it's unknown, we don't have much input data, so we can't possibly say with a hundred percent certainty that there's a 12.4% chance of this exploit happening in the next five months. We don't know; it's risk, and we don't have that input data, but we think we're relatively well protected, so we'll call it a low chance. And people feel more
comfortable saying that, because they feel like they can't be exact and precise in their measurements. But I want to talk a little bit about where this starts to fall apart.

Let's say I'm a data person trying to understand, for our organization, the risk that we get breached in the next year. It's a really broad question and kind of hard to answer. So I've got our CISO and our CTO, and I ask them both the same question: on a scale of one to five, what do you think the chance is that we get breached in the next year? I ask the CISO, and she says, "We've made all of these great improvements in cybersecurity, we're doing so well, we've got all these tools and funding, we're really doing pretty good. There's probably only a 3% chance we get breached in the next year." That's what she's thinking in her head, so she calls that a one on a scale of one to five and tells me, "We think we're at a one." Then I go to the CTO and ask the same question, and maybe the CTO is like, "Yeah, the CISO is an idiot, we have no idea what we're doing, who knows?" In their head they're thinking there's probably a 17, maybe 18% chance we get breached in the next year, so they also rate that a one on a scale of one to five, just like you can see here. Now both the CISO and the CTO have agreed on a one out of five for this risk of getting breached in the next year. But all of us in cybersecurity know there's a huge difference between a 3% and an 18% chance of getting breached in the next year; that's a really significant difference. Because of the scale I asked them to use, we don't see that difference; we're throwing away precision.

I want to show another example of how these things really start to fall apart. Now let's say we've
got two risks, risk A and risk B. Risk A has a 50% likelihood and a $9 million impact; risk B has a 60% likelihood and a $2 million impact. Using this risk matrix (like I said, there's nothing crazy about it, it's a common risk matrix), we would rate risk A as a medium and risk B as a high just by reading off the scales. But if you do a very quick calculation of expected loss, which is impact times likelihood, you can see risk A has a $4.5 million expected loss and risk B has only a $1.2 million expected loss. So not only are we losing some clarity and precision because of the scales we're using, we're actually understanding our risk worse: using a risk matrix like that, we've come to the wrong decision about risk A versus risk B.

At a high level, this is a concept we refer to as the analysis placebo. Analysis placebo is a broad term for any time the measurement or scale you're using to understand data, or to understand something statistically, gives you either no measurable information or a worse understanding of your risk, purely because of the methodology you're using to assess it. We can see that happen right here: because we're using a qualitative risk-matrix approach, we're getting a worse understanding of our risk. At a lower level, the specific type of analysis placebo at play here is called range compression. We compress quantitative data into qualitative ranges, and then people try to do mathematical operations on them. Say we rate things a one, two, or three, and then someone says, "We have a one and a two, so the average is 1.5." That's exactly the same as saying, "We've got a low and a medium, so we'll call it a low and a half," because you can replace one with low, three with high, and so forth. Any time you can
take those numbers and easily replace them with words, it's a really good indication that you're probably not using a quantitative scale. You could do the same thing on a one-to-five scale and replace it all with words like "improbable," and so on.

So if these are some of the problems with the way we currently understand risk in cybersecurity, how can we fix it? This raises another interesting question. If we go back to what I said at the beginning, that risk is anything that's inherently unknown, then any time we try to measure or understand our risk, it's going to be a guess to some extent, because we can't know for sure what's going to happen in the future. We're always estimating or guessing. So is it okay to guess? And then we move into the question: are you a good guesser? And is it possible to make yourself a better guesser?

Before we get into the quantitative methodology, I want to run through a little exercise to show whether or not everyone is inherently a good guesser, and see if we can learn how to be a better one. I like to think I can read people pretty well, and I can just tell everyone in this room is a huge fan of reality TV. Yeah, I can just tell. So: how many seasons of The Bachelor have there been? Do not Google it. I love reality television; let me explain, because I know that's going to invalidate me. If you have never seen The Bachelor or The Bachelorette, it's on for two hours every Monday. Get yourself a bottle of wine, drink the whole thing, watch the show, and by the end of it you will feel so much better about your life. Seriously, it's really therapeutic.

Okay, so how many seasons of The Bachelor have there been? Why did I pick this question? Let's think about it. A lot of you, if I'm going by
stereotypes in cybersecurity, probably don't watch reality television or The Bachelor, so you probably don't know the answer off the top of your head. But maybe you've heard of The Bachelor, or someone you know watches it; maybe you have a little bit of information or history with the show. So real quick, I want everyone to think about what the answer to this question is. But don't just think of a number: think of a 90% confidence interval. If I can take us all back to college statistics real quick, for a 90% confidence interval you give a lower bound and an upper bound such that you're 90% sure the right answer falls within that range. So there's about a 10% chance it's outside, but you're 90% confident it's between those two numbers. Think about that in your head right now.

Now let's play a game. Unfortunately, BSides didn't give me the budget to actually give you a thousand dollars, so hypothetically, let's say I'm giving you the chance to win a thousand dollars, with two options for how to win it. Option number one: if the correct answer to the question is within the confidence interval in your head, you win a thousand dollars; if not, you don't. Option number two: spin the wheel, which lands on green 90% of the time. If it lands on green, you win a thousand dollars; if it doesn't, you don't. Go with your instinct: which one would you choose? It's not a trick question; it's just whichever one first popped into your head as the one you'd prefer.

Now let's all go to menti.com real quick. This is the only time I'll ask for audience participation, but it'll be interesting to see what people answer. Go to this code; it's a quick live poll. It's not
going to make you download an app, and you don't have to sign in. I'm going to pull up Menti; the code's still up there at the top if you need it. So what was that instinct, which one would you choose? I want to see what the audience thinks. I'd also just like to point out that this is a room full of cybersecurity professionals, and you all just go to whatever link I tell you. I think we might need to reevaluate.

So would you choose the confidence interval, would you spin the wheel, or do you have no preference and would do either? Again, it's not a trick question; what was your gut, your instinct? Pretty even, interesting. About half would choose their confidence interval and half would spin the wheel. Most people had a preference; most people had an instinct. Interesting.

I won't leave you hanging: the correct answer is 27 seasons. I should have said this before: this is just The Bachelor. It does not include The Bachelorette, Bachelor Pad, Bachelor in Paradise, or Bachelor Winter Games. Just The Bachelor: 27 seasons. It was 26; I had to change my slides this morning because a season just ended. But Zach was the most boring Bachelor of all time. It's not worth watching; he is terrible. Save yourself.

Okay, real quick, one more time, let's go back to Menti. Hold on, things are going crazy... present. Okay: was that answer within your original confidence interval, yes or no? Be honest. Let's vote and see how we did. Was 27 contained within your original confidence interval? I promise there's a point to all this other than it just being fun; it relates back to quantification. Okay, so about 25/75, 30/70-ish. The majority said no, the answer wasn't within their original confidence interval.

Here's what's interesting about these results. About 30% of people had it in their confidence interval and about 70% didn't. If we were a perfectly calibrated room, which means if everyone in this room
was a really good guesser, we would expect 90% of people to have the answer in their confidence interval and 10% not to. Because what did I ask for? A 90% confidence interval for the answer to this question, which means that if everyone gave their true 90% confidence interval, 90% of all the intervals given should have contained the right answer, even if you didn't know the answer. What you really gave on average, as an audience, was a 30% confidence interval, which means you were, on average, about 60 percentage points overconfident.

It's not just because you're in cybersecurity that you're all overconfident; it's human nature. Many psychological studies have shown that when people have to estimate something, they are almost always overconfident. This is why, when your boss asks how long a project is going to take and you say four weeks, it ends up taking five months. How often does that happen? We're not inherently good estimators; no humans are. But the really cool thing is that we can learn to be better. There are a few psychological tricks.

Let's go back to the wheel. If your first thought was that you wanted to spin the wheel, that means you thought there was a better chance of winning a thousand dollars by spinning the wheel than by going with your confidence interval. But if you truly gave your 90% confidence interval, you should have had no preference between the two, because in your mind both should have represented a 90% chance of winning a thousand dollars. So if you wanted to spin the wheel, you should go back and widen your confidence interval, making both ends wider until you have no preference between the two. And if you instinctively wanted to choose your confidence
interval, you should narrow your interval until you have no preference between the two. That's called the equivalent bets method. It's one of a number of psychological techniques people can use to become better estimators, and it shows that when you put potential monetary winnings or losses on the line, even fake ones as in this example, you can train yourself to be a better estimator. So even if you don't use it in cybersecurity, even if you're just estimating how long a project will take, try it next time. It's a good way to tune your brain a little bit to be a better estimator, and like I said, there are a bunch of other really cool methods for that. But moving on.

So why is all that relevant? Because now I want to talk through a simple one-to-one substitute for the risk matrix I showed before, one that is statistically valid and accurate for uncertain data and uncertain endpoints. What I mean is that the methodology I'm going to show doesn't require any tools, and it doesn't require any more input data than you would have if you were just making a risk matrix. It gives similar kinds of results, but it's a more statistically accurate way to assess risk.

What we're going to do is called a Monte Carlo simulation. A lot of you are probably familiar with it. It's a broad term for a statistical methodology where you replicate something, say, 10,000 times. It takes input data in the form of ranges (confidence intervals; I told you it would all loop back around) instead of precise numbers. Think of a normal equation: x equals five or whatever, an exact number. A Monte Carlo simulation takes ranges as the input instead. Why is that good? Because we're looking at risk, so there's
a lot of unknowns and uncertainty. It's a lot more accurate to give a range of things we think could happen than to pin ourselves to one exact number, then run thousands and thousands of replications and look at the trends, the averages, and so on.

This is the last time I'll make you do college statistics, I promise. If we're doing, say, 10,000 replications, those variables need to be drawn from some underlying distribution. If you take a random number in Excel, there needs to be some distribution you're pulling that random number from. Maybe it's a distribution from one to five that's uniformly distributed, which means there's exactly the same chance of pulling 1.2 as there is of pulling 4.3. Or maybe it's based on a normal distribution, which you probably all remember from college stats. The standard normal distribution, in red there, is centered at zero with a standard deviation of one and is symmetric on both sides. It's pretty simple and models a lot of real-world things, but a lot of things in cybersecurity don't fit this model well. If we look at a lognormal distribution, in blue, you can see the differences: a lot of the density of the curve is close to zero, on the left-hand side, and then it has this long tail to the right. The other important thing to notice is that it never goes below zero.

So what do these properties translate to in the real world? First, our risk can never be negative. As much as we might want there to be a negative five percent chance of getting breached in the next year, that can never happen, so we don't want to choose a distribution that would allow it. The other thing we see, like I said, is the long tail. It represents that when we do get breached, a lot of the time it's not that five-million-dollar
reputation loss hits the news data