Educating Your Guesses: How To Quantify Risk And Uncertainty by Sara Anstey

BSides Leeds · 28:54 · 67 views · Published 2023-07

Okay, I'm gonna go ahead and get started. I want to first say thanks for sticking around till the afternoon. I know it's been a long day of talks, so thanks for actually showing up. If I'd known my talk was going to be in the afternoon, I'd have brought beers as props or something, but they didn't tell me far enough in advance.

So, I'm Sara. I probably have a bit of a different background than a lot of people at the conference, in that I got into cyber security because of the company I work for, but I actually have a data analytics, statistical, more mathematical background. A lot of what I've been working on since I got into cyber security about six years ago has been bringing different types of statistical models and ways to analyze numbers and uncertainty, and applying them to cyber security concepts. That's what I'm going to talk about here. I'm going to keep it pretty light. There will be slight audience participation, but it won't be bad.

Other than that, I'm the director of data analytics for a company called Novacoast, which I'm sure none of you have ever heard of, if for no other reason than that we're headquartered in the U.S., if you can't tell by my accent. We do cyber security and consulting, and like I said, that's how I ended up in this field that I never really had any particular interest in, I guess, but now that I'm here it's a pretty cool field.

Like I said, this talk is going to be a lot about risk and uncertainty, and we're going to get into statistical modeling techniques for it. But I think to start we have to define what risk even is if we're going to be talking about it. How I usually define it, at a really high level, is that risk in cyber security is kind of the same as a risk in life, in that it's any time you're doing something, or something could
happen, where you don't know what the outcome is going to be. Any time you're taking a risk in life, it's because you're doing something you've never done before, or you don't know what the outcome of something is going to be. If it was all known and you could predict exactly what was going to happen, it wouldn't be a risk so much as a decision.

Obviously cyber security has a lot of risks. We don't know if we're going to be breached in the next month or year. We don't know how many attacks we're going to be susceptible to. So it also kind of begs this question: if risk, by definition, is the unknown, how can we measure it or quantify it? Is that something we can even do if it's unknown? It's almost the opposite of what you would think.

I want to start by talking about the current methods that are typically used (sorry for the white background) in cyber security to understand risk. A lot of you are probably familiar with this. This is a typical risk matrix: rating things low, medium, high, maybe on a one-through-five scale or a one-through-three scale. I see this all the time for vulnerabilities: is a vulnerability low, medium, high, severe? It can be a lot of things, but they're very common in cyber security. So let's talk a little bit about these, and while they're easy to understand, why they're maybe not the best thing to use to understand our risk.

Let's put it into play with an example and look at risk A and risk B. Risk A, we're going to say, has a likelihood of 50% and an impact of nine million dollars. Risk B: 60% likelihood, impact of two million dollars. If we just want a really easy expected loss, then you multiply likelihood by impact. Very easy to talk about risk that way. We can see risk A has an expected loss of 4.5 million, risk B 1.2 million. Now, this is not like a crazy
weird risk matrix. I haven't done anything fancy here. But you can see that risk A is categorized as a medium and risk B is categorized as a high, even though, if we use the math and the expected loss, risk A is clearly way worse than risk B. So how does that happen on this risk matrix?

I want to put up another scenario. Let's say I'm trying to understand my risk of getting breached in the next year. I'm an analyst or something at a company, and I say, all right, I'm going to go ask my CTO and my CISO, because you want to get two people's opinions. I'm going to ask them: on a scale of one to five, what do you think is our risk of getting breached in the next year, one being very low, five being certainty?

So I go to the CTO and I ask, and the CTO is like, "Well, you know, the CISO is an idiot, we're not doing anything right," and this and that. In their head they're thinking there's maybe an 18% chance we get breached in the next year. They rate that a one on a scale of one to five, and they say to me, "One." Okay, great. Then I go and talk to the CISO, and the CISO says, "Yeah, we're doing awesome, we have all these patches up to date," and whatever. They think we're in a really good position, there's probably a 3% chance we get breached in the next year. So they rate that a one on a scale of one to five.

So both our CTO and our CISO are in agreement that our risk of getting breached is a one on a scale of one to five. But all of us in cyber security, as practitioners, know there's a really big difference between a 3% and an 18% chance that you get breached in a year. That's actually pretty significant, but they've both rated it a one. And that's what's happening on this risk matrix here. It's a statistical thing called range compression, where basically the ranges you're using to measure something mean you're compressing from a
quantitative, continuous scale into this compressed range.

You might say, "Well, you're talking about quantifying risk; a one-to-five scale is quantitative." But a one-to-five scale is actually what we call a numerical ordinal scale. An ordinal scale means that there's an implied order. Low, medium, high is an ordinal scale. Even a regular user versus an admin user, that's ordinal. One through five just means it's numbers, but we could replace one with "improbable," two with "seldom," three with "occasional," and all of a sudden it isn't quantitative anymore. So that's what happens when we have range compression.

At a higher level, this whole phenomenon is called analysis placebo. Analysis placebo is a broad concept for any time the way you're measuring something, the mathematical or statistical formula or algorithm you're using to understand data, either gives you no measurable improvement in understanding or, as in the case of risk A and risk B, actually gives you a worse understanding of your data, just because of the way you're analyzing it. That's what you can see happen here, and it's a huge problem with the way we're doing a lot of things in cyber security right now when it comes to risk.

On top of that, what really starts to go wrong, and what's going wrong in this risk matrix, is that you've got a one-through-five scale and people start applying mathematical operations to it. They say, "We've got a one and a two, so our average is 1.5." But remember, it's an ordinal scale. Does it make sense to say, "We've got a regular user and an admin user, so the average is a regular-plus user"? It's saying the same thing. You can't actually apply mathematical operations to ordinal scales, but people do, and that's how we get analysis placebo.

So now that I've kind of debunked, quote unquote, the way that we're
doing things right now, I want to get into how we fix it. And not only how we fix it, but is there a way to do it that's as easy as a risk matrix? Because that's a really low barrier to entry, rating things one to five.

Before I get into that, I want to go back to the beginning, when I asked whether we can even quantify risk, because it's unknown. We can. But if we get a little philosophical, if we're going to quantify risk, isn't it always going to be a guess of some sort? There's no way we're ever going to know when or if a breach is going to happen, so it is all going to be some type of guess. And I'm going to show a method that does involve some guessing when we quantify risk.

Before we do that, I want to pose a question: are you a good guesser? Is there such a thing as a good guesser? Can a person be a good guesser, and can you learn to be a better guesser? So, just so nobody falls asleep on me, we're going to do a little experiment.

One thing about me, and I don't know if this applies to the average cyber security professional, but I love reality TV. The trashier the better. I love The Bachelor. This is a great show, and if you've never seen The Bachelor, I challenge you: watch it. There are better ones, but The Bachelor is that staple, that's your entry to reality TV right there. It's two hours on Monday nights. Sit down on the couch, get a bottle of wine, drink the whole thing yourself, watch it, and by the end you will feel so much better about your life. So, great show.

I'm posing this question here: how many seasons of The Bachelor have there been? Don't say it out loud and don't Google it. I want you to think in your head what you think the answer to this question is, knowing that you probably don't know it for sure but you kind of have some information. So think about the answer in your
head, but don't think of just one number. I want you to think of a range that represents your 90% confidence interval for the answer to this question. I know I'm bringing you back to college statistics when I say confidence intervals, but it basically just means a lower bound and an upper bound such that you're 90% sure the right answer would be in that interval. So about a 10% chance you're wrong, but you're 90% confident it would be between those two numbers.

Now, like I said... hold on, I might have messed up the order of my slides here. Okay. If you guys can go to menti.com and use this code: it won't make you download an app, it won't make you sign in, it won't make you pay, I promise. It's a really good audience polling app. If you go to Menti and put in this code real quick, we're going to have a little bit of fun. First, though, before you vote on that (sorry, I think I messed up the order just slightly; I'll go back to that slide in a minute), think about your confidence interval.

Now let's say I'm going to play a game. I want to preface this: in this game you can win a thousand dollars, but BSides did not give me the budget to do that, unfortunately, so this is going to be a fictitious thousand dollars. Say I give you two options for winning this thousand dollars. Option one: if the correct answer to that question is within your confidence interval, you win a thousand dollars; if not, you don't. Option two: you spin this wheel. If it lands in green, you win a thousand dollars; if it doesn't, you don't.

Gut reaction, don't say it out loud, but I want you to think: which would you prefer? Which option would you pick? It's not a trick question. What's your gut instinct? Would you rather spin the wheel, or would you rather go with your confidence interval, to win this thousand dollars? Now we're going to go to Menti and you'll see
there should be a question up there. I want you guys to say which option you would choose. What's your gut instinct? Hold on, I might have to actually go start it, one second. The code should still be up here at the very top. Which option would you choose? Would you go with your confidence interval, would you spin the wheel, or do you have no preference? It's at the top, yep. I'll just say, too, quick plug: if you guys are ever doing presentations, Menti's free, it's awesome audience polling, it's really easy. I don't work for them.

Okay, so it looks like most people would spin the wheel, a few would go with the confidence interval, and maybe one or two people had no preference. But most of you had some type of gut reaction, and most people would spin the wheel. That's kind of interesting. Let's think about that.

All right, I'm going to go back to the slides. I'm not going to leave you guys hanging. The answer: 27 seasons. And I didn't even preface it with this, but this is only The Bachelor, so it does not include The Bachelorette, Bachelor Pad, Bachelor in Paradise, Bachelor Winter Games. I could name more. But 27 seasons.

So let's go back to Menti one more time here. Whoops, how do I play... hold on, sorry, technical difficulties. There we go. Was the correct answer within your original confidence interval? No judgment. Yes or no, be honest.

I'll say, I've given this talk before, and that's usually about where it ends up. So we'll say like 15% maybe had it in, and about 80%-ish didn't. But I want you to now think about the original question I asked. I said, give your 90% confidence interval. So even without knowing the right answer, if this audience was what we call perfectly calibrated, meaning that you all actually gave your 90% confidence interval, then 90% of you should have had the correct answer within your interval and 10% should not have. Which means this audience is like 70% overconfident in general. Okay, and
it's not just because we work in cyber security. There's been a lot of research on this, and it's human nature. The average human is really overconfident when estimating things they don't know. It's the reason that when your boss asks you how long a project's going to take, you say five weeks, and it takes six months. It's the same idea: we're inherently really bad estimators and guessers.

But there are also a lot of psychological tricks that teach us how to be better guessers. One of them, like I showed here, is called the equivalent bets method. There's been research showing that if you tie some type of monetary loss to your guesses, you can teach yourself to be a better guesser. The way it works is that if your gut reaction was to choose your confidence interval, you should shrink the interval until you have no preference between the two. And if your gut reaction was to go with the wheel, you should widen the bounds of your confidence interval until you have no preference between the two, with that monetary loss in mind. Both should represent a 90% chance of winning a thousand dollars, and that's how you know if you have a preference. So, for example, if you wanted to spin the wheel, that's how you know you didn't actually give your personal 90% confidence interval. Maybe you gave your 70% or your 60% interval.

There's a whole bunch of other really interesting methods, that's just one, but these are ways that we can learn to be better guessers. That was a tangent, but we will apply it later to our methodology.

So, getting back to our risk quantification. For the rest of the talk I'm basically going to show a one-to-one substitution for that risk matrix that we originally showed. I call it one-to-one for two reasons. One, no additional software or technology is needed. The example I'm going to show is actually going to be in Excel. You could do it in Python or kind of anything
else. And then two, we don't need any additional input data for the model that we wouldn't need for a risk matrix, because, like I said, at the end of the day we can do this all with just estimations and guesses if we need to.

So this is the one more time I'm going to apologize for bringing you all back to the PTSD of college stats, but what I'm proposing is a Monte Carlo simulation. A lot of you are probably familiar with it; there's a wide range of applications for it. Like I said, it takes the same inputs as a risk matrix. What's good about a Monte Carlo simulation when looking at risk and uncertainty is that it accounts for having limited input data. A lot of the time in cyber security we don't have a lot of good data to put into models, and again, we don't know what we're estimating; it's all a risk estimation.

It does that in two different ways. The first is that the inputs to this model, instead of being static numbers or variables, are confidence intervals. It takes intervals of values as the inputs to the equation, and then it does thousands and thousands of replications on those input confidence intervals to come out with averages. Any time you're doing replications on a range of values, you have to have an underlying distribution that you're pulling from.

If you look at the red, probably all of you are familiar with it: a normal distribution, centered at zero, standard deviation of one. Even if you were to use the RAND function in Excel, it goes off a normal distribution. Or, actually, that's the uniform one, but a lot of things follow a normal distribution. What we use a lot of the time when doing these Monte Carlo simulations is called a lognormal distribution, which is shown in blue, and there are two important reasons. The first is that it can never go below zero, which basically just means that, as much as we might like it, there's never a negative
chance of us getting breached, so it simulates the real world a little bit better. It also shows, as you can see, that a lot of the density of the curve is toward the y-axis, and then there's a long tail, which accounts for extreme outliers. If we translate that into real risk and uncertainty, it's that a lot of the time, if you do get breached, it's not that massive five-million-dollar Target data breach that's all over the news. It can happen, so we want to account for that, but a lot of the time it might be that someone got phished and you have to wipe a couple of laptops, or a little bit of remediation needs to be done. It depends, but we're just saying the majority of the data breaches in the world are not those five- or ten-million-dollar catastrophic losses. We want to account for them, but understand that they're not the norm.

So, what we're going to do. First things first, you have to start by defining the risk that you want to quantify, the same way you would in a heat map. Maybe it's a particular vulnerability, and what's our exposure to that vulnerability. Maybe it's something broader, like I've been saying, around the risk of us as an organization getting breached. But we want to be really clear and precise in our definitions. We also need to define a time range for that risk to occur, because it doesn't make sense to say "the risk that we get breached"; you have to say "the risk that we get breached in the next month" or "in the next year." We always want to define a time range with it.

Then we're going to come up with all our input variables and assign their values as confidence intervals. So this is where we're saying, maybe the risk we're looking at, like the example I'll show in a minute, is something around phishing. What different input variables might affect that? Maybe we do phishing simulations and we know our
average click rate in