Educating Your Guesses: How to Quantify Risk and Uncertainty - Sara Anstey

Name: Educating Your Guesses: How to Quantify Risk and Uncertainty - Sara Anstey
Uploaded: 2022-10-23
Duration: 26 min 51 s

BSides KC · 2022 · 26:51 · 57 views · Published 2022-10

Speakers

Sara Anstey

Tags

Style: Talk

Mentioned in this talk

Tools used

Microsoft Excel

Service

Mentimeter

About this talk

Asking for budget and justifying spend in cybersecurity departments can be a difficult task due to limited data and high uncertainty of future events. This talk will dive into quantitative risk analysis as it relates to cybersecurity: how to model uncertain events and understand financial risk. Attendees will see a first-hand demonstration of how quantitative modeling can be used to communicate risk and understand ROI. Attendees will walk away with the tools needed to present cyber risk as a dollar amount that can be easily understood by other business decision makers at their company.

Sara Anstey (Manager, Data Analytics at Novacoast): Sara Anstey is a Data Analytics Manager at Novacoast who is passionate about empowering businesses to use everyday data to make strategic business decisions. She believes that the intentional adoption of a data-driven culture can be a key differentiator to companies in today’s security climate. Sara has experience in custom web development, artificial intelligence, data analytics, business intelligence, and applied statistics.

Transcript [en]

Yeah, so like she said, my name is Sara and I work at Novacoast, which is a cybersecurity consulting firm, but I do analytics, so I am a statistical person. I majored in engineering. I never really did anything with cybersecurity until about five years ago, when I started working for a cybersecurity department. So just to preface, this talk is probably going to be a little bit different than any of the other talks today. I won't be talking about CVEs or how to hack anything, because I don't even know how to. Instead we're going to get a little bit mathematical and statistical and even throw in a little bit of psychology

and kind of do a little bit more of a high-level presentation, but I'll try to keep it really interesting. There will be very slight audience participation, but it'll be cool. And yeah, that's the plan. So at a high level, I'm going to start by talking about the ways that we currently assess risk in cybersecurity. I'm going to talk about what's good about them and what's bad about them, and then propose some other ways to do it. But before we really dive into that, I want to start by defining cyber risk, right? What even is it? And I think to do that, you actually first have to define what risk is in general.

And cyber risk, just like a risk in life, is basically anytime something is unknown, right? Anytime you have a risk in life, it's because you don't know what the outcome is going to be. When we think about cyber risk, it's kind of the same thing: we don't know if we're going to get breached tomorrow, we don't know if a vulnerability will get exploited, this or that, right? That's why we have to understand risk and understand how much, as an organization or as a person, you're willing to take on. So at the end of the day, everything about risk is kind of a guess, right? How can we analyze data from the unknown, and

how can we even get data if, in theory, a risk is something that we don't know the outcome of, right? So it kind of poses this question: is guessing okay? And that's what we're going to talk about in this presentation. Is guessing okay, and if so, how can we do it, how can we do it better, and is that even possible? But before I get into that, I want to talk about what we currently do a lot of in cybersecurity. So everybody's probably seen this before. It's your pretty standard risk matrix, right? If you work at any type of company, especially in a cybersecurity profession, you're probably familiar with

having to rate risks, or even a typical vulnerability, right? If you've got a vulnerability scanner like Tenable or Qualys, they're rating it low, medium, high, critical, right? How do we come up with these ratings, and what do they mean? So let's talk about this risk matrix a little bit, and I'm actually going to kind of tear it apart, because a risk matrix, or anything like this, is what we call qualitative, not quantitative, and we'll get into the difference and why we really want to be quantitative. But the first way that we know that this is a qualitative or non-quantitative scale is because it's what we call ordinal in nature. So

an ordinal scale is something that has an implied order, right? Think low, medium, high. One through five, even though that's numeric, is still ordinal, right? Not quantitative, just numeric, but ordinal in nature. Think of a user, like an admin user versus a normal user: that's also ordered, right? It has an implied order to it. So where this falls apart, and why we can't call this quantitative, is that ordinal doesn't necessarily mean you can do mathematical operations on it. So let's think again of an ordinal scale of, like, one through five, right? People want to call this quantitative because, again, it's numbers, so you think quantitative. And if you rate

vulnerability number A, you know, a two, and then vulnerability number B a three, and it has a likelihood of one and a criticality of four, a lot of times we multiply those or we add them, right, in spreadsheets. It's probably sounding familiar to some people. But the reason that that's wrong is because instead of one, two, three, four, five, you could say, well, we've got up here in the chart, right: improbable, seldom, occasional, likely, frequent. It doesn't make sense to multiply a seldom by a likely, right? So why does it make sense to multiply a two by a five? And that's how you know that a scale is ordinal and not quantitative in nature.
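One way to see why arithmetic on ordinal labels is unreliable: any numbering that preserves the order of the labels is an equally valid encoding, yet different numberings can rank the same two risks differently. The sketch below is illustrative only. The two encodings, the reuse of the likelihood labels for impact, and the risks X and Y are assumptions made for the example, not figures from the talk.

```python
# Labels from the risk matrix, in increasing order.
LABELS = ["improbable", "seldom", "occasional", "likely", "frequent"]

# Two hypothetical encodings that both preserve the order of the labels.
LINEAR = {label: i + 1 for i, label in enumerate(LABELS)}     # 1, 2, 3, 4, 5
DOUBLING = {label: 2 ** i for i, label in enumerate(LABELS)}  # 1, 2, 4, 8, 16


def score(encoding, likelihood, impact):
    """Multiply the encoded likelihood and impact, as a spreadsheet would."""
    return encoding[likelihood] * encoding[impact]


def riskier(encoding, risk_x, risk_y):
    """Return which of two (likelihood, impact) pairs scores higher."""
    return "X" if score(encoding, *risk_x) > score(encoding, *risk_y) else "Y"


# Hypothetical risks; impact reuses the same five labels for simplicity.
risk_x = ("improbable", "frequent")
risk_y = ("seldom", "occasional")

print(riskier(LINEAR, risk_x, risk_y))    # linear scores: X = 5, Y = 6
print(riskier(DOUBLING, risk_x, risk_y))  # doubling scores: X = 16, Y = 8
```

The ranking flips between the two encodings even though neither changes the order of the labels, which is the sense in which the product carries no quantitative meaning.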

If you can take your scale and replace the numbers with words like this, you're probably not using a quantitative method, and that's where the math starts to get really bad and everything kind of falls apart. So let's look at an example of this using this exact same risk matrix, and I haven't done anything crazy at all. We've got two risks that we're going to evaluate: risk A and risk B. Risk A has a likelihood of 50% and an impact of $9 million. Risk B has a likelihood of 60% and an impact of $2 million. So if we look at just the expected loss, multiplying the two, you can see that risk B is a lot less risky than risk A,

right? Because A has over a four million dollar expected loss, while B only has 1.2 million. But if we use this scale, this totally normal scale, right, we would rate risk B as a high and risk A as a medium. So now not only are we not understanding our risk to any high degree of accuracy, but we are actually getting a worse understanding of our risk simply because of the tool that we're using to analyze it. So this is something called analysis placebo. Analysis placebo in general, not even related to cybersecurity, is basically when the way or the tool or the equation or anything like that that you're using to analyze data gives you

either no valuable insight into what the data really means, or even a worse understanding than if you were to do no analysis at all, right? In this case it's actually giving us a worse understanding, and it's because of something called range compression. Range compression I'm just going to demonstrate by giving a real-life example. Let's say I'm in charge of quantifying our risk, or understanding our risk at the company, and I want to know: what's the chance that we get breached in the next year? So I want to ask two people and get both of their opinions. I'm going to ask our CISO and our CTO, and I'm going to say, you know, on a scale of

one to five, what's the likelihood we get breached in the next year? So I go to our CISO, and our CISO is like, well, I'm doing everything right, we're amazing, blah blah blah, we've got all these tools, there's probably only like a 3% chance we get breached in the next year. So they'll rate that a one on a scale of one to five. Okay, so I go to our CTO, and the CTO is like, yeah, the CISO is an idiot and I don't know what we're doing and everything's underfunded, so there's probably like an 18% chance we get breached in the next year. So they'll rate that a one on a scale of one to five. So they both agree there's a one on

a scale of one to five chance of getting breached in the next year, but anyone in cybersecurity knows there's actually a huge difference between a 3% and an 18% chance you get breached in the next year, right? So that's called range compression. We're not getting as clear of an understanding as we can. But how do we fix that? Now that I've kind of said that some of the methods that we're using to understand risk might not be the best statistically, what can we do to fix it, especially given that we don't have a lot of input data normally? And that's typically why we're using scales like this, because we feel like we can't get

to a granular, you know, 4.3% chance of getting breached in the next year, because we don't have that type of input data to know that granularly, right? So before we get into the method of how to fix it, we're going to do a little experiment, but I promise, like I said, it'll kind of be fun, and we're going to figure out: are you a good guesser? Because if we don't have all the input data in the world, and if inherently risk means we don't know the answer to something, we're going to be guessing, right? Can we figure out if we're good guessers, and if we're not, can we become better guessers? This is where the psychology comes in a

little bit. So I promise it won't be bad. I'm not making you download an app or anything. If you go to menti.com and just put in this code, we are going to do a little experiment. So I'll leave the code up there for a second, and I believe it'll also be on, yes, okay. So don't choose anything yet, but just go to Menti and put in this code. I'll leave it up here for one more second, but it'll be up later too if you don't get it right now. But I like to think that I'm a pretty good judge of character. I can kind of, you know, read the room, know what

people are into and what they're not, and I can just tell that in this room specifically, everyone's probably a really big fan of The Bachelor. You know, I just get that vibe from you. And so I'm going to ask the question: how many seasons of The Bachelor have there been? Okay, but before you think of an answer to this, I just want to give a huge shout-out. I like to stay true to myself. Yes, I'm in cybersecurity and do this and that, but I love reality television. I think it's hilarious. I love watching The Bachelor and The Bachelorette, and I just want to give a quick plug and say, if you

have not watched it and you think it's stupid, I promise you: get a bottle of wine, okay, sit down, drink the whole bottle, watch the two-hour episode. By the end of it you will feel so much better about your life. I can't recommend it enough. But so I want everyone to think about the answer to this question, right? Don't Google it, think about it. You're going to guess how many seasons of The Bachelor there have been, but instead of just thinking of a specific number, I want you guys to think of what we call your 95% confidence interval, right? So that means think of a lower bound and an upper bound that

you're 95% sure, actually let's do 90, sorry, I think it's supposed to be, yes, your 90% confidence interval. You're 90% sure that the answer to this question falls in between, right? About a 10% chance you're wrong, but 90% sure you're probably right. Okay, so everyone think of your interval, and now let's play a little game. So let's say I'm going to give you the chance to win a thousand dollars, and I did not, unfortunately, get the funding approved to actually play this, so this is a hypothetical thousand dollars. And I'm going to give you two options to potentially win this thousand dollars. Option number one: if the real answer to the question, how many seasons of The

Bachelor have there been, was in the interval you thought of, you win a thousand dollars; if not, you don't. Option number two is spin the wheel: if it lands on green, you win a thousand dollars, and if not, you don't. Okay, gut reaction: which would you pick, right? Would you spin the wheel, would you go with your confidence interval, or do you not care? And it's not a trick question at all, it's just gut reaction. Okay, and now we're going to go over to Menti here, and I want everyone to go ahead and say which option you would choose. Yeah, not a trick question at all, just curious. It's kind of fun to do, you know, audience participation. If you didn't get the code, it is at the top

still. And then I'll also do a quick plug: if anyone does presentations a lot, Mentimeter is free and really good for live polling audiences. I don't work for them, they didn't pay me to say that, it's just cool. All right, so we'll let a couple guesses come in here. Remember, this is kind of like, what was your gut reaction, right? Okay, so a pretty 50/50 split: about half of you would go with your confidence interval, half would spin the wheel. Okay, good to know, we'll come back to that. So I'm not going to leave you hanging. I will tell you the true answer is 26. There have been 26 seasons of The Bachelor. That does not include The Bachelorette, Bachelor Pad, Bachelor in

Paradise, Bachelor Winter Games, none of those. There's more, but I'll spare you. Just The Bachelor, there's been 26 seasons, right? Okay, so let's go back to Menti one more time and go to the next slide: was the correct answer in your original confidence interval? Yes or no. Be honest, it'll be interesting.
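The calibration check behind this exercise can be sketched in a few lines: collect each person's lower and upper bound, then count how often the true value falls inside. The intervals below are made-up placeholders, not the audience's actual responses; only the true answer of 26 comes from the talk, and `hit_rate` is a hypothetical helper name.

```python
def hit_rate(intervals, truth):
    """Fraction of (low, high) intervals that contain the true value."""
    hits = sum(1 for low, high in intervals if low <= truth <= high)
    return hits / len(intervals)


# Hypothetical 90% confidence intervals for "how many seasons?".
guesses = [(5, 15), (10, 20), (20, 30), (8, 12), (15, 25), (30, 40)]

TRUE_ANSWER = 26  # stated in the talk
TARGET = 0.90     # confidence level the intervals were supposed to have

rate = hit_rate(guesses, TRUE_ANSWER)
print(f"hit rate: {rate:.0%}, target: {TARGET:.0%}")
if rate < TARGET:
    print("intervals too narrow: the group is overconfident")
```

A single question is a very small sample. Calibration is more reliably judged over many questions per person, where a well-calibrated estimator's 90% intervals should contain the truth about nine times out of ten.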

Okay, so the majority is no, right? It looks like about 83% of the answers were no: the correct answer was not in your original confidence interval. So here's why I made you do that. Let's go back here. Now, if we were truly a perfectly, what we call, calibrated audience, meaning if we were good guessers inherently, 90% of you, think about that, should have had the right answer in your confidence interval, right? So 83% didn't, but 90% should have; only 10% shouldn't have. Which means, and I'm not surprised, we are a very, what we call, overconfident group, and that's actually human nature. They've done a lot of studies in psychology on this. Humans tend to be very, what we call, overconfident

inherently. And by that it means, you know, if you truly gave your 90% confidence interval, you should have had no preference between spinning the wheel or picking the confidence interval, right? Because in your head both should have represented a 90% chance of winning. Okay, so that's actually a really well-known psychological tool to become better estimators and better guessers: if you tie a monetary loss in your head to your estimate, you actually psychologically become better at estimating. Because basically, if you said on the poll, I want to spin the wheel, you should widen your confidence interval until you have no preference between the two. But if you said, I want to go with my

confidence interval, you should shrink the interval until you have no preference between the two. And that's how, in your head, you actually give your true 90% confidence interval, instead of what you all actually gave, which on average was, what did we say, a 17% confidence interval. That's what the audience gave on average when I asked for 90, right? So where do we go with that? I know that was a super random tangent, but we're going to apply it. So we're going to talk about the one-to-one substitution for the risk matrix and how we're going to use that 90% confidence interval to do it. So what I'm talking about here is just a simple one-to-one substitution for your typical

risk matrix that's going to give us a better understanding of risk. The reason I say it's a one-to-one substitution is that you don't need any additional software. I'm assuming if you make a risk matrix in Excel, you can do this in Excel. You don't need any additional input data: the input data you had for your risk matrix will do fine for this, right? Nothing additional needed. It's just a different way of analyzing your risk. So we're going to do what's called a Monte Carlo simulation. Same input sources, but it's actually a statistical model. Some of you might know Monte Carlo. It's not anything too crazy or complex, but it accounts for limited input data and uncertain events. So Monte

Carlo takes ranges of values as its input in an equation, instead of, for example, just one value, and that's how we're going to use those 90% confidence intervals. We're going to do thousands and thousands and thousands of replications on these intervals that we're putting in, and we'll be able to get an average and a lot of valuable information from that, which I'll show. But there is one interesting thing I want to point out: in a lot of our Monte Carlo simulations, specifically for cybersecurity, we're going to use what's called a log-normal distribution. So, taking everyone back to their college statistics class really quick, the red line here is a normal distribution, right,

centered at zero, standard deviation of one. The blue line here is a log-normal. The reason we're going to use this is because it better represents real-life activities or events in cybersecurity, for two reasons. Reason number one: it can't go below zero, which in real life translates to there's never a negative chance that we're going to get breached, right, as much as maybe we wish that there was. And reason number two: you see a lot of the area under the curve, a lot of the density, is very close to the axis, and there's a little tail out here at the end, but a lot is pretty close to zero. The reason that's good is because

typically, even when companies do get breached, it's rarely the, you know, eight million dollar breach that's on the news, the Targets and the Equifaxes and everything, right? It's rare, but it happens, so we want to account for those extreme outliers, but we also want to say the majority of breaches require a couple days of remediation, or a couple of help desk people wiping laptops, or this and that, right? So we want to account for that whole range of possibilities. So here's what we're going to do. We're first going to define the risk, right? What is the risk we're trying to understand? I've been using the risk of getting breached because it's the highest

level and it's very easy, but you can do any type of risk. You could do the risk of, you know, a phishing attack or something like that. I did one for a company recently that wanted to know the risk of allowing personal email on company-owned devices, right? So we simulated that risk for them, and we figured out, you know, if they buy a DLP solution, how would that fix it, things like that. And then we want to define the time range. This is an important one, because it actually logically doesn't make sense to say, what's the probability you get breached? You have to say, what's the probability you get breached in the next

year, or in the next month, right? Because those are two very different things. And then we're going to assign values with our confidence intervals, right? So you're going to have all these inputs, and we're going to use those 90% confidence intervals and become better calibrated guessers. There's actually a lot of different methods to become a better calibrated guesser. I just showed one, which was the monetary loss in your head, right? But there are many different psychological methods, and a lot of studies have shown that with, you know, a couple hours of learning how to calibrate yourself, you can become an almost perfect guesser, meaning every time you give a, you know, 72% confidence interval or whatever, it will actually be

your 72%. So it's something interesting to look into just for life in general, not even cybersecurity. You know, we're very overconfident. It's the reason that your boss says, how long is this project going to take, and you say a week, and then it takes five months, right? So it's very helpful for things like that. We're going to assign our confidence intervals, and then, if possible, we're going to repeat with multiple people. That's always just better for statistical accuracy, right? If we can get multiple people estimating risks, that's going to be better than just one. And then we're going to run our Monte Carlo and take a look at the output. So I'm just going

to show a really quick example of this. I think I'm kind of low on time, so I'll speed through it, but I did just want to show an example, like, this can genuinely be done in Excel. I'm not proposing anything too insane or crazy. So this is an example of a phishing simulation. This is just one I kind of made up, but think about a company in the technology industry with about a thousand users, and they have about 10 million records, right? So here are our input variables. We're going to assign all of our input variables, which basically just means: what are the things that are going to affect the chance of

getting phished, right? So here we've got things like the number of users that click on a phish in a 12-month period. A lot of companies actually have that data because they run phishing simulations, so they can see, you know, on average our click rate as a company is 22% or whatever, right? And then the likelihood that the user actually requires response, remediation, recovery, something like that, if they do click on a phish. And then what's the hourly wage that we're paying someone who's doing that remediation? Let's account for those things, right? The business impact of a data breach we actually can get from industry reports. So if you think of, like, the

Verizon data breach report, or the IBM Cost of a Data Breach Report, which is really good, we can look at, for a company of this size, in this industry, with this many records, historically when breaches happen, how much on average are they costing? And we can kind of get those numbers, right? And then we can also look at, so in this scenario, for example, we were looking at the risk of phishing currently versus if we were to buy, you know, an email phishing detection and response software or whatever, how much would it reduce our risk? So we can look at that, and I'll show an example. Let me type in some numbers here. So let's do

a launch of what we get from this model. Okay, so we run our simulation, and again, we're going to simulate this 10,000 times, super easy to do in Excel, and here's what it's going to give us: an average annual cost. So basically now what we're doing is putting everything in terms of dollar amounts, and this is for when we talk about reporting to the board or the CEO, or even sometimes the CISO, right? You need to get things in terms that they can understand, because you cannot go to the CEO or the board and say we've got a thousand critical vulnerabilities on CVE whatever. They don't understand it and they don't want to, and if anything it's just going

to scare them, right, or make them think that you're not doing a good job in cybersecurity. What we want to do is go to them and say: right now, the amount of risk we're taking on in the next year due to phishing is $302,000. On average, in the next year, that is how much we will lose due to phishing. If we invest in this phishing detection and response software for 50K, you know, whatever it is, it will lower our cost to $157,000, for an immediate ROI of 2.5, right? Those are things the board understands, because it's just numbers and it's finance. And so that's one reason, that, one, this model is a lot more statistically

accurate than anything you're doing with risk matrices, right? We're getting rid of that range compression and that analysis placebo. But two, the outputs are actually a lot more helpful and useful for making decisions, because a risk matrix is never going to tell you the ROI on a product, and it's not going to tell you the dollar amount of the risk you're taking on in the next year. And then we can also look at things like simulated loss histograms. So this means, of all of our 10,000 replications, how many times did it fall between, you know, zero and 20K, how many times was it between, I don't know what those numbers

are, but whatever this bucket is, and then, right, the bigger the loss gets, the smaller the bar is. And then we also get these loss exceedance curves. I'm not going to get too much into this because I'm already over time, but basically it shows us what was our inherent risk, right, the risk we're taking on before we bought anything, and then what's our residual risk, so how much risk do we have left over if we do buy this tool or implement whatever initiative we want to do. And we can compare it with what we call our risk tolerance. Risk tolerance is actually defined usually by, like, the CFO or the CEO of an

organization, who can say, this is how much risk we are willing to take on. Because you'll always be taking on risk, but how much are you okay with? Are you maybe a large company that's okay with the potential of losing a million dollars in the next year? You probably are, right? Or are you a small company and you say, if we get breached and it costs us 50K, we're out of business? So it really varies organization to organization. But just to wrap this up and bring it full circle, I know I didn't get too much into the model, but I'm happy to answer questions later, I just don't have much time. At the end of the day, the biggest

kind of feedback I get a lot of times is: okay, but we're still guessing, right? We're just guessing with intervals now instead of one, two, three, four, five. And to that I say, yes, it is still a guess, because again, we have to go back to our actual definition of risk. A risk, just like in life, is something that you don't know the outcome of, so there is no way we can ever perfectly know what day we're going to get breached on. It's in the future, it's unknown, it will always in theory be a guess. But it's a better one, because now we're using actual statistical models that are meant for uncertainty, they're meant for

guesses, and we ourselves are giving better guesses because we've trained ourselves to know how to do that, right? It's all about trying to do things in a better way. We're never going to be perfect, and if that's what we're aiming for, we're always going to fail, but this is something that we can do better to understand our risk. The last thing I'll point out is that I am not reinventing the wheel at all. The insurance industry and other financial industries have been doing this for, like, decades. It's the same type of problem in insurance, right? You don't know if someone's going to get cancer, if they're

going to get in a car crash, or this and that, so how much do you insure them for and what's their deductible? They use these exact same types of models. I'm not doing anything super crazy and fancy. All I'm saying is that we should use it in cybersecurity too, right? We're kind of behind the times in that, and I think it's because cybersecurity as a whole likes to think that we have these really hard challenges that no one knows how to solve. And maybe some of you, like hackers and stuff, do, but in terms of analyzing data, it's the same thing everyone else has to do, and all I'm saying is we should bring this into

cybersecurity as an industry. So that's all I have, but if you're interested, definitely feel free to reach out, or I'll stick around. Like I said, I know I didn't get too much into the model. Also, the FAIR Institute is a really good starting point if you guys are interested in learning more about quantifying cyber risk. I'd look into the FAIR Institute. There are also a couple books and things I can recommend if you're interested. But other than that, yeah, feel free to connect, and thank you for listening.
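A minimal sketch of the kind of simulation described in the talk, written in Python rather than Excel. Every number here is a placeholder, and the function names are hypothetical. Converting a 90% confidence interval into log-normal parameters by treating its bounds as the 5th and 95th percentiles is one common convention assumed for this sketch, not necessarily the one used in the speaker's model.

```python
import math
import random
from statistics import NormalDist, mean

# z-score bounding the central 90% of a normal distribution (about 1.645).
Z90 = NormalDist().inv_cdf(0.95)


def lognormal_params(low, high):
    """Map a 90% confidence interval (low, high) to log-normal mu and sigma,
    assuming low and high are the 5th and 95th percentiles."""
    mu = (math.log(low) + math.log(high)) / 2
    sigma = (math.log(high) - math.log(low)) / (2 * Z90)
    return mu, sigma


def simulate_annual_losses(p_event, loss_low, loss_high, trials=10_000, seed=1):
    """Simulate one year `trials` times: the event happens with probability
    p_event, and if it does, the loss is drawn from the log-normal."""
    rng = random.Random(seed)
    mu, sigma = lognormal_params(loss_low, loss_high)
    return [
        rng.lognormvariate(mu, sigma) if rng.random() < p_event else 0.0
        for _ in range(trials)
    ]


# Placeholder inputs: annual event probability and a 90% CI on the loss,
# before and after buying a hypothetical control.
before = simulate_annual_losses(0.30, 50_000, 2_000_000, seed=1)
after = simulate_annual_losses(0.15, 50_000, 2_000_000, seed=2)

control_cost = 50_000  # placeholder annual cost of the control
reduction = mean(before) - mean(after)
print(f"average annual loss before: ${mean(before):,.0f}")
print(f"average annual loss after:  ${mean(after):,.0f}")
print(f"risk reduction per dollar spent: {reduction / control_cost:.2f}")
```

The last line reports risk reduction per dollar of spend, which is only one of several ways an ROI figure can be defined; the talk does not spell out its formula.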

[Audience question] So real quickly, I love that methodology, but is an approach that you might have, like, a risk register that has 20 or 30 different instances where you define what the risk is, and then you would apply this to each one of those individually? Does that make sense? [Answer] Yeah, I mean, it's really however you want to do it. I think that's a good way to approach it, but this is a general framework and methodology, so it can be applied to any risk. It's not specific to phishing or the loss of a data breach or anything. So it's all about, as an organization, understanding which risks are important to you and

which ones you need to understand better, because at the end of the day, my kind of mantra is always that knowledge is power, right? The more you know, the better decisions you can make. So it's about understanding which risks are we concerned about, or which risks do we not know anything about that we just need more information on.
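Applying the method to a whole risk register, as the question suggests, could look like the sketch below: simulate each risk separately, add the losses per simulated year, and read points on a loss exceedance curve off the totals. The register entries, the thresholds, and all function names are invented for illustration, and the risks are assumed to be independent, which a real register may not satisfy.

```python
import math
import random
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # about 1.645

# Hypothetical register: (name, annual probability, 90% CI low, 90% CI high).
REGISTER = [
    ("phishing", 0.40, 10_000, 500_000),
    ("ransomware", 0.10, 100_000, 5_000_000),
    ("lost laptop", 0.60, 1_000, 50_000),
]


def simulate_totals(register, trials=10_000, seed=7):
    """Total simulated loss per year across all risks in the register."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for _name, p, low, high in register:
            if rng.random() < p:
                # Treat the interval bounds as 5th and 95th percentiles.
                mu = (math.log(low) + math.log(high)) / 2
                sigma = (math.log(high) - math.log(low)) / (2 * Z90)
                total += rng.lognormvariate(mu, sigma)
        totals.append(total)
    return totals


def exceedance(totals, threshold):
    """Share of simulated years whose total loss exceeds the threshold."""
    return sum(1 for t in totals if t > threshold) / len(totals)


totals = simulate_totals(REGISTER)
for threshold in (100_000, 500_000, 1_000_000):
    print(f"P(annual loss > ${threshold:,}) = {exceedance(totals, threshold):.1%}")
```

Comparing these probabilities against thresholds set by leadership gives a simple version of the risk tolerance check mentioned in the talk.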
