
GT - No More Fudge Factors and Made Up Shit: Performance Numbers That Mean Something - Russell Thomas

BSides Las Vegas · 56:35 · 15 views · Published 2016-12 · Watch on YouTube ↗
About this talk
GT - No More Fudge Factors and Made Up Shit: Performance Numbers That Mean Something - Russell Thomas. Ground Truth track, BSidesLV 2015, Tuscany Hotel, August 05, 2015.
Transcript [en]

So, my name is Russell Thomas, and I'm here to present on the topic of performance metrics. The title is tuned to the BSides audience; it's both accurate, I think, and compelling. I've got a couple of quick questions for the audience, because I want to understand who you are and what your needs are. Show of hands: how many of you consider yourself a metrics person, either because you're a manager and you deal with metrics, or because you're an analyst or somebody who creates metrics for management? Okay, very good. Second question: I have a spreadsheet, a working piece of software, that you can download and start using, either for demo purposes or to adapt to your own needs. I'd like a show of

hands of people for whom it's a priority that I spend time in this slot walking through the spreadsheet. The alternative is that I spend a little more time on slides, or we have more time for Q&A. So, show of hands: for whom would it be important to have some time walking through this spreadsheet? Okay, I see two hands, so I'm going to make an executive decision not to do that. If you go to exploringpossibilityspace.blogspot.com you can download this presentation, and you can also download the spreadsheet. If anybody wants to connect with me after the talk, I've got some time, and I can also connect at a later time to walk you

through the spreadsheet. Okay. So, we're all metrics dudes and dudettes in this room: how would a metrics dude improve their primary relationship? Hmm, interesting approach. Well, the metrics approach, compelling as it is for us, may or may not be successful in any given circumstance, for a whole variety of reasons. One of them: how do you encompass everything that has to do with your primary relationship into a metric, on some scale, in such a way that the audience, that is, your partner, buys into it? What I've just said, in a nutshell, is a direct analogy to performance metrics in security. My talk today is a practical method for how you can address some of these complexities.

Just a caveat on what I'm not going to do. This is not a method that eliminates all need for thinking, so you still have to think. Second thing: the first time you do this, you will not be done. The first time you do this, you learn what you need to do the next time, and the next time, and the next time; it is explicitly designed for continuous learning and improvement. The last thing: it will not work all the time, because not all circumstances are suited to creating an aggregate metric. I would argue, based on a certain number of years of experience and relationships, that putting relationship happiness into a performance index is not a path to

success. So, what do we mean by an aggregate index? I'm using the word "performance" here, but it's really any aggregate index: something on a scale that aggregates a lot of different pieces of what I'll call ground truth data, or indicators (and I'll take the word "truth" loosely). It's stuff that you can find and see and gather. It could be audit results; it could be how long your staff has been in place; it could be greatly detailed statistics on your IDS traffic. It's anything that is very close to the day-to-day reality of security. Okay, so what a performance index does is try to boil down and synthesize, to put in a little blender, all of that information into some kind of scale that

some decision makers can make use of and make decisions on. Okay: William James, one of my favorite pragmatist philosophers, might construe it this way. An aggregate performance index is a construct; it's something we make. It doesn't necessarily exist on its own in reality, and it expresses a judgment about how this whole thing is working, and it is important to the extent it is useful and valid. We're going to touch on some of these themes. One of my hats is academic (I'm in a PhD program), so I like invoking philosophy whenever possible. So, in case you hadn't noticed, performance indexes are pervasive in our society. There are some people who might even say

it's an infectious disease of Western society, but I'll leave that for the after-talk conversation. College rankings: I have a son entering his senior year of high school, so college rankings, and the consequences of college rankings, are everywhere online. Everybody goes through performance reviews. There are school performance indexes, credit scores, standardized tests, happiness indexes for whole countries. One of my favorites is the hotness quiz from Cosmo; there's also a male version of the hotness quiz (this didn't take too long to find). And the interesting thing is that in all of these quizzes, you answer a bunch of questions, you feed in a bunch of data, and somewhere behind the quiz there's some wisdom to tell you whether you're hot or not, or

whether your college is good or not, and so on, based on some numbers. Even in the realm of information security there are more and more index-based services. Here are two companies that offer this as part of their product offering, and there are a number of consulting firms and other organizations that publish index-like information about security; a cybercrime index from a few years ago, for instance. Here's a fairly elaborate report that comes out of Booz Allen Hamilton; I've highlighted the phrase down here, "simple average", because this is a theme we're going to touch on. For most of what you've seen, most of what I flashed through here: how do they arrive at this number, on this scale, from all

this information? It's typically what I call the usual method: some sort of weighted average. Now, I'm the kind of guy that reads the fine print that explains how they get these numbers, and I can't remember a time when I didn't see some version of the weighted average as the methodology. Okay, hold on to your seats, here's some math. What is a weighted average, or weighted sum? S here is the score; each of the x's is a different piece of information, a different metric; and the w multiplying each one is its weight (S = w1·x1 + w2·x2 + ... + wn·xn). The rule is that the weights have to sum to 1, and that's how you end up with your score. Now, the thinking behind this: the people at Booz

Allen Hamilton, the people at US News & World Report, they have meetings where they go back and forth: well, this factor is important; no, this factor is more important. They respond to their constituents, because colleges don't like it when their rankings fall, so they lobby, and it's all arguments over weights. Show of hands, just out of curiosity: how many people in the room have ever been in one of those meetings, arguing over the weights associated with scores? It's a cause for an extended happy hour afterward, if not more serious medication. I'm not going to go into these, but there's a bunch of statistical methods that are also

based on this linear model. Many of these modeling approaches have amazing applications and amazing power, but the simple underlying truth is that they all start with this linear, weighted sort of approach. So I want you all to think about what I call the performance hypothesis, the model behind this. Every element is independent: generally there's no x1-times-x2 factor in there, right? Every element contributes the same way to all possible scores, and all contributions are linear. I like graphics, so: we've got three workers pushing up on this structure, and we want to know the upward force on it. We might say the

weights are the intrinsic strength of each man and the x's are how hard they're pushing; sum them together and that's a good estimate of the force pushing upwards. But does that work in the world of information security, or the other realms we operate in? One of my favorite hobbies with my son is playing the realistic battle game Total War, so we spend a lot of time looking at what constitutes a successful defense against various attacking technologies. If you think about that linear performance model and contrast it with the world that we live in, there are some easy and obvious alternative performance models that make sense in certain cases. I've listed just a few, and this is not

comprehensive. One is the weakest link: some set of defenses or detectors fails if any one of them fails, or the score is whatever the lowest performance is. Best effort is the reverse: this is like a basketball team where you've got a star player and the other players may be schlubs, but maybe that best player is going to carry your effort. House of cards is where you have cascading effects. And I'll explain the "drunk under the lamppost" idea: these variables work really well as long as the system stays within the known range, within certain known parameters. The drunk can find the keys under the lamppost as long as the keys are under

the lamppost, but not outside it. So it was frustration with this, having lived through it, and also thinking about the need to express performance in cybersecurity generally, that led to this. I did some research on academic methods for reasoning based on evidence and uncertainty, and came up with this; this slide shows the key ideas of the method. The values of each of these metrics are treated not as numeric things to operate on with arithmetic; they are treated as pieces of evidence. You put them together to estimate the weight of evidence for every value in your performance index. So if you have five possible values, you essentially have five formulas that weigh:

does this evidence point to a value of 1? Does it point to a value of 2, 3, 4, or 5? And so on. Looking at the distribution of weight of evidence gives you information about the quality of your evidence and how strongly you can make claims. When I first gave this talk, this next line got a cheer of applause: the method works in spreadsheets. You don't have to learn R; in fact, I started doing a demo in R, and it was actually more complicated in R than in a spreadsheet. Our corporations are populated with people who are skilled in spreadsheets, and your executives will feel comfortable if it's in a spreadsheet. So there's a lot going on here, and there's

a lot going on that I would like to walk you through, but there's nothing about it that requires higher math. I'm not going to talk about this aspect, but I do believe this method can work in a machine learning or big data environment, if you want to tune and refine your performance hypothesis based on more detailed analysis. So, with a brazen lack of humility, I've labeled this the Thomas Scoring System. Why? I get no money from this; I've got no fancy hairdo, no tattoos, no piercings; this is my personal branding. It's available under Creative Commons Attribution-ShareAlike, and I've made the spreadsheet available, which is documented, and you are free to use and edit it. So, I like diagrams: if we

think of the scope of things that the usual method can solve, the Thomas Scoring System can do all that and more. On the blog site I've got some tutorial posts about this, and I do a replication where I say, here's how you'd do the weighted-sum method if you really wanted to; so I've shown that it can replicate the usual method, and it can do a lot more. So now I'm going to go through some of the key concepts of the output you get, relative to some of the inputs, and then I'll walk you through a simplified demo. We're using the mathematics of probability to express degrees of belief around each score value. So put aside in

your mind probability associated with experiments, like what percentage of people in this room are over a certain height; this has nothing to do with experimentation or frequentist statistics. We're using the tools of probability, from a mathematical standpoint, to express degrees of belief. So if I have complete belief that the value of this performance index is 3, this is what the graph would look like: all of my evidence is on 3, it goes up to 1, and there's no weight of evidence on any other value. That's how you interpret it. And by the way, in the weighted method, if you come up with a score of 3, this is what it implicitly communicates. Okay, now for something more complicated.
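The contrast just described, a full belief distribution versus the single number that the usual weighted method reports, can be sketched in a few lines. This is illustrative Python of my own, not the talk's spreadsheet; the distributions are made up:

```python
# Belief about a 1-5 performance index expressed as a probability
# distribution, versus the single number a weighted-average report gives.
# Illustrative only; these distributions are not from the talk.

def mean_score(belief):
    """The scalar that a weighted-average style report communicates."""
    return sum(value * weight for value, weight in belief.items())

# Complete belief that the index value is 3: all weight of evidence on 3.
all_on_three = {1: 0.0, 2: 0.0, 3: 1.0, 4: 0.0, 5: 0.0}

# Diametrically opposed evidence: half the weight on 1, half on 5.
split_extremes = {1: 0.5, 2: 0.0, 3: 0.0, 4: 0.0, 5: 0.5}

# No information at all: the uniform, maximum-entropy distribution.
no_information = {v: 1 / 5 for v in range(1, 6)}

# All three collapse to the same reported "score", even though they
# express completely different states of knowledge.
for belief in (all_on_three, split_extremes, no_information):
    print(round(mean_score(belief), 9))  # 3.0 in every case
```

A decision maker can tell these three cases apart only if the whole distribution is reported; the scalar hides the difference entirely, which is the failure mode the talk returns to later.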

What if I know for sure it is not 3, but I have no reason for choosing among the others? This is what that would look like. There's no way to express that point of view, that belief system, with the usual single number. And this one is a uniform distribution ("maximum entropy" is the fancy term): I have no reason to believe any alternative over any other; equal belief in each. So there's a dual interpretation here. John Maynard Keynes, whom some of you know from Keynesian economics (many people don't know this), did pioneering work in using probability to express degrees of belief, or weight of evidence. Some of you may have heard of Bayesian updating: the method we're using today

can be viewed as a manual method of Bayesian updating. The advantage is that you don't have to go through the statistical machinery of estimating all of these quantities and running the Bayesian update procedure, but it is essentially compatible with it from a mathematical standpoint. Okay, so here's a toy demonstration of how this might work. Over on the left-hand side, say this is my pile of possible evidence: different metrics taking on different values. To start with, before I have allocated any of them, I have no belief about any of the values, so my default is this uniform distribution. Okay, so I allocate one piece of evidence, but this piece of evidence is not very

informative to me: it tells me this is not a 1, it's a 2 or above, but it doesn't differentiate further. So look what's happened up here to the weight of evidence on the value 1, after normalization (by the way, all normalization does is convert this unbounded scale to a scale that goes from 0 to 1). I now have a little more belief up here, but I still can't rule out a 1, because look at all the evidence I haven't yet allocated. Okay, now I've allocated some more evidence, and 3 is starting to look more promising, but not by much. So if I'm doing the scoring system and this is all the evidence I have, I know

something more than I did before. But as an analyst, do I report this to management? Do I say it's a 3, or do I say: we're in the process of understanding this, we only have a hazy idea, and we're not yet ready to report a number? And I don't know about you, but the executives I have worked with will respect that better than numbers that keep changing all the time, or numbers they don't understand, that sort of thing. And by the way, if they say, "well, what do you mean you can't report the number?", you say: we know three things, we don't know seven things, and we think it's important to know

at least five more things before we can say anything with confidence. Right? It leads to a very straightforward, transparent conversation. Now, some of my favorite cases: in security, sometimes you're going to find situations where you have metrics telling you diametrically opposite things. In fact, I think it's pretty fruitful to have diametrically opposite information, because my audit team goes in and talks to a bunch of people, and they say, "oh yeah, the patches are up to date and everybody's gone through training", and then you've got a bunch of log data that shows they're clearly not up to date. That is a sign that further investigation is needed; you don't have a full picture of what's going on. So look at what happens when you have ambiguity: you

start seeing multiple peaks, and depending upon their relative strength, you may not be able to decide analytically what the true message is. Now, this is one of the great failure modes of the usual method, because if I have all of the weight on 5 and 1, and none of the weight anywhere in between, that averages out to 3, and there's no way to differentiate it from the case where I have no information, or the case where all the evidence is on the number 3. So let's be clear about what we're doing here: we are modeling your reasoning and judgment from the evidence (that's what I call the performance hypothesis) and the nature of that evidence. It's not a

model of the world; you own the model of the world, and you're answerable for it. All this does is make clear: what do we mean by this, how do we put it all together, what's the whole picture? A little more math, a little more formality. Your index has n values, and you have k metrics; each metric, or combination of metrics, is treated as evidence. The i-th piece of evidence is the multiplication of the condition upon which this metric is applied, its relevance, and its significance, and I'm going to go through and explain what I mean by each one of these. Yes sir? That is a superb question, one everybody here should ask. The question is: do you feed back the

number of metrics that you have, because as you increase them, that changes the probability, right? Let's go back here: another way of expressing that is, what if this pile here grows? You need to know how many things have not yet been allocated. One of the important, less understood features of this method (this is a no-free-lunch method) is that you need to make some allocation for that which you have not yet measured, that which you have not yet assessed. There are many academic books written on how you get your arms around that and how you estimate it. How do you do it in practice? There's no better way to do it other

than sort of a logical process of frameworks. So: I've published a ten-dimension performance framework, and there's the NIST Cybersecurity Framework. To me, the value of frameworks is that they help define the universe in which you are operating. Then, subdividing that universe, you can start asking: how much do we know about this piece? How much do we know about that piece? How much do we know about the interrelations? If you go through that logically, I think you can come up with an estimate. And there's another way to do it. Remember that learning cycle I mentioned before: all of us will start

out with a very crude and probably bad model. You start with what you have, you put it together, you look at the results and say, "crap, I can't use that". Then you send your staff off: I need metrics on this, we need to start measuring that. You start putting those together, and some of those things are going to be really informative. When I was working with our firewall team, trying to understand our response time on implementing firewall requests, staff turnover was not first on my list of things to look at, but eventually we got to the point where it was the most important thing to look at. So, through an iterative process of evaluating the evidence and its combinations, and going back through the process,

you're learning more and more about what you know, what you don't know, and the impact of what you know, and hopefully you start throwing away metrics that are redundant and not informative. Okay. Before I get into a toy example, I want to talk about this one here, because I'm a little dissatisfied with how I've described it and it may not be obvious. Take any given metric, say the simplest one: an audit result, pass or fail. How informative is that audit result, and under what conditions do I want to use it? At a raw level it's informative about the scores; or maybe that audit

result combined with something else is informative; maybe the failure of the audit plus some other condition is more informative than either one alone. So what I have in the method is two types of conditions; both are logical conditions that return true/false results. One I call the metric condition: conditions on metrics, which you use to produce evidence. If I have a metric on software defects, that may be an unbounded scale, but for my purposes, for my performance hypothesis, what I care about is: is it zero? Is it less than five? Is it less than 100? Or anything else, right? So it helps you fine-grain or chunk these

things in ways that are useful to you. You can also group or combine metrics in conditional relationships, as in the case I just described. You don't have to do any of this if you don't want to, but I've found it very useful for highlighting the value ranges and the metrics that are informative to my performance hypothesis. The second type I call the logical condition, or dependency: these are conditional relationships that govern when I apply a piece of evidence to the performance score. So even though I may come up with these as individual pieces of evidence, I may not apply them all at the same time. In the spreadsheet demo, which I'm not going to

spend time on, this was a maturity model, a questionnaire-oriented maturity model, and we chose to have a scoring system based on two logical conditions that would automatically reduce the performance score if they were true. If you had too many deficits, it would start knocking the score down; and if you had one of five red flags, no matter what the other scores were, it would take you all the way down. Okay, so that's how you can factor these things in, in a way that doesn't blow up your scoring system; you don't have to hide it in some obscure spreadsheet cell, and you can balance it against the other aspects. Okay. So this is a completely goofy scorecard; this

is just some random collection of metrics I cobbled together. These are the values of the metrics; here are two metric conditions based on metric number one (did I pass it or not?); and here are the logical conditions I just talked about. Now, relevance times significance: these two together give the weight for these conditions. The weight is not a probability; weight depends on the total amount of evidence, and you go through the normalization process at the end to convert it to probability. If you want to get your hands on this, I've got the demo spreadsheet available for download; it's fully functional, and you can work with it. What I'm going to give you here is a little bit of a cartoon version of it.
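The mechanic just described, weight as relevance times significance, a reserve for not-yet-allocated evidence, a red-flag logical condition, and normalization at the end, might look like this in code. This is my reading of the method, with invented metrics and numbers; the author's demo spreadsheet is the authoritative version:

```python
# Sketch of the evidence-weighting mechanic. Metric names, weights, and
# the red-flag rule are invented for illustration; see the author's
# spreadsheet for the real formulas.

SCALE = range(1, 6)  # possible index values, 1 (worst) to 5 (best)

def score_beliefs(evidence, unallocated, red_flag=False):
    """Each piece of evidence adds weight = relevance * significance to
    every index value it supports. Evidence not yet gathered reserves
    weight on all values, so nothing can be ruled out prematurely. A
    true red-flag logical condition overrides everything and forces the
    bottom score. Normalization converts raw weights to probabilities."""
    if red_flag:
        return {v: (1.0 if v == min(SCALE) else 0.0) for v in SCALE}
    weights = {v: float(unallocated) for v in SCALE}
    for supported_values, relevance, significance in evidence:
        for v in supported_values:
            weights[v] += relevance * significance
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}

evidence = [
    ({2, 3, 4, 5}, 1.0, 1.0),  # audit passed: rules out nothing but a 1
    ({3, 4},       0.8, 0.5),  # hypothetical tenure metric, weakly informative
]
belief = score_beliefs(evidence, unallocated=2.0)
print(round(belief[1], 3))  # 0.135: a score of 1 still can't be ruled out
```

The same function also shows the red-flag behavior from the maturity-model demo: `score_beliefs(evidence, 2.0, red_flag=True)` drops all of the mass onto the lowest value, no matter what the other evidence says.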

Using these goofy numbers: if I've got no data, it produces a chart with this uniform distribution. Here, all of the metric values are at their highest level; notice that it has not produced a zero weight of evidence on the other scores, and that is entirely due to the logic of my performance hypothesis. Even with these scores, I cannot rule out the possibility that in reality it's a 4 or a 3. So as you tune your performance hypothesis against your evidence, you're always asking yourself: do I really believe this? How much does this really tell me? Is this related to that? And you do so in a way that is transparent, meaning all the

information is on the spreadsheet in a logical place, again, not buried in formulas. A quick aside: how many of you were in Michael Roytman's presentation earlier, the CVSS one? He told a story about the CVSS scoring method. CVSS scoring is basically a weighted performance score, and somebody on the mailing list said: yeah, we notice we get a real cluster of scores around 9.2, and that doesn't really look very good; so I put in this little factor here to spread out those numbers. That's exactly what I mean by a fudge factor. And if that's all that happened, it might not be so bad, but you take something like CVSS, going through revision after revision and

different uses, and it's hard to know what it means anymore after some time. So here is the reverse case: all of the scores are at their most negative value. By the way, it totally depends on the logic of your scoring system whether or not this can even happen; one of my early versions of my security maturity model was not giving scores in the lower range. I checked my logic and I checked everything, and it turned out I didn't really have conditions set up that would highlight the lowest level of performance. So it keeps taking you, as an analyst, back to the question of what evidence you have

across the range, not just for what's typically going on. Here is a score where all the values are right in the middle; this is sort of the sanity check, the scoring system behaving as I would intuitively expect based on the inputs. However you use the spreadsheet, you always want to go through the sanity check to make sure you don't have a typo or an error. So what happens in a muddy, mixed case? Well, you have no clear central tendency. Now, I have two diagnostic metrics that evaluate this distribution. One of them is called clarity, and that simply means: how much of the mass is behind any one peak, or is it

spread out? And here we show the clarity number; it is not very good. Okay, here's our old bugaboo of ambiguous evidence. I actually have a separate diagnostic around ambiguity, and it simply measures: are there multiple modes, what is the weight of those modes, and how far apart are they? So here's the worst-case ambiguity I can get: some of these are at their worst level and some at their best level, and that gives me my worst case. I think I've harped on a lot of these benefits, so in the interest of time I'm going to skip ahead to some of the next details. It's not well publicized that there's

more than one variety of uncertainty. If you've ever gone to a statistics class, if you've ever read books on statistics, even noted economists and behavioral scientists will talk about uncertainty as though it's one thing. There's some interesting work here; one of my favorite authors is Smithson. He talks about a whole tree, I like to think of it as a tropical rainforest, of gnarly, nasty varieties of uncertainty, and some of them are listed here. The usual method can deal with some of these somewhat well. If some of my metrics are vague, the typical approach is: we'll do an upper limit and a lower limit and we'll publish that. Okay, that might work. But what if I have varying

precision? Some of these things I'm measuring to high precision, some to a single digit; I probably still have to use the min/max approach. What if some of the metrics have systematic errors, or errors I can't control for, or missing data? When these things don't line up arithmetically: enter the fudge factors, my friend. So this is my boldest claim in the whole thing, and if you try this out and find it does not handle some of these cases, I'm really interested to hear about it. Now, I mentioned before that this is not a magic bullet: there are circumstances where the evidence you have, and the way the pieces interact, will not result in a sensible performance index.
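As an aside, the two diagnostics described above, clarity and ambiguity, are easy to sketch in code. The thresholds and formulas here are my own illustrative guesses, not the spreadsheet's definitions:

```python
# Illustrative sketches of the two diagnostics: "clarity" (how much of
# the mass is behind any one value) and "ambiguity" (multiple
# well-separated modes). Formulas and thresholds are invented, not the
# spreadsheet's definitions.

def clarity(belief):
    """Fraction of the probability mass on the single best-supported value."""
    return max(belief.values())

def ambiguous(belief, mode_threshold=0.25, min_separation=2):
    """True if two or more values carry substantial weight and sit far apart."""
    modes = [v for v, p in belief.items() if p >= mode_threshold]
    return len(modes) >= 2 and (max(modes) - min(modes)) >= min_separation

clear_case = {1: 0.05, 2: 0.05, 3: 0.80, 4: 0.05, 5: 0.05}
worst_case = {1: 0.40, 2: 0.05, 3: 0.10, 4: 0.05, 5: 0.40}  # both extremes

print(clarity(clear_case), ambiguous(clear_case))  # 0.8 False
print(clarity(worst_case), ambiguous(worst_case))  # 0.4 True
```

The worst-case distribution here is exactly the "old bugaboo" from the talk: strong evidence at both extremes, which a simple average would silently report as a middling score.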

There are, you know, paradoxical, completely systematically dysfunctional cases, things the evidence can't get you to by the means I just described. But I think this handles a hell of a lot more cases in a reasonable way, not burying and hiding the uncertainty, but bringing out its nature and what you might do about it. So here's a quick overview of a couple of cases where this has been applied. Our context was vendor maturity. The decision really is not whether we're going to use the vendor, but under what circumstances we make the case that they need more reviews, or maybe an executive approval, that type of thing. Our focus was on questionnaires, and we

needed the questionnaire to be simple, not the BITS assessment or some of these others that are really elaborate. For our performance hypothesis, we had a lot of people with a lot of years of vendor assessment, and it really came down to: if we see these five things, we know they're crap, and we don't need to investigate everything else. That's the red-flag model. Then we started identifying things that were clearly deficits but in themselves would not qualify as red flags; if we saw too many of them, that would get into that category. And we also said we're not going to make the highest rating an easy thing, because that's not really

important to our analysis. For a vendor to get the highest rating, given this limited survey of information, we really want everything to be pristine. The next case is a very different setting: operational support for firewalls. The goal is to maintain security operations functionality as we shift organizational responsibility. We had lots of interesting sources of ground truth, and yours truly spent a lot of time poring through a lot of data, looking for correlations and relationships and so forth, trying to figure out how to put this together in a way that management can make decisions on. It turns out that the simplest performance index was going to be the best, because

the nature of the action was so binary: the decision is, do we need to take corrective action or not? If we need to take corrective action, the managers spend time with the team, and we may need to shift some resources; the performance index was not going to do all that work. The three levels in this performance hypothesis had to do with the performance on the individual items I'm listing here. There were some automated ways to collect them, and the scoring was relatively simple, because we could see whether availability, change-request handling, adequacy of monitoring, or existence of disaster recovery was or was not in place; it's a relatively easy thing to see whether

those were happening or not. It turns out that one of the most critical things driving this, from an operational quality standpoint, was staff turnover: whether the new staff and the old staff were in place at the same time, so that no requests fell through the cracks and people were properly trained. So, the last case I'll talk about is sort of the genesis of all of this. I don't know how many people in this room are on the hook to evaluate information security for your whole organization; if you are, you have my support and sympathy, because it's a really challenging job. It's vital, because executives make decisions: they

spend money, they make business-model decisions, they put stuff in the cloud, they put stuff overseas, they decide to outsource certain things, they decide to put things in their personal email or not. There are lots of things going on at the executive level that you want to inform. And you're never going to get to any sort of overall performance measure unless you have some analytic framework to make sense of what you're measuring. There's the NIST Cybersecurity Framework, and I've come up with these ten dimensions of cybersecurity performance. The difference is that the NIST CSF collects practices; this does not collect practices. The ten dimensions are focused on performance. The biggest challenge I've had is: how do you measure

all of these things that we can't immediately tie to activities? So I needed a performance measurement system. The approach we use, if anybody is familiar with the balanced scorecard method in corporate performance management, is the idea that you don't roll everything up to shareholder value: you keep things somewhat separate but related, and you look at them as an interrelated portfolio. Instead of four dimensions or five, I picked ten, because I think that's the nature of the situation we're dealing with. Each dimension has its own performance hypothesis, or set of them, and they need to be thought through based on the nature of your business and how you sit in the information ecosystem. One of

those dimensions, and I was talking with a colleague earlier about this, a much-esteemed colleague I might add: how do you evaluate the quality of your protections and controls independently? Are they the best ones, the right ones, somewhat independently of the threat environment? Can you look at the quality of those on their own? I advocate sort of a TQM approach to this, where you're thinking about defects and the root-cause analysis associated with them. There's an amazing literature, 25 years of international experience, on how you do systematic quality improvement, and I think we can import, oh, say two-thirds of that thinking and maybe half of the methods, and apply it to the quality of the protections

and controls that you have in place now with the other dimensions and I won't go through the whole model you can find that on the blog you look at the other aspects do we even have the right controls in the first place do we have the right intelligence from them are we managing our interfaces external the organization and effective way and so on so I'm going to wrap up here then we'll have good ten minutes for Q&A and debate so here's where this thing is I think it works in both in theory and practice from a small base the usage is really it's up into the up into the right there's really strong global momentum

toward adoption and a vibrant user community. And I want to close on this cautionary note to the world's metrics people, a significant number of whom have significant relationships: before you implement this method or any other, think through, do I really want to collapse this complex system into a performance index, and what consequences might flow from it? Here's my contact information, feel free to email me, MrMeritology on Twitter, and there's the blog link. So we've got about how many more minutes, sir? Thirteen? Great. So if we can close the door, feel free to step up to the mic, or if you want to shout out, whatever you want to do. Questions, comments, "I like it,"

"I hate it," okay. Okay, is this turned on? And that is on, okay. This is so that, I assume, Adrian actually piped this mic in there, so that the people watching the video can actually hear what we say. Okay, so my biggest question would be around your potential outcomes, your states. I think you defined a couple, like "this is good," "we're not really sure," "this is bad," things like that. How would that apply to a risk program? If I'm rating risks, you know, am I rating them low, medium, high? And if I was doing that, how would I take into account things like the actor that was acting

against the risk? Excellent question, and I'm very happy you asked that. Part of the reason it's significant is that I've spent a lot of my time over the last seven or eight years specifically on quantitative risk analysis. The short answer is that this sort of approach is not analytically adequate to deal with probabilistic risk analysis. You can subdivide using FAIR or any other taxonomy's subsets; so let's say you wanted to measure resistance, which is one subset: you might create a performance index on a certain set of controls that fits into your model of resistance as FAIR defines it. The FAIR approach is one you can take; you can take other approaches to probabilistic risk

analysis, and you can extend it further to include adversarial behavior. This approach is explicitly not intended to try and envelop that sort of complexity, because you need conditional probabilities and a bunch of things that are not factored into it. Other questions, comments? Yes, please. Good talk. I actually just went through your spreadsheet, very good. I have two questions. The first one is, can you talk a little more about your results after you implemented this system, currently, in terms of either your interactions with the board or executives, or just overall, I guess,

improvement in your program. That's the first question. The second question is, I see that you're also in the financial vertical, as I am, and a lot of times they want to know, aside from "these metrics are really good," how do we compare to our neighbors? This is great, but what about so-and-so, how do we compare to them? For example, when I went to your spreadsheet and brought everything in, I got the score: elite, average, above average. The question I can see my management asking is, okay, but who's elite? I want to know how we compare to them. Very good questions. For reasons you might imagine, I can't go very far in

describing what my employer does or doesn't do. In consequence, what I can say is a couple of generalities. The first generality is that this has been implemented and tested in some operational settings. It has not yet been implemented, to my knowledge, at the board level, so it's aimed at a somewhat smaller problem. I personally believe it will work with a chief risk officer presenting to a board, but we don't have a use case of that, and in my particular role I'm focused on much more narrow data science type stuff. Your second question, how do we compare, is a vital issue for all of us analysts to understand, because behind

this is what's going on in the minds of the people that you're trying to influence. You're giving them information to support decisions, and they're also thinking, well, I hear this from my guy, but what about this other guy? So much of executive decision making is talking with and comparing notes on what their peer companies do. Sometimes their peer company looks like them; sometimes their peer company is Netflix, depending on whoever their favorite CIO is. Much of the routine that falls under the umbrella of compliance, risk, and audit in financial services and many other industries is designed explicitly to make that cross-organization comparison effective,

yet in doing so it squeezes all the juice out, in my opinion. Until this becomes more widespread in adoption, and sometime before I die I would love to see some significant fraction of companies adopting this ten-dimension framework, if they adopt these performance dimensions by some definition, they should be able to compare performance indexes between them. And frankly, if intermediaries, let's say the companies that are at the center of a value chain, the Walmarts or Bank of Americas or whoever, that drive the behavior of all their vendors, start baking this into their contracts, and if it starts getting baked into cyber insurance contracts, it's through those vehicles that we're going to start to

see standardization, much like we see standardization in business and consumer credit reporting. Anything else? Okay, one more. Do you have to make any allowances, or are there any peculiarities, in using this with time series data?

So with time series data, I'm going to say there are two flavors of it. There's time series data for which the metric values themselves change every month or every week or on some other cadence, and the only consideration there is if some metrics are gathered quarterly and others monthly and others weekly and others on the Julian calendar. In the past I worked in marketing, and we used to aggregate numbers, some over quarters and some over fiscal years, and you have this alignment problem. So the pragmatics of alignment, how fast things change, and whether a metric is lagging or synchronous, those can be accounted for in your performance hypothesis as you account for the

relative reliability of each one. You might even have multiple metrics to account for the fact that one only updates quarterly; you might have a cruder metric that updates more frequently. Okay, a more subtle problem in time series is things that only unfold in patterns over time, such as time to discover a new incident or breach, time to discover a new vulnerability, or time to close a firewall request. We had a debate in Michael's session about whether mean time to close was a useful metric: is it the 90th percentile, is it the change between them? That is fruitful ground for further investigation. So my advice, and this is how this scoring system is different:

start with whatever simple thing seems obvious, or that nobody's going to argue with, or that you can't think of anything better than, and use it. You might find some opportunity to improve upon it, and eventually, after the second or third or fourth iteration, you might have a diagnostic, you might actually have a machine learning program operating on that time series to detect anomalies, and that's your metric. If you've heard the phrase "control charts" in TQM: control charts are nothing more than a simple method for determining whether a process is governed by a normal distribution or not, because if it's not governed by a normal distribution, you have to do other things until it is, and

once it's governed by a normal distribution, then you can focus on average values and improve those average values systematically. So time series stuff is important; there's some trickiness, but there's also plenty of room for opportunity there. Have you run into anything where, because you're measuring things, they change the rules on how they actually measure things? Because I've found that when a metric becomes a target, it's no longer a good metric, sometimes. I'm so glad you asked that; I actually thought about putting a slide in on this. How many have heard the recent news about Toshiba Corporation? Some high percentage of Toshiba executives have been fired, they have been severely punished. Apparently, over the course of a

couple years, there was systematic gaming, or cheating, on revenues and costs to the total sum of a billion dollars. We talk about cyber impact; I mean, a billion dollars, that's real money. Outside the realm of cybersecurity, with the folks in the Atlanta school system, multiple people have gone to jail, and apparently there were suicides, some serious problems with gaming systems. I really believe in the value of metrics, I really believe in the value of aggregating performance, and I really believe in the potential for perverse management of that to drive the worst possible behavior. So this is powerful and dangerous stuff. Anybody involved in it, if you're feeding ground truth metrics, if you come up with a scoring system, just like you go

through your penetration tests on your technical systems, you need to do a pen test on this: how can people game it? "I'm shocked that there's gambling going on in this casino." "Here, some more chips, mate." It's intrinsic to the nature of measuring things, it's intrinsic to the way Western society works and the way incentive systems work, and the only way to deal with it is integrity in the scoring system, using it for useful purposes like continuous learning and not for unproductive purposes like "tell me who I should fire." Okay, thank you all, ladies and gentlemen. [Applause]
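The control-chart idea raised in the time-series answer above can be sketched as a minimal Shewhart-style individuals chart: estimate a center line and k-sigma limits from an in-control baseline period, then flag later points that fall outside the limits. This is a generic sketch with invented data, not the speaker's tooling, and it uses the sample standard deviation where a textbook individuals chart would use a moving-range estimate of sigma.

```python
from statistics import fmean, stdev

def control_limits(baseline, k=3.0):
    """Center line and k-sigma limits estimated from an in-control
    baseline period (Shewhart-style individuals chart, simplified)."""
    center = fmean(baseline)
    sigma = stdev(baseline)
    return center - k * sigma, center, center + k * sigma

# Invented data: daily hours-to-close for firewall change requests.
baseline = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8, 4.1, 4.0, 4.2]
lo, center, hi = control_limits(baseline)

# New observations are compared against the baseline limits; an
# out-of-limits spike is a signal for root-cause analysis, not
# something to average away into a mean-time-to-close figure.
new_points = [4.1, 9.5, 4.0]
signals = [x for x in new_points if not (lo <= x <= hi)]  # → [9.5]
```

This mirrors the point made in the talk: only once the process is stable (points stay inside the limits) does it make sense to work on systematically improving the average.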