
This is my talk here at B-Sides Salt Lake City: The Math Behind the Hunt. A little bit about me: my name is Edidiong, but everybody calls me Eddie for obvious reasons. I put some important things you should know about me on this slide, but the most important one is that, besides my family and my friends, my entire life revolves around cybersecurity and soccer. So after this, if you want to have a chat, I'd love to talk soccer with you for sure.

The purpose of this talk is to show how we can use statistical concepts to back our threat hunting. Many of us have been in situations where we have to explain to senior leadership why we need to take certain actions, but we can't fully explain it in the moment; we just have a gut feeling or an intuition that something is wrong. With statistics, we can lend credibility to our opinions and our hypotheses. Without that, the gap costs us response time, costs us credibility, and sometimes costs us the entire hunt. This talk highlights six statistical concepts, simple things you might have learned in a stats class but maybe never thought about applying in a security sense. I created a synthetic scenario with some data sets to show this, and then I'll highlight a framework you can apply in your own environment. This slide is just the attack timeline; it's not super important here.

The first concept is frequency distribution. It's the simplest question we can ask of a data set: how often does every value appear? We count every occurrence of every value and plot them. I'm a visual learner, so I love to see what's going on, and once we put this on a graph, it's much easier to spot when something is off. If you have to stare at a lot of raw logs and pick out the weird thing, that's hard, but once the data is on a graph, we can easily establish a baseline and see where anomalies occur. Now, these anomalies won't always be malicious, but they definitely create questions that we have to go find answers to: why is this here? Then we can go and see whether it's malicious or not. Looking at the graph of authentication events for this fictional company, we can see that most of the activity happened within business hours, which is expected. Then, on day one of our attacker, a lot of activity happened in the early hours of the morning, which is different from our original baseline. This is just simple counting put on a graph, and that shape has been broken by the attacker's activity. We want to know what happened there and find the answers to that question.
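Counting and plotting like this takes only a few lines. Here's a minimal sketch in Python; the login hours, counts, and business-hours baseline are all made up for illustration, not taken from the talk's data set:

```python
from collections import Counter

# Hypothetical authentication log: the hour of each successful login.
# Mostly business-hours activity, plus a burst of early-morning logins.
login_hours = [9, 9, 10, 11, 11, 13, 14, 15, 16, 16, 17] * 5 + [2] * 6 + [3] * 2

# Frequency distribution: how often does each hour appear?
freq = Counter(login_hours)

# A crude text "plot" to eyeball the shape of the distribution.
for hour in sorted(freq):
    print(f"{hour:02d}:00 | {'#' * freq[hour]} ({freq[hour]})")

# Hours with activity that never appeared in the business-hours baseline.
baseline_hours = set(range(8, 18))  # assumed 8:00-17:00 workday
anomalous = {h: c for h, c in freq.items() if h not in baseline_hours}
print("Hours outside the baseline:", anomalous)
```

In a real hunt you'd aggregate this from your SIEM and plot it properly, but even a text histogram makes the broken shape obvious.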
In this case, there was a brute-force attempt on a user account, J. Smith, from an external IP address.

Our next concept is central tendency. It answers the question: what is typical in our data set? We all know what mean, median, and mode are; we learned them in elementary or high school. But in security, they can give us a good idea of what's going on. For most normal data, the mean, the median, and the mode will roughly align with each other. So if you generally work 40 hours a week and I take an average of your weekly hours over a month, it will probably be 40. The median? Probably 40. The mode? Probably 40. Everything agrees, so I can trust that you indeed work 40 hours a week. However, say you had some crazy incident at work and ended up working 60 or 80 hours one week. If I take the average now, it gets skewed toward those 60 or 80 hours, but the median and the mode stay about the same, because they're resistant to outliers. So whenever we apply these measures of central tendency to a data set and see that the average doesn't agree with the other two, it should prompt a question we need to go answer: what is pulling this average away? Again, it's super simple math, but looking at it this way, we can surface a lot of outliers and anomalies in our environment.

This is what a healthy account looks like for this organization. The mean login count is about 15, and everything is closely aligned, so we can trust that the average really is about 15 logins a day. However, because of the attacker's activity on the compromised account, the mean login count went up to about 23, while the median and the mode stayed close together. Again, that's a question we need to go find the answer to: what pulled the mean away? In this case, it was the attacker's activity. It's never good to trust one single measure; we have to look at all three as a whole, and then we can start to spot anomalies in our environment.

The next concept is standard deviation, which answers: how spread out is our data around the mean? It helps us understand whether our data is reliable. For instance, say you have two strikers who each score an average of 10 goals a season. One scores 10, 9, 11, 10 — almost perfectly consistent. The other scores 5 one season, then 15, then 10, then 8. Take an average of both and it comes out around 10 either way, but someone making decisions will probably want to go for
the striker who is more consistent, even though the averages make them look like the same player. Same idea with standard deviation: we want to know how predictable or reliable a data set is. One thing to note: when we calculate standard deviation, we have to be careful not to compare individual entities against the entire organization or environment. We have different computer types, different devices, different roles, different users — so many different things in one organization — and a standard deviation taken across all of that casts such a wide net that it won't help us spot or catch anything. To counter that, it's important to compare entities against themselves. If I want to know how Eddie behaves in this environment, I compare Eddie to Eddie; I don't compare Eddie to Sarah, because they're two different people. Eddie could work as a night analyst while Sarah works a normal 8-to-5 in HR, and comparing the two won't give me anything useful.

That's what happened in this example. Wilson is a SOC analyst on the night shift; she logs in around midnight every time, and that's normal for her. J. Smith, on the other hand, logs in around 8:30 on average, so a login in the early hours of the morning is abnormal for him. If we do the calculation across the entire organization, we get a standard deviation of about 259 minutes — roughly four hours. That's so wide it does nothing for us. And I know we won't always be in situations where we can baseline per entity; in that case, we can compare against a peer group: devices in the same building, users in the same role. That will give us more precise calculations than the entire organization or environment — not as precise as per entity, but definitely closer. This is just a graph of that, because I love to see things happening.

For standard deviation, we can follow the empirical rule, which says that 68% of our data falls within one standard deviation of the mean, 95% within two, and 99.7% within three. Following that, the most extreme times this user should log in, based on the math, fall between about 7:00 a.m. and 10:00 a.m. His login here was around 2:00 a.m. That is so far outside the empirical-rule bounds that it's almost statistically impossible.

The natural progression from standard deviation is the Z-score. Standard deviation looks at the data set as a whole, but a Z-score answers a specific question about a single value: how unusual is this particular value within our data set? And we answer that
question in terms of standard deviations. We say a value sits one, two, three, eight — however many — standard deviations away from the average. Doing the calculation, the attacker's login on that user account is about negative 13 standard deviations away. That's so large that, following the math, it doesn't make sense for it to happen. All of this should raise questions we need to go find answers to. It might not always be malicious, but they're definitely questions we need to answer, and this slide just shows that.

Our next concept is the coefficient of variation, and we can use it to hunt for beacons in our environment by measuring how predictable the connection intervals are from an endpoint to some domain or IP address. For most human browsing, the intervals are very random and irregular, because we're humans: we go to the internet, get up, grab a drink, talk to a coworker, use the bathroom. It's all over the place and never predictable. A beacon phoning home, though, phones home on a precise schedule — every 30 seconds, every 2 minutes, every 5 minutes, every 2 days, every month, whatever it is — and that connection interval is usually very precise. We can use the math to surface those connections in our environment. Now, most C2 frameworks these days employ jitter and delay to make the traffic look a little more human, and there are further statistics we can do to catch even that. It might be a conversation for a different time, but if you're interested in how to counter jitter and delay, I'd suggest you look at the work Active Countermeasures does with their RITA tool and the posts they put out; it's super interesting how they use math to surface all of these things.

This is just the math here, and this is a graph of what human browsing looks like: very random, not precise. But when we look at the activity of this C2 beacon, the intervals are very close together, very predictable — way different from the human browsing. These are things we can use to spot beacons in our environment. Sometimes detection rules fail or aren't properly implemented, and as threat hunters we want to make sure we can still spot things like this. Most of us have different ways we approach threat hunting. We might start with a hypothesis: I want to see if this is going on in our environment. We might hunt from attackers' tactics, techniques, and procedures. Some of us want to establish the baseline activity for something and then see if we can spot anomalies. Statistics, in that case, help you surface the data, get a baseline, graph it out so you can actually see what's going on, and then apply the math to surface all those anomalies in your environment. This is a side-by-side comparison of that.

And then the last one is called the producer-consumer ratio, or PCR. I learned about this one a while ago and thought it was super interesting. It helps you spot how devices in your environment send or receive data. It was introduced by Bullard and Gerth at FloCon 2014, so it's pretty old, but it's still pretty relevant math in my opinion. Looking at the extremes: after doing the math, a negative one shows you a pure consumer — always downloading data, never sending anything out. And a positive one
is a pure producer in our environment: always sending data out. Every device in our environment has some kind of PCR identity. Most workstations will sit closer to negative one, because most of the time you're downloading things to your computer; you do send things out, via email and so on, but that's comparatively small. It's important to know those identities, but the most important thing, I'd say, is to watch for changes. For the most part a device stays within a particular range, so whenever we see a device that was almost a pure consumer jump to almost a pure producer, that's a question we need to go ask: why is that happening? Why did Ethan's workstation go from almost a pure consumer to almost a pure producer? That shouldn't be happening, and the math can help us surface that activity in our environment.

I don't know if you can see this, but for the first few days the baseline is about negative 0.7, so definitely closer to negative one. On the last day, though, it jumps to almost positive one. That's when the attacker is exfiltrating data out of the environment. In between, they had staged data — but because we're looking at this from the perimeter, from the outside in, we can't see that staging, since it happened within the internal network. So one thing to note whenever we apply this statistical concept to our data set is our vantage point. Are we looking at this from inside the internal network, or from outside it? If an attacker is staging data from other servers within the environment to eventually exfiltrate, then looking from the outside in, we won't see it; looking from within the internal network, we can see that shift in the PCR identity. As cool as this tool is, the vantage point is one of the things that can break it: where are we watching this activity from — internally within the network, or outside of it?

Here, again, is another graph to show this. For the first few days the device is purely a consumer, just a little below zero, and all of a sudden it jumps up close to positive one. The yellow (or orange, whatever color that is) is when the data was being staged for exfiltration. So again, it's a really cool math concept we can apply to help us surface this. Most of us have pretty good detection implementations, but there are smaller organizations out there that don't, and we can help surface this activity just by applying very simple math — math we did in elementary or high school. It's a ratio: divide the top by the bottom. It's not super complex. Again, this is just comparing that particular device against the rest of the devices in our network.

And then this is just the timeline. It's not super important; it just shows how the various tools are able to spot things at these various points. So, three things I hope you leave here with. First, whenever you're baselining, do it per entity: compare every entity in your organization — users, devices, whatever it is — to itself, because its habits will most likely stay the same. Compare each entity to itself, and not the entire
organization as a whole, because every entity is different. If we can't do that, we can always fall back to peer-entity baselining: compare devices or users in similar roles to each other, and we can still surface anomalies and outliers that way. The second takeaway is applying these three tools: Z-scores for authentication anomalies, the coefficient of variation for beaconing detection, and the PCR for traffic-direction shifts, so we can spot exfiltration attempts in our environment. And third, it's important to know the failure modes. Z-scores fail if the baseline already contains attacker data. Also note that for a Z-score to be meaningful, we lean on two results. The first is the law of large numbers, which states that as our sample size increases, the sample mean approaches the true mean. So if we're looking at a data set of only 5 or 10 days, we won't meet the requirements for a Z-score to give us accurate results; we need at least 30 days' worth of data. The second is the central limit theorem, which states that as the sample size increases, the distribution of the sample mean becomes more normal — and by normal I mean a bell-curve shape. Those are the two requirements for using this calculation. Strictly, the underlying population should be normal too, but I don't know the distribution of every authentication event in the world, or even in this region; I don't have access to that data. So to counter that, I use what I have in my sample, and the rule of thumb for still meeting the requirements without the entire population is a sample size greater than or equal to 30.
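Putting the Z-score pieces together — a baseline mean, a standard deviation, and the at-least-30-samples rule of thumb — here's a minimal sketch in Python. The account, the login times, and every number are made up for illustration; real baselines would come from your own clean authentication logs:

```python
import statistics

# Hypothetical 30-day clean baseline of one user's login times,
# in minutes after midnight (roughly 8:30 a.m. with day-to-day noise).
baseline = [510, 505, 512, 520, 498, 515, 508, 511, 503, 517,
            509, 514, 506, 519, 502, 510, 507, 513, 500, 516,
            511, 504, 518, 509, 512, 501, 515, 508, 510, 506]

mean = statistics.mean(baseline)
stdev = statistics.stdev(baseline)  # sample standard deviation

def z_score(value: float) -> float:
    """How many standard deviations a value sits from the baseline mean."""
    return (value - mean) / stdev

# An attacker login around 2:00 a.m. (120 minutes after midnight).
z = z_score(120)
print(f"mean={mean:.1f} min, stdev={stdev:.1f} min, z={z:.1f}")

# Anything beyond ~3 standard deviations is worth a question.
if abs(z) > 3:
    print("Outside the empirical-rule bounds -> go find an answer.")
```

Because this baseline is per entity (one user compared to themselves), the standard deviation stays tight and the 2:00 a.m. login stands out; computed across the whole organization it would wash out, exactly as described above.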
And so that's why having a 30-day clean window is important for our Z-scores to be accurate. Then the coefficient of variation breaks when jitter and delay are applied, so be mindful of that. Again, if you want to research how to counter that attacker technique, definitely check out what Active Countermeasures is doing, or check out the SANS DFIR Summit talk by Eric — Gett, Mehmet, something like that; I don't know how to say his name properly — about hunting C2 beacons in the modern age. It's a really good talk on using math for that. And PCR breaks when attackers are smart and go low and slow: they can exfiltrate a ton of data if they take it little by little. However, if you track the delta and not just the absolute value, you can spot things like that too.

So, I hope you've been able to learn some very simple math concepts here that you can go back and apply in your environment. It's not complex — it's math you could do in Excel — but it's very good at surfacing outliers and anomalies. Thank you to BSides Salt Lake City for the opportunity to talk about this; I'm grateful. And thank you all for attending. Appreciate it.