
Saving Time, Saving Money: The Business Case for Tuning Your SOC

BSides Seattle · 19:27 · 29 views · Published 2025-06
Style: Talk
About this talk
Christopher Hamilton

False positives aren’t just annoying; they’re expensive. Every alert wastes analyst hours, increases turnover, and diverts attention from real threats. Tuning systems and reducing noise directly impacts the bottom line. Let’s focus on the dollars and on justifying the use case for security operations monitoring.

Christopher Hamilton, Director of Security Operations @ Oracle. From Cincinnati, Ohio; living in the Northern Virginia area. Currently at Oracle, working in Cloud Security Operations. Previous security operations work:
Microsoft – where we went from an outsourced T1, small T2 model to a robust multi-team, multi-country SOC with ancillary supporting teams.
KeyBank – where we built a Security Operations Center from the ground up using an MSSP and internal staffing.
US Army – where we built out security operations planning and staffing to support incident response and ancillary teams.
Transcript

This is a pretty short talk. It looks like this is an older version of my deck, so hopefully there's not too much left off. I don't have anything down here, but that's okay; I don't need it. This is the agenda. Our mission today is pretty quick: we're going to catch bad guys as fast as we can, we're going to measure that efficiency in time, and we're going to say wasted time is where we need to stop. Again, it's an old slide deck; sorry it's not up to date, I pulled it offline. One of the things we have up here is SOC statistics. Why do we do this? I think

we're actually going to need my updated deck. No, we should be good. So, why do we do this? 50% of SOC analysts think about switching jobs. We have a higher attrition rate in the SOC than everywhere else in the tech industry. The tech industry has about a 17% turnover rate; the SOC has about a 25% turnover rate for 50% of SOCs. That goes up. If you go up to 80%, you're going to lose about 10% of your SOC year over year. That's just how it is. Why do they leave? They have burnout, right? We saw a talk from Carson earlier. But they're also... Thanks, man. They have

burnout. They're stressed. They have a lot of extra work on their plate. On average, most SOC analysts will work an extra day a week if you have a 5x8 schedule. So they're working one extra day a week. Six days a week, what's the point? Later in the talk I'll point to it if we have time, but the idea here is that we're actually going to move to 4x10s. That's my best recommendation for scheduling. This is me. I've been doing security operations work for about 15 years. I'm from the great state of Ohio. O-H-I-O. There it is. Thanks, I appreciate it. But that being said, I'm

coming from Virginia. I'm currently at Oracle, but I've been here as well at Microsoft. I worked at KeyBank before that, and Dell SecureWorks, and I was a member of a CPT in the United States Army. So, what is a SOC? If you look at ISO, COBIT, CSS, CDR, etc., they will not define a SOC. In fact, you may not have a SOC. You may have a fusion center or a CERT, right? Does anyone have any of those things, or all of those things? Yeah, you probably have different names for it, right? So, what is a SOC? I'll tell you now, if you look at those places, they don't

define it. They say, "What should your organization do as a SOC?" So this is what we're going to say a SOC is: a SOC deals with physical and cyber security issues by monitoring and improving an organization's security posture, by preventing, detecting, analyzing, and responding to incidents through a combination of technological solutions and processes. Ignore that extra comma. But does this sound right, what a SOC does in general, for the most part? I hope so, because we're going to pretend we're them. We're going to skip over this. So this is what we're going to focus on today: the cost of a detection. And so when we're talking about a SOC, like

what's wasted time? We think of it as the false positives and benign positives, right? True positives aren't a waste. We like true positives. Everyone loves catching bad guys, but benign positives and false positives we don't like. So that's what we're going to focus on today. How do we do that? The easiest way is to look at the cost of the log, the investigation time, and the false positive rate. But really it's not the false positive rate; it's the time it takes us to assess an incident and determine what it is. So, what is the cost of the log? If you've been in security for a minute,

you probably remember the days when we measured our log cost in EPS, right? That was when the biggest cost for our SIEM was having it on-prem, the hardware itself. That was the big thing. A hard drive cost us $100,000. We don't do that so much anymore. We do service-based. So a lot of times when we're talking about logs and storage, we're going to be focusing on the volume size: gigabytes and dollars. That isn't the only place we get costs, though. We get it from the service itself. So if you're buying things like CrowdStrike, which does service-cost logs, or even if you're using Defender, we price you in at

cores, right? That's the cost of your log. So it's not just storage. There are other ways to get your cost. You need to understand how that cost works and what it looks like. That's just it right there, and there are some values there. So why do you log? Well, we have compliance reasons: they say we should log, we have to log, so we do. In addition to that, we've had past incidents. That's a big deal, right? There are trends in the industry. That's a big deal too. You know, we start to see things pop up, like, "I don't log that today," but an executive says, "Start

logging it right now. I don't care how much it costs. Figure it out." That happens all the time. So what is the cost to us when we do that? Well, it's dollars. Who here manages their budget or knows what their budget is? Manage your budget. Know what your budget is. But when you look at those dollars, you also have to ask: what can I not get with my dollars? Typically I don't have a blank check. It's very rare that someone says, "Do it, I don't care." Typically, they say, "Balance my budget." And so, whenever I ingest a log, I also have to balance the log cost out as well. So,

they say, hey, if I want to ingest all these sign-in logs, I can't ingest all these 40 to 86 logs at the same time. I don't have enough bandwidth. And that'll scale based on the organization. Next. We also need to know two other pieces of information in order for this to work. One is the alert classification: was it a true positive, a benign positive, or a false positive? That's pretty simple. How do you guys do that today? When I do it, I just say, hey, we have a case management system, and in order for you to close a case, you must say true positive, false

positive, or benign positive. It's not optional. Typically, I will fall into some SOCs and I'll say, "Hey, what's this detection's fidelity?" They're like, "I have no idea. We just have it. It just triggers." I'm like, "Okay, but how often is it right?" "Oh, I don't know." Okay. Does anyone here do a regular annual or quarterly assessment of their detections? Custom detections. One, two. You're supposed to do it. It's actually in NIST, right? If you are compliant, you must do it. So someone is saying you're doing it in your organization. If you're not, someone's saying you are. Now, sometimes people will pass that buck off to their vendors and say, "Well, the vendor does

it, so I don't need to do it." And that's okay if that's what you want to say. But in reality, you probably have some custom detections. So when we classify those detections, we say true positive, benign positive, false positive. That's just kind of a breakdown. The next thing we're going to look at is our median time to closure. When you're looking at case metrics, a lot of people will say start time is when I got the event. That's actually not true. For this piece here, remember, we're trying to save our analysts' time. So

what is our first timestamp? It's when the analyst looked at it first. Not when I got the event. It's when my analyst touched it first. That's what we have to measure against. So it's median time to remediate. Now, you can look at it from the macro perspective and say the clock starts when a log actually comes in and the detection fires, and that's the mean time to remediation all the way through the incident. That's totally true. It's not wrong. But when we're measuring case metrics in themselves, we're measuring SOC analyst performance. We're measuring the performance of our organization. So we have to say: when did I first get

eyes on glass? If you want to shorten that flash-to-bang window, make that a metric and say, hey, I need to speed up the amount of time it takes to go from when the detection occurs to when my SOC analysts look at it for the first time. So there are two different metrics here, but we're going to focus on that latter half. And I keep saying this, I can't say it enough: it's median, not mean, right? We all learned those things in primary school: mean, median, mode. It's median in this case. When would you use mean versus median? You guys know? It's skew. It's skew. That's it. So if

you're looking at something like a Gaussian model, then yeah, use mean. When everything's kind of centered in a tight shot group, use that. Like in Andrew's talk this morning, precise and accurate: if it's all kind of together, yeah, use mean. It makes a lot of sense. But when we're looking at things with a wide skew in time, like a long-tail analysis, you want to use median instead. If you don't do that, what you'll end up seeing is one case that takes six days to work because it's a true positive, right? True positives don't get solved in five minutes. They get solved in days, typically. And then we

would say a mean with that case in it gets pulled far to the right. So a median gives us a better picture of how long these things actually take to work by themselves. Bear with me here and deal with the scrolling, if you wouldn't mind. This is going to be an eyesore for you, and I apologize. So, has anyone heard of Chesterton's fence? You guys know what I'm talking about? Chesterton's fence. It's this idea that you're walking down a road and you see a fence in the middle of the road and you say, "This is in my way. I want to get around this. I want to keep going

down the road." And you say, "Well, if I take the fence down, what happens?" You don't know why the fence is there. But you take the fence down, and now some cattle get mixed in with some other cattle and you have a whole problem. The idea here is that you don't know why something exists, so you shouldn't take it out of context. So, if you don't know why a detection is there, do not disable the detection. We're just going to start with this process: do not touch things when you don't know why they're there, until you know why they're there. So that fence, Chesterton's fence, is the anecdotal

example here. That thing is used for... If you can fix it, sure, go for it. I can work. Yeah, go ahead. I did. I had to, unfortunately. Yeah. So Chesterton's fence is the example here: whenever something's there in the middle of a field, if you don't know why it's there, don't touch it until you know why it's actually there. This is going to be tough to read like this, but we'll give it our best shot. Good luck. It's like Star Wars. You guys have seen that, right? A long, long time ago. So what I'll do is I'll click

on this here in a second. I appreciate it. Thanks for coming back. No, it's okay. Shame. Shame is what we say. No, thank you so much, sir. I appreciate it. So, what we're going to do is look at some data here in a second. This is just our data catalog, what we have here. This is dummy data, and this is just going to be a quick walkthrough of the technique that we could use. Like I said, this should be done as part of your quarterly review of your detections. And so what we're going to do is just a very quick

and dirty quarterly review of those detections. Just a second. I actually do need to use my laptop for this, so it's a hands-on thing. Okay. No, no, it's okay. All right. Cool. Thanks for bearing with me. So this is a demo spreadsheet of just how this kind of works. Again, bear with me here; it's an older copy of my deck. But essentially, when you're doing these quarterly reviews, what we need is the detection itself, what the name is. Now, some of you are going to be bringing in sources like your EDR source. If you do that, just have some type of

annotation, whether that be an ID or a name, right, to use. That gives you something to break off of. And when you're bringing that into your case management system, that can be a detection. Right here, these are all just dynamic detections. So what we've done here is gone ahead and brought this over and broken down our log costs. We have the true positive rate, the false positive rate, and the benign positives, and then this breaks it down to a true positive rate over here, and that allows us to do some simple math. Right now you're going to see this

potential negative, and this is where we have kind of a different opinion. When you ask what a potential negative rate is actually going to be, it might end up being something along the lines of the total cost of a normal breach. So if a breach is going to cost you $4 million a year, isn't every one of these detections potentially $4 million? No, that isn't how it works. An attacker generally strings together multiple attacks, and that's how you get to your $4 million. But in this case, these are just pie-in-the-sky numbers. When you're doing your detection engineering,

your theory crafting, you're going to say: how much is one of these things going to cost me when it doesn't get annotated? We multiply that by our true positive count. So if we look back over here at our true positives, we say, hey, for this detection we had 31 detections. Thanks. 31 detections. And then we say a potential negative would cost us $10,000. So this would cost us $310,000. We can look at that and then break it down into our total cost for our logging as well. And then that allows us to get an idea of how much time we're spending as

SOC analysts in a year that's getting spent on false positives. Remember, true positives have value. False positives do not. And so, as a result, that's wasted time. So if we add all that together, fingers crossed this is all in this spreadsheet here, this old version. I don't think it is. It's not. I think we can just get rid of this guy for now. Go back to the presentation.
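The spreadsheet walkthrough above could be reproduced in a few lines of Python. This is a minimal sketch under stated assumptions, not the speaker's actual spreadsheet: the `Case` record, the `review` function, the detection name, the $50 hourly analyst rate, and the sample case times are all hypothetical. Only the true/benign/false positive classification, the $10,000 potential negative cost, and the preference for median over mean come from the talk.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass
class Case:
    detection: str           # detection name or ID (hypothetical field names)
    classification: str      # "TP", "BP", or "FP", recorded at case closure
    minutes_to_close: float  # measured from first analyst touch, not from the event


def review(cases, hourly_rate, potential_negative_cost):
    """Summarise closed cases per detection for a quarterly detection review."""
    by_detection = {}
    for case in cases:
        by_detection.setdefault(case.detection, []).append(case)

    report = {}
    for name, group in by_detection.items():
        counts = {"TP": 0, "BP": 0, "FP": 0}
        for case in group:
            counts[case.classification] += 1
        times = [case.minutes_to_close for case in group]
        # Benign and false positives are treated as wasted analyst time.
        wasted_minutes = sum(
            case.minutes_to_close for case in group if case.classification != "TP"
        )
        report[name] = {
            "counts": counts,
            "tp_rate": counts["TP"] / len(group),
            "median_minutes": median(times),
            "mean_minutes": mean(times),
            "wasted_cost": wasted_minutes / 60 * hourly_rate,
            # True positive count times the assumed cost of missing one.
            "potential_negative_value": counts["TP"] * potential_negative_cost,
        }
    return report


# Dummy data: three quick non-true-positives and one six-day true positive.
cases = [
    Case("impossible_travel", "FP", 10.0),
    Case("impossible_travel", "FP", 20.0),
    Case("impossible_travel", "BP", 30.0),
    Case("impossible_travel", "TP", 6 * 24 * 60.0),
]
report = review(cases, hourly_rate=50.0, potential_negative_cost=10_000)
for name, row in report.items():
    print(name, row)
```

In this dummy data the single six-day true positive pulls the mean time to close to 2,175 minutes while the median stays at 25, which is the long-tail effect the talk warns about. Feeding the same function 31 true positives at $10,000 each gives the $310,000 figure from the demo.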

So if we add all that together, what we end up finding is that our SOC analysts are spending a lot of time, oftentimes somewhere in the ballpark of 75% of their time, working on false positives. Do you feel good when you investigate something for four hours and then don't find anything? I don't. I feel terrible. It ruins my entire day. Now imagine that you have to work 100 cases a day, which is a lot, but it happens all the time. So when you work those cases, one of the things I would tell people is: make sure that you're merging cases together when

you're doing case management. Don't let things pile up; merge them together when they're there. That gives us accurate measurements. Also, when you're working on cases, if it's a false positive, go back and tune it. Ask: why is this a false positive? How do I get this to be a true positive in the future? Because when we do the math, we end up finding out that we might have a staff of 25 SOC analysts, but we can actually do all of our true positive work with five. That's not a good reason to do that work. It makes it feel like, you know, when you're doing that. So what I would tell you there is: make

sure that you're calculating those metrics out. Do it every year. All this comes down to the TL;DR, skip the whole talk and just come to this: include your case metrics in your detection review cycle. So whenever you're doing detection reviews every year, seeing the fidelity of your EDR, your NDR service, whatever that might be, bring in all your detections and say, this is my true positive, false positive, and benign positive rate. This is what it looks like when I start to do the math. And because of that, we start to see the math kind of shake out, and we can do things a little bit

better. There would be some extra slides here, but I think this says Q&A, so I think I am done. Sorry for all the technical issues. I will have the slides published with all the correct stuff up there. Any questions in the last two minutes? So, yes. So when you're doing this internally, you can already have the practice of measuring true positives and false positives. Are you doing the same in a consulting capacity? Yeah. When you're advising an organization, how do you factor in the time it would take for them to classify true positive, false positive? It should be baked in. It should be part of doing your everyday work. Oh, I mean the initial

upfront cost for moving to a system like this as part of your case management. All you've got to do is add it in as a closure requirement. So when you're doing your CMS, whenever you're doing that SOAR orchestration, you can say, hey, this needs to be part of the closure material: it needs to be true positive, false positive, or benign positive. Unfortunately, you can do it in reverse. There was a good talk today about agentic AI; you can classify things in reverse, but just understand it's going to be spotty. This is like a trend. When you're looking at data and you trend it, data is more useful when you

look at it in the past, right? So you have to ask: six months from now, what am I going to need? I'm going to need this data, so you've got to start today. Unfortunately, that's the easiest way to do it. You have a question, sir? Yeah, you talked about two pros and the question before that is how do you decide what to what do you all. Yeah, so on that spreadsheet, I didn't do a very good job, but we have a detection genesis. Typically, there's a reason you have a detection, whether that be compliance, an incident that happened in the past, or trends in the industry, and that's how you get up there. And then sharing the log cost as

well is a whole other issue. How do you say, well, 4624 events are used in like 75 different detections; how do you justify the log cost among those? Well, you can divide it out, or you can look at it one at a time. I'm trying to go in order. Question. Can I add a comment? Yes, sir. Of course. So, I managed a SOC with 700 retail locations. I won't name the company, but when I first got there, they were getting all kinds of false alarms. And so the rule I made for them was: if it requires action on the part of the analyst, it's an alarm. If it doesn't

require action, it's a log. Yeah, that's perfect. A lot of stuff was being alarmed on that had no business being an alarm. 100%. What about tuning out false negatives? You see people tuning out false negatives? So, things that didn't get... false negatives. Yes, things that didn't get caught. Tuning them out? Yes. Tuning them in is what you're trying to do, but when people start creating logging that doesn't make noise. Oh, I see. Or not understanding how to log. Yeah, sometimes. So that happens. CMDB, you know, lack of a CMDB, and it just flies by and nobody knows what the hell it is. Sometimes people just start logging things for no reason. Noise. Yeah, they just generate noise. Yeah, unfortunately

sometimes we have compliance reasons. We've just got to ask, why am I logging this? And you should also have a data dictionary or a data catalog that says: should I be logging this, or do I already have this log? Can I satisfy my compliance requirements without having to do this? It's nice to see you. Go ahead. I think I'm out of time. I have two questions, for the sake of time. Yeah. How do you contextualize your tickets? What would be your best practice for contextualizing when you're on call and reporting to the next SOC analyst that's going to be coming on, when you're not able to finish that report

or incident? Yeah. So most SOCs should have a triage layer, your tier one coming in, and you should have an incident handling and incident response layer that's going to take that in. That ingestion layer, tier one as a lot of people call it, needs to be 24x7. It needs to have overlap so they can do shift handover and change without having to have somebody else come on. You've got to have the staffing to do that work. And then it's just documenting all the work that comes with it. And if you work with an MSSP? I work from a governance and GRC

standpoint, having to do this as a bonus part of my job. What's the best way to make sure that the MSSP is staying on top of it and seeing it all the way through? It's not just "we log this, we review this." I'm doing a quarterly review of the KPIs that you just talked about, and I'm just checking the box. I want to make sure we're... Unfortunately, it's going to be auditing and checking up on your MSSP at that point. You can bring in third parties to do it. You can have pen tests, do black-box pen testing as well, to see what's not getting caught, things that you know happened. That way you have signal that you know

they didn't bring up, and you can say, "Hey, why didn't this get detected?" Have a good relationship with your MSSP or MDR service. Hold them accountable. They're SOC analysts just like you; you have SOC analysts too. So just say, "Hey." "Yeah, but you're buying a service, right?" Say, "Yeah, I'm buying perfection." That's all I have time for, guys. I appreciate it. Thank you so much.
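One of the questions above asked how to justify the cost of a log source that many detections share, and the answer was to "divide it out" or to look at detections one at a time. The dividing-out option could be sketched as follows. This is only an illustration: the `apportion_log_cost` function, the detection names, and the $75,000 annual cost are hypothetical, and an even split is just one possible allocation rule.

```python
def apportion_log_cost(annual_log_cost, detections):
    """Split one log source's annual cost evenly across the detections that use it."""
    if not detections:
        raise ValueError("at least one detection must use the log source")
    share = annual_log_cost / len(detections)
    return {name: share for name in detections}


# Hypothetical numbers: 75 detections read 4624 events,
# and ingesting those events is assumed to cost $75,000 a year.
detections = [f"detection_{i:02d}" for i in range(1, 76)]
shares = apportion_log_cost(75_000.0, detections)
print(shares["detection_01"])
```

An even split keeps the per-detection log cost small, which can hide an expensive log behind many low-value detections. Looking at detections one at a time, the other option mentioned in the answer, would charge each detection the full cost and makes that trade-off more visible.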