
Let's see if this thing is working. Okay, it's rolling. First, about the presentation: detection engineering and use case design have intersected, at least in the last couple of years. A little bit about me. If somebody asks me what I do, even my relatives, I say that I construct mousetraps. I make mousetraps to catch mice. That's the main idea: you build something that alerts on some behavior, and that would be a detection. Hence detection engineering and use case design.
So, this is what I do: I construct mousetraps. Now, I'm not going to go through employment history and all that jazz; quite frankly, it's boring and nobody cares. Currently I'm involved in the compliance/Purview stack, which has a lot to do with use case design, and the approach we're going to go over can really be applied to any security system, to design any custom detection you might need. That's the basics, so let's start rolling.

So what is detection engineering? If I have to distill it down: the main goal is to derive meaningful security alerting from the data coming from your devices. You're going to have logs coming out of everything: firewalls, endpoints, your EDR, your email gateway, whatever. All of that has to be distilled into actionable alerts that your SOC, or you, or the admin guy, or whoever, can act upon. You need to track intrusions or changes in posture, whatever the business need of the company you're in is. A use case, a cybersecurity use case, would be something the business needs to monitor, whether that's compliance or breach alerting or whatever you have over there. That's the basics of the operation: you guess what you're going to look for, then you write code that does that looking in an automated way, because you cannot always do it manually.

The benefits: if you have a good detection engineer, they can tune your rules, your SIEM, your EDR, whatever, so your SOC doesn't die in a flood of alerts. With a good detection engineering team you're going to spot things faster, because quite frankly there is nothing like matching a detection to the specific environment the client has. Solutions come with something out of the box, but that's customized for millions and millions of people, and a little customization can transform the efficiency from, let's say, 20-30% to about 80-90% in terms of detection ratios for threats you might otherwise miss. So it's beneficial. Also, when you control your rule set, you control everything in the vertical, from the detection to the actual response, to which detections you weed out and which you keep. When you buy something, take it out of the box, and implement it as-is, that's quite frankly not possible. So this is what detection engineers do.
Okay, to be fair, most of the time we just whack false positives because the SOC is dying. But hey. All right, so what do you need to actually start detection engineering? There are some prerequisites before you start rolling.

First, you want to do a log survey. What is a log survey? You sit down and think about what you have in your environment: firewalls, endpoints, whatever is available, and you figure out whether those logs are of value to your specific monitoring needs. Obviously the mail server is going to be of value, your endpoint logs, whatever you've got. You survey the whole picture and you build a log inventory. When you have a log inventory, you have a starting point to figure out what you have and how you're going to use it.

Then log collection, and that's a big one. For many companies this is an ongoing process, constantly ongoing. Log collection means you actually build the infrastructure to ship those logs from the devices to a centralized location.
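Once the logs land in a central workspace, the inventory practically writes itself. As a minimal sketch, assuming your logs go into an Azure Log Analytics workspace (as in the Sentinel case study later), the built-in Usage table tells you what you are actually collecting and what it costs:

```kql
// Log inventory sketch: billable volume per table over the last 31 days.
// Usage reports Quantity in MB, so divide by 1024 for GB.
Usage
| where TimeGenerated > ago(31d)
| where IsBillable == true
| summarize TotalGB = round(sum(Quantity) / 1024, 2) by DataType
| sort by TotalGB desc
```

Tables burning gigabytes with no detection attached to them are the first candidates to revisit.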
Then use case design comes in. You figure out what you want to detect. For example, a file put in the wrong folder: your guys, your programmers, work in a specific folder, and suddenly an executable ends up in another folder on the server. Something is going on; you want to look at that.

Then you figure out your detection logic. You've got the logs; now you have to figure out how to go over them in an automated way. You need something running there to produce meaningful alerting, something actionable that you can hand to your SOC or your engineers or your admin, whoever. Basically, you need some sort of criteria to weed out the events that are of value to you in terms of security monitoring or compliance monitoring or whatever you want to do.

And then you have your response workflow. This is important, and many people overlook it. If you see the red light blinking, then what the hell do you do? When that red light blinks, you have to have some sort of process to respond, or chaos ensues: servers get encrypted, then the backups get encrypted, and you're out of business.

These things are interconnected. You obviously cannot have detection logic without shipping your logs, so it's an iterative workflow that you constantly work on and improve.
Some key questions you need to answer: Where are your logs coming from? Is your collection working properly? Do you have gaps, as in, do you actually lose logs? For example, some log-shipping system goes down, nobody gives a hell because it isn't monitored itself, and your log flow stops. Obviously your detection will not see anything, because nothing is coming in. (That gap check is easy to automate; there's a sketch after this list of sources.)

Do I have the right data? You might be over-collecting, you might be under-collecting. Am I getting the right logs? Say you have the logs from the door-entry system and you want to monitor, I don't know, who is going on stage. That will tell you who comes into the room, but it's not going to tell you who goes on stage, now will it?

How long is your retention? Say your detection logic needs to correlate across a week of events, but you only have 72 hours of log data. You have to know your constraints and how long you're keeping that data, or you're not going to be able to realize some specific detections.

Common log sources you want to look at: user authentication logs, email systems, firewalls and network devices, cloud services, databases, applications. All of that is of value if you have the right detection attached to it; if not, you're just wasting money on log storage.
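Here is that gap check, sketched as a workspace query. The two-hour cutoff is an arbitrary choice you would tune per source:

```kql
// Collection-health sketch: tables whose newest record is older than 2 hours.
// A busy table going silent usually means a broken shipper, not a quiet network.
union withsource = TableName *
| summarize LastRecord = max(TimeGenerated) by TableName
| where LastRecord < ago(2h)
| order by LastRecord asc
```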
So, the basics before you start rolling anything: if you need to design a use case, there are some basic elements you need to have an idea about, even before you start.

Obviously, you have to name it something; that rule is going to come into the SIEM under some name. Then the purpose: what are you trying to detect, and how does it actually work in your specific environment, in context? The problem statement is a clarification of the purpose: you describe in detail what you want to see, what you want to detect, and why you are doing this.

Data sources, and this is important: every detection you design depends on some data source. A data source is a system issuing logs; you can call it a sensor providing the telemetry based on which you're going to make decisions. This is where your logs come from.

Then your detection logic. How are you going to detect? What is the detection rule going to be, what is the syntax, what are the false positives, what are the false negatives, and all the criteria that matter for processing further? This is where the core of the magic happens; this is where you produce the alert, the actionable event.

And you have your response actions. Response actions are, like we said: the red light blinks, what do I do? This is important. If you're one guy, it doesn't matter. Two, three guys, it doesn't matter. But when you have a SOC distributed across, let's say, four freaking countries, and you want them on the same page, responding to a certain standard, with consistency in what's going on, then describing the response actions becomes very important.

All right, here's a slightly more extended version; this is the template I use to start working on every use case I have to deploy. You've got the ones we spoke about: name, purpose, problem statement. Then business impact, which is an important one: when you roll your detections, what will happen? For example, if you push a detection to the EDR, depending on how it's rolled out you might slow down the endpoints; or when you need to ship the logs from your email servers, you might clog up the network. Basically: when I put this in production, what happens?

Data sources we spoke about, detection logic we spoke about. Response actions here get specific: I'm going to rotate the credentials, or reset the password, or quarantine that endpoint, or block this mail sender, whatever it is. You want your response actions spelled out.
And then you have assurance metrics, and this is often overlooked. Assurance metrics say: if I deploy this, what criteria need to be met for my detection and my use case to be considered successful and worth running? Because quite frankly, running custom detections costs money; it translates directly to cost. You onboard logs, you process those logs, you have computation running. So you have to justify to the business why you're doing this. You can say, for example, five minutes to detect the specific event covered by the use case. Those are your assurance metrics, and they also serve the CYA scenario, which would be "cover your ass."
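As an aside, and this is my habit rather than any standard: one way to keep these template fields from drifting away from the rule is to carry them as a comment header on the detection itself. A hypothetical skeleton, using the case study that's coming next:

```kql
// ==== Use case header (template fields, illustrative values) ====
// Name:            Suspicious OAuth Application Consent
// Purpose:         Identify potentially malicious OAuth grants in Azure AD
// Problem:         Consent phishing gives attackers persistent mailbox access
// Business impact: Extra AuditLogs ingestion; no endpoint-side cost
// Data sources:    Azure AD audit logs (AuditLogs table)
// Response:        Revoke app consent, rotate affected credentials
// Assurance:       Detect within 5 min; triage within 15 min
// ================================================================
// ...detection query goes below this header...
```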
Okay, so we're going to go over a case study: detection of suspicious OAuth application access. This has been a thing in the last couple of years. Basically, the bad guys send a phishing link, not to a credential-harvesting page, but to an application that authenticates against the environment, and they get persistent access to the user's mailbox. Or, god forbid, an admin actually gets phished with an application, and they get even more. It's persistent, it's nice, it lives directly in the Azure AD tenant, and it gives the attacker basically everything the user has access to.
And it's often overlooked, because quite frankly people don't look there. If you're using the Microsoft security stack, specifically MCAS, or other CASB solutions, they monitor for exactly this. But first, people often don't implement MCAS, because it's hard to service on the back end: you need to process the telemetry from it and you need analysts on it, because it produces a lot of noise. And second, it's actually expensive.

The flow works like this: the attacker creates a malicious app, the user is tricked into approving its permissions, and the attacker gains persistent access. That's where they sit, and they basically exfiltrate data. They sit over there pretty and steal data, which is the end objective in this case.

The nice parts about it, from the attacker's side: it bypasses password authentication. You don't need to phish any more passwords; you've got an application over there, and it sits there until somebody removes it. The activity can appear very legitimate: they can have a tenant, they can sign their application, all that jazz. FireEye got compromised, I believe, with some sort of application like this. And users don't understand what the hell they're approving. Phishing links, they're kind of on the lookout for, but when it's an application asking "do you want to approve this?" in a legitimate Microsoft window, they click it.

Okay, so what are you going to do in this type of situation? Your core components: obviously you name your use case, "suspicious OAuth application consent." The purpose would be to identify potentially malicious OAuth grants. The problem statement: OAuth phishing enables persistent access; basically, they get in and they stay in. Business impact: well, they can read the mailbox, they can read whatever they have access to, and they can do even more damage.
Your data source would be the Azure AD audit logs. Detection logic: you're going to look for high-value permissions and non-approved applications. Response would be to revoke the application's access and rotate the credentials if it happens. And for an assurance metric you'd have, let's say, 5 minutes to detect and a 15-minute triage SLA: within 5 minutes of the registration you want to catch the thing, and in under 15 minutes you want it out of your environment so nothing bad happens. That's the objective.
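To make that detection logic concrete: a minimal sketch of what such a rule could look like in Sentinel, querying the AuditLogs table for consent events. The property parsing varies by tenant and schema, and the permission list here is just an illustrative sample, so treat this as a starting point, not the finished rule:

```kql
// Sketch: consent grants where the app asked for high-value permissions.
AuditLogs
| where OperationName == "Consent to application"
| extend AppName = tostring(TargetResources[0].displayName)
| mv-expand Prop = TargetResources[0].modifiedProperties
| where tostring(Prop.displayName) == "ConsentAction.Permissions"
| extend Permissions = tostring(Prop.newValue)
| where Permissions has_any ("Mail.Read", "Mail.ReadWrite", "offline_access", "Files.ReadWrite.All")
| project TimeGenerated, AppName, Permissions,
          Actor = tostring(InitiatedBy.user.userPrincipalName)
```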
So how would the response flow look? Basically: an alert gets created in whatever monitoring system you have, the security team is notified by that alert, and obviously the SOC has some investigating to do. Is it legitimate, is it illegitimate, what's going on? Then they decide on an action: do they follow the response process we outlined in the beginning, or do they just scold the admin for approving something? It's a little bit of if-then logic. If it's a confirmed threat, you block the access, reset accounts, and investigate the impact. If it's suspicious, temporarily block and then verify with the user. And if it's a false alarm, obviously, yes, you close it as a false alarm.
So, every time you start, you need some validation that you can actually pull it off. Is the logging infrastructure in place? Is your log source connected to your detection system? In this case, the Azure AD logs have to be connected to your SIEM, that being Sentinel or whatever you're running. Then you develop the detection rules that look for that particular activity in the logs you've onboarded, you establish your response procedure (rotate credentials, like I said, whatever you want to do), and you deploy the monitoring controls: you look for that specific behavior and you instruct the SOC on how to operate and what to do.

Then validation. Okay, so we created this use case, we created this monitoring; now you do some validation before you roll it out to production. This is your basic sanity check. Test it: run it against data that has already been collected. And check your false-positive ratio: what is the ratio of noise versus valid alerts that your SOC has actually processed and stopped? You do that measurement before rolling it out into production.
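Backtesting is usually just the same query pointed at history. A sketch of how I'd estimate the noise level before go-live, taking the consent detection from earlier and widening the window:

```kql
// Backtest sketch: run the draft detection over 30 days of already-collected
// data and count daily hits to estimate alert (and false-positive) volume.
AuditLogs
| where TimeGenerated > ago(30d)
| where OperationName == "Consent to application"
| summarize Hits = count() by bin(TimeGenerated, 1d)
| render timechart
```

If that chart shows dozens of hits a day in a tenant where app consents are rare, the rule needs tightening before the SOC ever sees it.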
You also document your false-positive patterns: when the rule generates a false positive, what does it look like and what happens? That way people have a guideline for future processing, and you try to minimize the false positives before you roll it out to production, at least against the data you have at the moment. And you validate your response flow. Can you actually ban the application? Can you reset the password? Can you rotate the credentials? You need to validate that before it happens, because there's nothing worse than building, for example, an automation app that resets passwords and finding out it doesn't work.
Trust me, it happens.

All right, this is kind of my last slide. If you have to boil it down to three or so things: number one, know your data, know your sources, know what the hell is going on. Two, define your threats; that would be the threat modeling, if there are any threat modeling people here. You have to figure out what your threats are, who is going to attack you, and how that's going to look. And three, always attach a response process to your detection pattern: always have a response process tied to your specific detections, to detection logic that works. And it has to be written down, with stakeholders and contacts burned in, all that jazz. If an incident happens, what the hell do you do? If at one in the night you get a call that servers are going offline at a rate of three per minute and you're losing connection to your endpoints, what the hell do you do?

So, yes, this is actually the last slide. I'm going to show you a document of this being in action, basically a designed use case. I'll put it on GitHub so you can work with it. Questions now or later, whichever.

Okay, so I'm going to pull up another document that I've designed, one my wife said to keep out of the presentation, thank god. By the way, she's been putting up with me so I could create this.
Ah, it's not going to be a demo, it's actually just a document; you don't do demos on a company laptop. Okay, we've got some motion. So this is how a proto use case would look. It needs more work, but you've got your name, "Detection of unauthorized application access to Azure AD," and your purpose statement: detect unauthorized application consent.
I apologize, I suck at this; let me press F11. All right, since I can't see anything, I'll go from here. So: you've got your purpose, you're going to detect unauthorized apps. You've got your problem statement: what the hell am I trying to detect, and basically, why am I doing this?

Then you build your success metrics, the technical success metrics we spoke about. For example: detection latency under 5 minutes; an alert-to-incident conversion ratio of 80%, with noise and false positives under 15% of produced events; and mean time to respond, meaning how long from detection to a remediative action, aiming for 15 minutes, and that's on your SOC. Obviously this is going to be a high-severity type of alert. (These metrics are measurable straight from the workspace afterwards; see the sketch below.)

Then detection requirements. This means: what do you need in order to detect this? You're going to monitor your Azure AD OAuth activity. You're going to alert on application requests with sensitive permissions, on applications from unverified publishers, and on recently registered applications, say less than 30 days old; that one's not actually very good on its own, but still. And you alert on applications not approved in the application registry. Most companies keep a registry of the applications they use that integrate with Azure AD for authentication; the enterprise applications blade in the Azure AD portal will basically do that for you.
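Here is the latency sketch I mentioned, assuming the Sentinel alert tables are enabled and using the hypothetical rule name from this document:

```kql
// Assurance-metric sketch: how long after the activity ended did the alert fire?
SecurityAlert
| where ProviderName == "ASI Scheduled Alerts"     // Sentinel analytics rules
| where AlertName has "Unauthorized Application"   // hypothetical rule name
| extend DetectionLatency = TimeGenerated - EndTime
| summarize AvgLatency = avg(DetectionLatency),
            P95Latency = percentile(DetectionLatency, 95),
            AlertCount = count()
```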
All right, scrolling down. This is where it gets fun. You've got your primary source and reference data: obviously, you want to create a whitelist of approved applications. And then the analytics rule query. Now, this pile is actually the main magic, and honestly, if I tried to explain it now, it would probably take 30 or 40 more minutes plus some KQL basics. But this is what runs in the SIEM. This is what produces the alerting, what screams, what does the prioritization: it generates the alert, prioritizes what's going on, and says how bad it is. Let's put it this way: it's the detection rule, and it's the core of what I usually do.

Scrolling down a bit, I have pseudocode under it. If somebody shows up who isn't exactly versed in KQL, you want to be able to explain to them what's going on. So you create a pseudocode, if-then version of the logic that they can follow, something a normal person can read, and you put it under your KQL. Why do you do this? Quite frankly, sometimes people go in and edit detection rules; those things are alive. But they don't update the documentation, and then it's difficult to follow what the original intent of the person who wrote the rule was, and why they have it in the first place.
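For illustration, here's the kind of pseudocode comment I mean, written against the hypothetical rule in this document; it states intent, not syntax:

```kql
// PSEUDOCODE (keep in sync with the query below when the rule is edited):
//   FOR each application consent event in the last 15 minutes:
//     IF the app is NOT on the approved-apps watchlist
//     AND (it requests sensitive permissions
//          OR its publisher is unverified
//          OR it was registered less than 30 days ago)
//     THEN raise a high-severity alert and attach the risk score
```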
Then the technical components, what you need to pull this off: Azure AD audit logs; a Microsoft Sentinel analytics rule, obviously, created over there; a Sentinel watchlist to hold your whitelist; and if you want to go one step further and do automatic response, a Sentinel playbook for that automated response. Yes, there was a certain Ilian over here who kind of does this.

You're also going to calculate risk, a risk score of 40 to 74 here. Why is this important? Because many monitoring systems, Splunk, Azure, every one of them, use event compounding to calculate risk. Meaning: something happens, it gets a value of 5; something worse happens, it gets a value of 10; and the alert threshold is, let's say, somewhere around 30. So a bunch of stuff has to happen within a block for an alert to be raised. You have score compounding and risk accumulation. This is important because without risk accumulation, your SOC is going to die of alerting: if you alert on atomic criteria, let's say a single IP address or a string, with no risk compounding, you'll produce lots of events and swamp the SOC more or less.
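A sketch of that compounding idea in KQL, with made-up weights and threshold; the 'ApprovedApps' watchlist alias and the permission list are assumptions, not anything standard:

```kql
// Risk-compounding sketch: weak signals add points; only the sum raises an alert.
let Approved = toscalar(_GetWatchlist('ApprovedApps') | summarize make_set(SearchKey));
AuditLogs
| where OperationName == "Consent to application"
| extend AppName = tostring(TargetResources[0].displayName)
| extend Permissions = tostring(TargetResources[0].modifiedProperties)
| extend RiskPerms    = iff(Permissions has_any ("Mail.ReadWrite", "offline_access"), 25, 0)
| extend RiskUnlisted = iff(set_has_element(Approved, AppName), 0, 20)
| extend RiskScore = RiskPerms + RiskUnlisted
| where RiskScore >= 30            // compound threshold, not one atomic trigger
| project TimeGenerated, AppName, RiskScore, Permissions
```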
Then you've got your technical dependencies: what needs to exist for this to swing. Technical limitations: what are the limits of your actions, as in, "I cannot do this because..."; where are my boundaries? It's a little, let's say, not fleshed out here, but you really need to dump a lot of material into this section to cover your bases. Why didn't something work? Because when the auditors show up and ask, "Why didn't you detect this?", you can go back to the limitations and say, "We didn't detect this because of such-and-such."

Then MITRE mapping. People use kill chains, MITRE ATT&CK mapping; you want to put it somewhere so it can be classified, so you have an idea where your detection coverage is, where it isn't, and where you need to work. And maybe, if new threats arise, you can use the mapping to refer back to your current detections, and maybe you can just tweak this one to go after the new threat. And you've got your configuration requirements: basically, what needs to be configured for this to be swung.

And that's it. That's pretty much how a skeleton of a detection use case would work, or how it looks when I do it. Other people have their own approaches, different ways of doing this. So, yeah: questions, comments, anything.