
All right, hello everybody. Can everyone hear me okay? Sweet. It's really cool to be doing this in person again, it's been a long time, so I appreciate everybody coming out, and it's pretty cool to do this in a theater as well. My name's Kyle. I've been doing threat detection and incident response for about the last 10 years, in all sorts of venues, from Cyber Command in the U.S. Air Force to the tech industry, most recently at Box, building and managing the detection engineering team there. I currently work at Panther Labs as a security engineer; I've been there about a year or so.

We are a SIEM vendor, so I want to start with this disclaimer: this is not a vendor pitch by any means. Detection as code is one of those core concepts in threat detection and threat detection management that you can apply to pretty much any tech stack in your environment. Maybe five to ten percent of this will get a little tool-specific, but I don't want this to become a vendor pitch, and it's not going to.

So why are we talking about this? If you've been in the threat detection or incident response landscape, you've probably seen this term before. It's
probably not the first time you're seeing it. In the last three or four years we've had industry practitioners, thought leaders, and vendors talking about detection as code: should we do it, does it matter, yes we need to do it, use my product to do it, that kind of thing. More recently we've seen practitioners, security engineers, and analysts start to say, hey, here's how this works in my environment and here's how it helped me. I had the same experience at Box, and I want to share that with you.

So what's been changing over the last four to five years? Threat detection, or detection engineering, five years ago was a pretty specialized field. You had to be at a pretty big security company, or a company with a pretty big and maybe pretty advanced security team, to have a dedicated threat detection function. Over the last four or five years we've started to see a lot more detection engineering posts on LinkedIn, people looking for detection engineers, and smaller companies and security teams of any size building out threat detection functions, because of the importance of writing threat detection specific to your environment, and having the ability to build and maintain detection logic
that's very specific to how your company works or how your product is built.

So okay, great, all that's happening, but why? This is the most profound slide of my entire presentation: because it's hard. Anybody who's done it knows that. When I wrote that line I was specifically thinking that writing the content is hard: it's hard to research the threat, it's hard to understand how I see that threat in a log, it's hard to write the logic so that I pick up only that behavior. That whole process of building the content is extremely hard, but that's not what we're talking about today. Today we're going to talk about a totally different set of problems: content aside, threat research aside (those are all very hard problems as well), what else makes threat detection hard on a modern, scaling security team? I'm sure many of you have asked these questions before, or had them asked of you.

Who changed detection X, and what changed? Sounds like an easy question to answer, but depending on where your detection lives and how you maintain it, it may not be. Am I going to
introduce errors, problems, false positives, or ticket cannons by making this change or releasing this detection? It can be hard to answer that question repeatedly. When will we have a use case completed? A new threat comes down the line: where does it fall in all the work we're doing from a threat detection standpoint? Is it still working as expected? We'll go deeper into that as well. Is the current version of my logic running; did it get saved in my SIEM appropriately? And, like I said, ticket cannons: am I going to deploy a detection that generates hundreds of useless alerts right off the bat? I've definitely done that several times.

Then there's the most important one to me, and why detection as code really hit home when I first started learning about it: as a detection engineering team, thinking about what you produce as a product for whoever responds to the alert. That may be you, or it may be a dedicated team. How can we improve the product for the incident response team, and what processes and technical guardrails can we wrap around detection creation to produce the best product we can for them: the lowest amount of
false positives, the highest amount of signal.

So I've mentioned the term a lot; let's define it. Detection as code, to me and to a lot of others, is applying software engineering principles and best practices to writing threat detections. Software engineering has been around for a long time; they figured out how to write good, consistent code, how to test it, and how to do things repeatedly and reliably. In threat detection we're just stepping into that space. There are still a lot of days of copying and pasting detection logic into the production version — the kinds of things software engineering figured out 40 years ago that we're only now starting to address as we treat threat detection as an engineering discipline and apply more rigor to it, in order to create that product experience I was talking about. That's how I define it broadly, and how we'll think about it going forward in this presentation.

I categorize it into five domains — the five domains of detection as code — ranked here roughly by ease of implementation: starting with process, and ways you can think about writing detections in your SIEM a little differently than maybe you are now,
all the way down to integrating your SIEM with other tools: integrating with version control, integrating with CI/CD, and really taking a test-driven development approach to threat detection.

So let's start with agile. Nothing groundbreaking here; everyone's used to agile, but thinking about it from a threat detection standpoint lets us answer some of those questions from before. With a prioritized backlog, the team always knows what it should be working on: here's what we've decided is most important next, we're going to work on that, and that changes over time. Dedicate time for documentation, testing, and review in the team's workflow — and I say "team", but all of this is interchangeable: this doesn't need to be a team of detection engineers, it could be one person writing detections. These are important components of building any type of threat detection: dedicated time for documentation, testing, and review. And monitor work in progress: are we trying to do too many things? Does the work we know we need to do keep getting de-prioritized
by the latest-and-greatest threat that may or may not be relevant in our environment? There are some great resources here, one from Alex Teixeira and one from Palantir, around documentation and Jira workflows.

Okay, so this is the five or ten percent I mentioned that may get a little tool-specific, and it's about using expressive languages. Most query languages can look for an IOC or something similarly generic; that's easy, and any security tool or SIEM will do it for you. But when we start trying to do more complex functions, import third-party libraries, and really tailor a detection to our environment to reduce the false positive ratio, an engineer needs the flexibility to express their intent for the threat detection logic in what they're producing. I'll go into this in just a second, but having that ability in the language you're writing is a very powerful feature.

Some of the others here are very SIEM-agnostic. Turning common components — pieces of detection logic that are the same across many use cases — into a function or something function-like; many SIEMs give you some way to do that. Lookup tables and data models are other great ways to centralize data or functionality in a single place and then call it from any detection use case you're writing. A data model — writing a single detection across multiple log sources — is code reuse at its finest.

So this is YARA-L, and this example is pretty simple: I just need to check whether an IP is in a CIDR range. That's simple, but some languages don't
let you be expressive like that, and being able to write it in one line, as a detection engineer or someone maintaining detections, is really great: it keeps things readable. This next one is a Python example — another example of the functions I was talking about: define a function once and call it repeatedly. I've blurred some of the other content out because it's not important; the important part is defining the piece of code you're going to reuse in a single place, maintaining it in one place, and updating it in one place.
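As a rough sketch of that pattern (this is not the code from the slide — the helper name, CIDR ranges, and rule shapes here are my own invention), a helper defined once and reused by multiple detections might look like:

```python
import ipaddress

# Shared helper, defined once and imported by every detection that needs it.
INTERNAL_NETS = [ipaddress.ip_network(c) for c in ("10.0.0.0/8", "192.168.0.0/16")]

def is_internal_ip(ip: str) -> bool:
    """Return True if the address falls inside any internal CIDR range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in INTERNAL_NETS)

# Two separate detections reusing the same helper: update the CIDR list in
# one place and both rules pick up the change.
def rule_admin_login_from_outside(event: dict) -> bool:
    return event.get("action") == "admin.login" and not is_internal_ip(event["src_ip"])

def rule_internal_recon(event: dict) -> bool:
    return event.get("action") == "port.scan" and is_internal_ip(event["src_ip"])
```

Maintaining the CIDR list in a single place is the whole point: when the network changes, you update one line instead of hunting through every detection.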
All right, so this is where it starts to get more fun and more interesting. If we think of the next three pieces as a house, version control is the foundation: it's the core component that enables CI/CD workflows and real test-driven development. You're probably fairly familiar with the benefits of version control. One benefit I really came to appreciate is being able to easily pick out what changed in a peer review, and being able to enforce that peer review. Going back to thinking about detection as a product: having peer review, having the change validated, and, as a leader, having that technically enforced is a great feature for making sure there are eyes on detections and on changes as they're made.

One thing I want to explicitly call out is the flow, and where version control needs to fit into the workflow: it needs to be pushing your detections to your SIEM. It needs to be the source of truth, because that's what enables the things we'll talk about next: CI/CD workflows and testing. If you push things the other way, you're basically just taking a glorified backup of the detection logic in your SIEM into version control: you don't have enforced peer review, you don't have changes flowing through version control, and you lose the benefit. Not to be dramatic, but doing it that way really is a glorified backup; it needs to go the other direction, and that's important for what we'll talk about next: CI/CD.
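To make that direction of flow concrete, here's a rough sketch of the deploy step — merged detection files pushed from the repo to the SIEM by CI. The endpoint and payload here are entirely hypothetical stand-ins for whatever API your vendor actually exposes:

```python
import json
import urllib.request

# Hypothetical endpoint -- substitute your SIEM vendor's real detections API.
SIEM_API = "https://siem.example.com/api/detections"

def build_deploy_request(rule_id: str, logic: str, token: str) -> urllib.request.Request:
    """Build an idempotent PUT so redeploying the same rule simply overwrites it."""
    body = json.dumps({"id": rule_id, "logic": logic}).encode()
    return urllib.request.Request(
        SIEM_API,
        data=body,
        headers={"Authorization": "Bearer " + token, "Content-Type": "application/json"},
        method="PUT",
    )

def deploy(rule_id: str, logic: str, token: str) -> None:
    """Called by CI on every merge to main: the repo, not the SIEM UI, is the source of truth."""
    urllib.request.urlopen(build_deploy_request(rule_id, logic, token))
```

Running this from CI on every merge is what makes main and the SIEM agree by construction, rather than by someone remembering to copy and paste.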
If we think about software development and infrastructure as code: you're writing tests, you're linting, and you're making sure the current version is always in production. We can apply the same concepts to detection logic. CD means that as I merge a change into the main branch, it's pushed to my SIEM, so I always have the most up-to-date version of my logic there, no question about it. We can enforce testing, which we'll talk more about in a second, and we can also do some cool stuff with linting.

Linting is an easy way to improve pieces of your detection workflow without full-on testing, because full-on testing can be difficult to automate in a CI workflow. As an example, one of the first use cases we had for linting detection logic was metadata. We had metadata fields in our detection logic that populated tickets and helped track metrics, and we wanted to make sure they were always there. When someone new came onto the team, they didn't have all the context and didn't know exactly what needed to go in there, and we had no way to technically enforce it — but we could write a linter that, as a change goes through CI, checks that those fields exist, that a given string exists, that the metadata is filled out. You can run with that pretty far: looking for functions that are expensive in your SIEM that you don't want your detection logic using, or even simple things like making sure a scheduled search runs on the interval you expect. You can enforce a lot of technical controls just with linting, to make sure the detection is structured the way you want, from a metadata standpoint or even from a logic standpoint.
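As a sketch of that kind of check — the required fields and the allowed schedule intervals here are invented for illustration, not from any particular SIEM:

```python
# Metadata every detection must carry before CI lets it merge.
REQUIRED_FIELDS = ("id", "severity", "runbook", "schedule")
ALLOWED_INTERVALS = ("15m", "1h", "24h")  # scheduled-search intervals we expect

def lint_detection(meta: dict) -> list:
    """Return the list of lint errors for one detection's metadata."""
    errors = ["missing field: " + f for f in REQUIRED_FIELDS if f not in meta]
    # Logic-level check: only allow searches on intervals we've agreed on.
    if "schedule" in meta and meta["schedule"] not in ALLOWED_INTERVALS:
        errors.append("unexpected schedule: " + meta["schedule"])
    return errors

def lint_all(detections: list) -> bool:
    """Lint every detection; CI fails the build when this returns False."""
    failures = [(d.get("id", "<unknown>"), e) for d in detections for e in lint_detection(d)]
    for rule_id, err in failures:
        print(rule_id + ": " + err)
    return not failures
```

Wired into CI, this turns "please remember to fill out the runbook field" into a hard technical control.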
All right, leading into the last piece. I think everyone's been here too: you had a pen test or a red team exercise, they did action X, and you thought, wait a second, I wrote a detection for that a couple of months ago — surely we detected it. You didn't, and when you look into why, it's because something was broken. Something changed: maybe the schema changed, maybe there was a change in the environment you didn't account for. Any number of things can go wrong there, and the goal of test-driven development is to reduce the number of times that happens, because it's super demoralizing for responders and detection engineers.

Test-driven development is a software engineering concept: you write your test cases before you write the actual code. When I first came across it, I thought, that's kind of weird — how does that apply to threat detection? But think about it: when you're building detection logic, you look at log events, and those log events are your test cases. Whether it's a log event you expect to generate an alert or one you don't, those are two test cases you can use going forward. The difference, at least in my experience in threat detection, is that we take those test cases, build our detection, and then throw them away. They're gone, and we just expect the detection to work forever, which isn't necessarily the reality. The goal of these workflows is to keep those test cases around and keep validating, across changes to the logic and changes in the environment, that you're getting the detection coverage you expect — because you worked hard to build it
and you want to maintain it.

This is really where everything starts to come together, and I will say it can be difficult. There are a couple of different ways to go about testing detections, especially in an automated way. The first is statically: take those log events and pass them through the detection logic, confirming whether the detection still works. This is good for validating that a change you've made isn't going to break the logic, and that everything across the board still works as expected. There's a downside, though: it doesn't account for the environment. These are statically defined cases, so if the environment changes, my static test case doesn't know that — it didn't get updated — and it may not reflect what's happening in reality. That's where the other two come in: dynamic testing, and what I'll call continuous testing. You can almost think of them as the same idea implemented two different ways, and honestly they go together.
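Before moving on, the static approach can be sketched like this — the saved events are exactly the test cases we usually throw away (the rule and events are invented for illustration):

```python
# A detection plus the log events captured while writing it. Keeping the
# events next to the rule means CI can replay them on every change.
def rule_admin_login_failed(event: dict) -> bool:
    return event.get("user_type") == "admin" and event.get("result") == "FAILED"

# (expected_alert, event) pairs: one true positive, two known negatives.
TEST_CASES = [
    (True,  {"user_type": "admin", "result": "FAILED"}),
    (False, {"user_type": "admin", "result": "SUCCESS"}),
    (False, {"user_type": "user",  "result": "FAILED"}),
]

def run_static_tests() -> bool:
    """Replay every saved event through the rule; any mismatch fails the build."""
    return all(rule_admin_login_failed(ev) == expected for expected, ev in TEST_CASES)
```

The weakness just described still applies: if the log schema changes in production, these frozen events won't notice, which is why the dynamic approach matters too.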
The idea behind dynamic testing is running the search against real data in the environment and looking for events: looking for true positives, and making sure we're not getting false positives. The challenge becomes: how do I get those true events, the events I know should trigger the detection? How do I produce them? There are a variety of frameworks out there — Atomic Red Team, and MITRE Caldera, which does adversary emulation — and the idea is to tie those pieces together to generate live test cases in the environment and then validate that the detection caught them. To be completely honest, this is something I wanted to build out and haven't been able to yet, so if you're doing something like this, I would love to hear about it. But I think this is the standard we need to get to when we test detections and think about validating that a detection is functioning going forward. It's a pie-in-the-sky goal, but I would love to do it for real. There are some links here from Red Canary, the Caldera framework, and a couple of other resources related to testing.

Cool — did we solve those questions? Let's find out. We got pretty close.
Version control, CI, and testing help a lot with answering these questions, as do agile workflows around process and product improvement. A couple of things I want you to take away: this is very environment-agnostic and team-agnostic. No matter the size of your team, you can start thinking about implementing some of these workflows if you're not already, and molding them to your environment. They don't need to be structured exactly the way I described here; it's about the high-level concepts and how you apply them to how you're building threat detection in your environment.

Cool. I'll be around the rest of the day, and at the Panther table as well, so please drop by with any questions, whether you want to talk about detection or anything else. I appreciate everyone coming. [Applause]

Thank you again for presenting at BSides SF — again, Kyle Bailey — and this is from Maltego. Thank you, and thank you all for joining us as well.