
Hello everyone, good morning. My name is Hara Nikar, and welcome to BSides Las Vegas. This talk is on open source GitOps for detection engineering, presented by Zach Wasserman. A few announcements before we begin. We would like to thank our sponsors, especially our Diamond sponsor Adobe and our Gold sponsors Prisma Cloud, BlueCat, and Toyota. It's their support, along with that of our other sponsors, donors, and volunteers, that makes this event possible. These talks are being live streamed, so as a courtesy to the speaker and audience, we would like to ask you to check that your cell phones are on silent mode. If you have any questions, please use the audience microphone so that YouTube can hear you as well. With that, let's get started. Please welcome Zach.

Thank you, and welcome everyone to the talk. As you heard, my name is Zach Wasserman. I'm the CTO and co-founder at Fleet, and I'm the co-creator of osquery and on the technical steering committee for that project, so a lot of what I'll be talking about today ties into the work that I've been doing on osquery going back to 2014. I also do lots of other things in my free time. If it were not 105 degrees outside, I'd love to be climbing out at Red Rock Canyon. For those of you who don't know it, if you ever come to Vegas outside of the summer, I highly recommend checking it out. It's beautiful up there.

I want to deconstruct all of the buzzwords in the title of this talk, starting with detection engineering. What's that? I pulled some quotes from other folks explaining their idea of it, and then I'll try to synthesize them into my idea of what we're trying to do here. "Detection engineering transforms an idea of how to detect a specific condition or activity into a concrete description of how to detect it." That's Florian Roth. "Detection engineers design and build security systems that constantly evolve to defend against current threats." That's Josh Day. So to me, detection engineering is kind of the next
evolution in what we're doing as security defenders. Ten-plus years ago we had antivirus doing really basic stuff like looking for specific file hashes. We moved on to EDR and tools that automatically perform detections and then respond based on the things that they detect. I think the future we're moving towards is detection engineering and incident response as a full-on role, a real engineering role within the organization: moving beyond typical analysts responding to alerts to really understanding the needs of the organization, building detections that meet those needs, and then being able to do incident response based on a deep understanding of the particular organization's needs, threats, and risks.

Next we come to GitOps, which I also think of as detection as code in this case. This is really adapted from the DevOps philosophy of configuration as code, something that folks in the software engineering and infrastructure world have been doing for 10 to 15 years now, but I think it's a bit newer for those of us in the security world. So what is GitOps? "Using pull requests in a version control system like Git introduces visibility into the deployment process. It lets you view and track any changes made to a system, it provides an audit trail, and gives you the ability to roll back changes if something breaks." That's from Red Hat. "Detection as code is a modern, flexible, and structured approach to writing detections that applies software engineering best practices to security."

So again, we can take the things that we've been doing with off-the-shelf tools, that we've been doing with ClickOps if you will, clicking around within UIs and triaging alerts, and we can turn them into code. This doesn't necessarily mean that everyone has to become a software engineer and be able to write complete software systems, but we can start to specify using text, using YAML configuration files, using JSON, maybe writing a bit of Python,
writing a bit of SQL. We can specifically encode what it is that we're looking for and use that to be a bit more rigorous about our practices. As mentioned in that one quote, it also gives us the ability to collaborate, review the changes that are being made, understand what has changed over time, and build that audit trail.

And then open source. "Broadly speaking, open-source software licenses make the source code available for use, modification, and distribution based on agreed-upon terms and conditions." That's Janelle Horcasitas. "OSS, or open-source software, is not only available for anyone to use but also to build on. This has resulted in a global network of contributors who work together on a project by collectively reviewing, testing, documenting, and patching code." And this is Richard Stallman. Some of you may recognize him, a real zealot for open-source software, and we'll come back to him briefly in a bit.

I think the really important and cool thing about open source is that it provides this opportunity for collaboration. We can build tools at one organization (for example, as you'll see in my story, I helped to build osquery at Facebook, now Meta) and that's a tool that we can now take advantage of across organizations. Additionally, we don't have to just be okay with what comes out of the box. We can integrate deeply with what we understand of the organizations that we're working to protect. I think it's that flexibility and that freedom that's really exciting about open source, and that allows us to be more effective when we try to do detection engineering.

So now I'm going to talk about the tools in the stack that we're introducing here. In this talk, what I want to do is show you one model for how you can put together a framework for doing detection engineering using GitOps and open-source tools: something that you could take home today, deploy in your organization or in your home lab for free, and adapt it
to your purposes and to best suit your needs.

First we've got osquery. osquery allows you to write queries, SQL queries, so those of you who've worked with databases have seen this before. You write queries to collect logs on the state of your devices, and it supports Mac, Linux, and Windows. By learning one syntax and one tool, you're able to collect logs and understand the state across likely almost all of the computing devices that you manage. It doesn't include mobile yet, and I will talk briefly about how there's some ChromeOS support coming through other work that we've done, not directly in osquery.

Again, while we talked about detection as code and configuration as code, I really do think that this allows non-developers to access and aggregate data across all of the different sources on these different operating systems. You don't have to write a whole program if you want to get at a new data source. You can use one of the many data sources already built into osquery, and you just have to learn the little bit of syntax and the little bit of configuration that osquery provides in order to start collecting these logs across all the disparate data sources.

osquery has been designed from the ground up to have the performance and reliability to deploy across corporate and production infrastructure. Today osquery is deployed across millions of endpoints, it's integrated into multiple commercial products, and it's been deployed across production infrastructure of hundreds of thousands of servers at organizations like Facebook, Google, and Apple. So this is a really robust system, but it's also really accessible, and something that we see folks deploying down to tiny organizations. And again, it's free and it's open source. You can modify it, you can use it however you want, and you can make it fit your needs if it doesn't yet. There's a community supporting it, and you're all welcome to join the community and help to support it. And this is kind of what it
looks like. You want to get all the users on the system that you're running on, so you write a simple SQL query, SELECT *, saying give me all the information from the users table, and this is going to give you all the users. Again, this works across all the platforms. Whereas on Mac and Linux you might look at /etc/passwd to get users, and then you'd have to look in some other places to get information about the groups those users are in and other metadata about the users, on Windows you'd look in a completely different source. With osquery you just have to write this query, and you get the results normalized across all of the different operating systems that you might be working with.

There's a huge number of data sources, something like 300 different tables across all of the operating systems. It means you can get data from flat files like /etc/hosts, /etc/crontab, or the known_hosts files. Each of these has its own table in osquery, and instead of having to write that parsing logic, you just write SELECT * FROM etc_hosts, SELECT * FROM crontab, SELECT * FROM known_hosts, and osquery gives this to you in that normalized format. osquery can open SQLite files, which are becoming increasingly popular for storing configuration about programs on systems. It can call a number of system APIs. Here are some examples from macOS, but there are system APIs on Linux and Windows too. Pretty much everything you can imagine wanting to know on a system, these APIs are available, and again you don't have to write C, C++, Objective-C, or any of that. You just write your query, and osquery exposes that information because someone else has already written the C code for interacting with these APIs.

Then there are application APIs, so you can get things like status from Docker or from Carbon Black, and people have written extension tables that interact with things like CrowdStrike. And we can get event-based APIs: FSEvents is used to do file integrity
monitoring on some systems, and the Linux audit and BPF subsystems are used to do process auditing and socket auditing. On Windows there's support for ETW, and there's file integrity monitoring and process auditing on Windows as well. Again, it's a unified interface, so there's a lot of code going on under the hood that you do not need to pay attention to in order to use osquery effectively to get at these things. You can get metadata about the file system, hashes of files, and permissions, and you can parse various kinds of structured data. For example, on macOS a lot of configuration is stored in plists, and this all comes ready to go in osquery. And just to underscore it for probably the 10th or 11th time, it's all available under the same SQL interface, so you don't really have to learn more than the one interface, and then you can look through the entire schema and see all of the information that you can get across all of these different sources.

I showed you a simple example of a query before, but because this information is all exposed in this unified interface, you can also start to combine data from the different tables. Normally, if you wanted to get some information about a process, say on Linux, you might run ps to see the running processes, and then if you wanted to know what sockets are open on those running processes, you might run lsof to see the open files. You'd probably have to write a script or something to put that data together and filter it for what you're looking for. But in osquery you can again use the SQL syntax. Obviously this is a little bit more complex, but we're saying: give us all the information from the processes running on the system, joined with the open sockets for those processes, using the PID when doing that join. Specifically, we're looking for sshd processes, and ones where the local port that they have open is not port 22. So at a high level, I would explain
this query as: find us sshd processes that are running on a non-standard port, not the standard port 22 that we're used to for SSH. I'm just giving this to you as an illustrative example of how this unified interface allows us to start to express slightly more complex ideas, but again without having to write long and complex scripts or code. This isn't trivial, but it's not particularly hard to learn.

Next in our tool stack we come to Fleet. The purpose of Fleet in this situation is primarily to deploy and manage osquery at scale. osquery, as we just talked about, is a piece of software that runs on the individual endpoint that you manage, but more than likely you have hundreds, thousands, or tens or hundreds of thousands of endpoints that you're managing. Fleet helps to get those agents installed on the endpoints, helps to configure them once they're installed, and then helps us to get the logs where we want them. Fleet also allows us to take the kind of queries we talked about and run them live, so we get the results from that query right now across all of the different systems that are enrolled. Fleet builds some higher-level use cases on top of that, like detecting vulnerable software and checking compliance with organizational policies or policies from actual compliance frameworks, and it allows triggering automations into ticketing systems, through webhooks, and that kind of thing.

Fleet can then configure the schedule of queries that osquery runs. By writing a bit of YAML, which is just a structured text file, we can tell Fleet to have our osquery agents run these particular queries on the schedule that we specify. osquery will run those queries and send the logs back up to Fleet, and Fleet can then dispatch those logs to whatever logging destination you want. The most common ones we usually see are S3, the Elastic Stack, Splunk, and Snowflake. So Fleet makes it pretty easy to modify the queries that we
have running and get the results into the place that we want.

Oh yeah, and I said we'd mention ChromeOS. osquery itself doesn't support ChromeOS, but at Fleet we've built a Chrome extension, also open source, that gives you the same kind of SQL interface on ChromeOS devices. So Chromebooks will then show up in the same dashboard along with the rest of your Mac, Linux, and Windows devices.

I should say that Fleet is technically what you might call open core. Part of Fleet is MIT licensed, which is true open source by Richard Stallman's definition. It is not truly open source through the whole thing, because there are features in Fleet that are enterprise licensed. So while Stallman is very triggered, everything I'm going to be talking about here today is actually open source and can be used by you for free. The code can be inspected and modified, and we won't really focus on the enterprise features at all.

Here's an example of what things look like in Fleet. In the middle there we see the query, and this is just a regular osquery query, but we can also store all of this other metadata about the query. Fleet will help us see which platforms it's compatible with, it will allow us to set permissions on who can run the queries, and then it will allow us to save and schedule those queries. We talked about YAML and configuration as code, so here's a somewhat equivalent example of what it looks like to schedule a query using configuration as code. We say that this is a query that we're trying to schedule, we put in a name and description, we put in the actual SQL for the query, and then we can set things like the interval that we want the query to run on and how we want the logging to take place. So we can see how this starts to be a building block for doing detections as code.

The last tool that we'll talk about in this stack is called Matano. Matano is a security data lake, so it's comparable with something like Snowflake or Splunk.
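A minimal sketch of the detections-as-code idea this stack builds toward: Matano detections, as described later in this talk, are Python functions that take a log record and return a boolean. The example below applies that shape to the sshd-on-a-non-standard-port query discussed earlier. The query name `sshd_nonstandard_port` and the record layout (`name`, `action`, `columns`, loosely following osquery's result-log format) are assumptions for illustration, not Matano's actual normalized schema.

```python
# Hypothetical name of the scheduled query as it would appear in result logs.
SSHD_QUERY_NAME = "sshd_nonstandard_port"
STANDARD_SSH_PORT = 22


def detect(record):
    """Return True when this log record should raise an alert."""
    # Only consider rows produced by the (hypothetical) sshd query.
    if record.get("name") != SSHD_QUERY_NAME:
        return False
    # Skip rows reporting that a previously seen result went away.
    if record.get("action") == "removed":
        return False

    columns = record.get("columns") or {}
    try:
        # osquery logs column values as strings, so convert before comparing.
        local_port = int(columns.get("local_port"))
    except (TypeError, ValueError):
        return False  # missing or malformed port: not enough data to alert

    return columns.get("name") == "sshd" and local_port != STANDARD_SSH_PORT


if __name__ == "__main__":
    sample = {
        "name": SSHD_QUERY_NAME,
        "action": "added",
        "columns": {"name": "sshd", "local_port": "2222"},
    }
    print(detect(sample))
```

Because the function is plain text with no side effects, it can be reviewed in a pull request and exercised with sample records before it is deployed.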
Primarily, Matano is designed to ingest data from S3. It can also automatically ingest from a number of data sources within AWS, from GitHub, and from a number of other tools, but for our case it's really ingesting the data from S3, because as we talked about, osquery and Fleet can write the data into S3. That's our connection point here. Matano stores the logs in a structured format, and then it allows us to write detections as code. We'll see in a minute what this looks like. We said Python here, so we're talking about real code in this case, but don't worry, this is pretty simple. You don't have to be a software engineer to apply these concepts. Matano is built on AWS serverless technologies, so you deploy it yourself, but it really is pretty closely tied to AWS. And Matano again is truly open source, under the Apache 2 license, so you can use it, modify it, and deploy it however you want.

This is what a detection looks like in Matano. It is code, but it's pretty simple. Essentially we just write a function that does a Boolean evaluation, and it can be arbitrarily complex. The function takes a record, basically a log line, as an input, and we say whether that log line should generate a detection or an alert. In this case, this is an example from their documentation looking at Zeek logs, so network logs, but you can imagine how this could apply to pretty much any kind of logging that you might pull in. You can see again how this is encoded as text, and this could be put into a source control system and allow us to do GitOps and detections as code.

Matano also provides built-in alerting capabilities. Right now that's primarily Slack, and the alerts that are generated can be configured, again through code, to include any of the relevant information that you might want your detection engineers, your response engineers, or your security analysts to have access to when triaging the alerts. So I'm going to talk a bit about the data flow of how
these tools come together. I think we've seen some hints as to how that will work based on the way these interfaces work, but we'll go through it a bit more concretely here. Fleet configures osquery with a schedule of queries. As the administrator, the blue teamer, the defender, I go to Fleet, maybe in the UI, but probably in this case, since we're talking about GitOps, through GitHub, GitLab, or wherever we're storing our configurations, and I modify that schedule of queries. Fleet then sends those queries to osquery, which becomes responsible on each individual endpoint for running the queries on the schedule, and osquery sends the results that are generated back to Fleet. Fleet then pushes those query results into S3. This is typically done via AWS Firehose, which makes it very easy to get the results into S3. As I said, Matano then knows how to ingest logs from S3 and run detections on the logs that are coming in. So that's the basic data flow of how this system works.

So how does it correspond with the GitOps workflow that we're trying to get to? Again, the user, the administrator, will update the YAML to create or modify a query, so that we're pulling in the initial telemetry that we're going to be building our detections on. Then we would update the Matano detection YAML to create or modify the detection. It's not just the YAML, it's also that Python function, and I'll show more about what the YAML looks like as we go through this. Then you'd commit and push the changes into your Git repo. For most folks this is probably a pull request on GitHub. And then we'd have a peer, another person on the team, review and approve the changes. This, I think, is an important part of doing a real GitOps and detection-as-code process, because this is where we're generating the audit log. This is where the author explains why they are making the changes that they're making, and this is where someone else on the team
verifies that this is in