Connect: How to build your own SIEM with open source tools and methodologies

Name: Connect: How to build your own SIEM with open source tools and methodologies
Uploaded: 2020-04-17
Duration: 30 min 34 s
Description: A walkthrough of building a SIEM from open-source components — Beats, Kafka, Logstash, Suricata/Zeek, Wazuh, Sigma, and the ELK stack — organized around a funnel that filters noisy logs into actionable leads. The talk maps common detection use cases to the MITRE ATT&CK framework and addresses the th

BSides Atlanta · 2020 · 30:34 · 393 views · Published 2020-04

Tags

Category: Technical Tooling

Topic: Detection Engineering, Threat Hunting, Tooling

Team: Blue

Style: Talk

About this talk

A walkthrough of building a SIEM from open-source components — Beats, Kafka, Logstash, Suricata/Zeek, Wazuh, Sigma, and the ELK stack — organized around a funnel that filters noisy logs into actionable leads. The talk maps common detection use cases to the MITRE ATT&CK framework and addresses the three perennial SIEM problems: inconsistent field terminology, log noise, and the difficulty of stitching single events into attack stories.

Original YouTube description

Ever asked a vendor for alert rules or techniques to catch the bad guys only to be told: “every organization is unique”? While there is some truth to that, there are also a bunch of techniques that can be used across any organization based on common attack methodologies. In this talk I will discuss how to abstract some of the common event logs from your network, hosts and security devices into the MITRE ATT&CK framework and make sense of the noise. Defenders can use it to identify the spectrum of techniques that an attacker may exhibit, then look across their processes and controls to identify gaps in detection and prevention coverage. Presentation details: The SIEM problem: 1) Data that gets loaded to the SIEM is what gives it value. Show, as an example, a noisy Windows system log and the small number of events that typically matter. 2) A SIEM can’t automate information security domain expertise. What if we map the application of your logs to your specific needs? Example: lateral movement based on IDS logs. 3) One of the most common failings I have seen is a SIEM overstuffed with useless data. What if we can surface only the high-risk threats to the user? The Abstract solution: 4) Introducing MITRE ATT&CK: overview of the open source framework model of attacker methodologies. 5) Open source SIEM: using open source tools around ELK, such as Logstash / Beats plugins, can help you build your own SIEM at much lower cost. Conclusion: 6) Open source ATT&CK + open source SIEM are a happy marriage! Examples of making sense of logs on ELK based on the ATT&CK framework. 7) Summary: your security team doesn't need to be “master defenders” to properly implement a SIEM and see positive results.

Transcript [en]

Okay, great. So thanks, first of all, for having me. My name is Nir Yosha, actually from New York, so I saved a flight to Atlanta thanks to the coronavirus, and I'm going to talk about open source solutions for SIEMs. I work for a company named empow, which is a SIEM provider that allows a kind of top-down approach for SIEMs, that's why you see the pyramid here, and empow is also contributing open-source pieces of their software to the community. But this is going to be a real firehose talk, so feel free to contact me with any questions later on. This is my Twitter handle and my name on LinkedIn. So, just a little bit about myself.

I'm originally from Israel, born and raised. I started my career in the Israeli Intelligence Corps and I moved here to the US around 20 years ago. I live in New York, I have three kids, and to make sure they're not going to jump into the talk now, I locked the door. I worked for multiple vendors in threat intelligence and in identity management. The photo you see here, by the way, is me at my first BSides, dressed up as a detective. A little weird, but the idea was that the talk was around indicators of compromise and threat investigation. So that's a little bit about me, but enough about me, let's talk about our agenda. So SIEM is

not a new thing. I'll discuss the current and historical challenges with SIEMs and then how I believe we can solve them with open source. There are a lot of open source tools here, so bear in mind that I'm not going to get into all the details, and we'll go through the tools via a term that I'd like to call the SIEM funnel, which is basically tools that help reduce the number of false positives and the noise until you really get to a manageable number of leads that can be investigated, because, as you will see later on, that's the number one issue with SIEMs today. So I don't think I need to introduce SIEM to any one of you, but I just put down

here a list of the most common use cases when operating SIEMs. SIEMs are becoming more and more involved with not only detection but also prevention. Threat hunting is something where there are still vendors out there that do it specifically, but there are more and more threat hunters using the SIEM for that, and so that's the main purpose of having a SIEM. So let's talk about the three main issues with SIEMs. The three that I see in all of them are around field normalization, or the terminology of fields coming into the SIEM; the noise that I mentioned earlier; and the fact that it's really hard to understand what is a real true positive,

and you can see here there are a lot of cases where people are either getting false positives or getting false negatives and missing real alerts. So the number one issue is that there is no standard for SIEMs as of yet. Each vendor is coming up with their own proprietary fields and schema for data. A simple example is IP addresses: different vendors will put an IP address in different fields, and what that brings, when you look at it in the SIEM, is a list of field names that you cannot really cross-correlate, because from a database perspective they don't seem to be the same entity. And that's true not only for

specific fields like IP addresses or URLs but also for names of malware and of adversaries. Vendors keep on using their own terminologies, which do not necessarily sync. The second challenge around SIEM is all about the noise, and the fundamental issue is that log files are not necessarily sending events that are alerts. The events going into the SIEM could be just operational events. Some of the information is coming in from detection tools that are not adjusted to your environment, and so there are a lot of false positives. We see this especially with IDSs and Windows logs, and a lot of the time there's no fine-tuning of those alerts coming into the SIEM. And the last challenge that relates to the SIEM is

that no single event really tells the story, right? You cannot really identify the threat or the attack's progress from a single event. You need to look at more data coming in from more sources, because every attack, whether it follows the cyber kill chain progression or another progression, involves multiple sensors within your environment, and understanding the entire picture requires putting all those pieces of the puzzle into one story. So how have SIEMs solved the issue? Well, the issue of terminology needs to be solved by normalization, so most vendors just take different types of entities and try to normalize them into some common schema. In addition, there are rules, so the main challenge is

with noise, in that there's a continuous need to update the rules, which takes a lot of resources, either from the vendor as professional services or from the security operations guys. And then finally, to understand the story there should be some kind of enrichment of the data. A lot of the time the raw data really doesn't help you understand the exact source of the address, [unclear], et cetera. And obviously machine learning. By the way, all of those things are available today in open source tools. So if you don't have a SIEM, and a lot of small and medium companies are in that situation, and you need to have a SIEM, either for compliance

purposes or because of the use cases I mentioned earlier, there are a couple of things that you need to start doing first. So the first thing to do is just look at your current log types. Specifically, when you look at your environment, a SIEM should have at least one type of network detection tool and one type of endpoint protection tool. Depending on your environment, you might also want to add wireless logging, or cloud logging if your operation or business also depends on cloud activities; cloud providers are more and more providing APIs and hooks for SIEMs today. And one additional thing to look at is the gap analysis of, okay, where am I missing data, where logs are

not available, either due to licensing from the vendor or just because I don't have those detection devices in my environment. So once we have this portion of logs that we can work with, the second decision is which workflow we want to go with. There's no right or wrong way to go here; it really depends on your team and the process you're using today for detection and mitigation. But generally you can divide it into two types, right? You can decide that you send all the information directly to the SIEM and then do the filtering, the rules and the alerts from there, or you can have some intermediate solution that does some of

the filtering for you. I'll speak later on about those tools. If you would like to use them, they're going to reduce, first of all, the total storage that the SIEM will eventually use, but they will also improve the experience of the security analysts when they interact with the SIEM. And then the last thing to consider is data enrichment. When I'm talking about data enrichment, if you're looking at identities like users, very often Active Directory can help with adding metadata to users, such as their department, their specific role, the user privileges and so on. When looking at assets like servers or hosts, it would be great if you also add some context

around which department this asset is part of, how critical it is to the system, which ones are your crown jewels, and which one is, you know, just a server sitting in QA that has no access to the Internet. So this is how I look at it when I'm thinking of building a SIEM. Okay, the look is through a funnel, and actually this one is taken out of a post from SpecterOps by Jared Atkinson; you can see the link down here. The idea is that when we build the SIEM we need to go through this filtering process. The main idea is that we don't want to clog the funnel; we don't want to end up with alerts that

cannot be handled by the amount of resources that we have in our group. And so we're going to go one by one through those steps and see which tools can help us in each step. Of course a SIEM has no meaning if there's no data there, so we're starting with collecting the data. The ones that I highlighted here in red are the ones that I have experience with. Beats, if you haven't heard of Beats, is actually part of the Elastic Stack, which I'll speak to a little bit more, and helps to collect a lot of various types of data sources, from operating systems to network to security devices. Kafka is

critical, especially when you deal with high-volume data, with high throughput and low latency, and it acts as a buffer, so if any of the pipelines is down you don't lose data. And then Logstash really becomes instrumental when you try to parse the logs and solve this normalization challenge that I mentioned earlier, right, the challenge of talking in different languages. You can see here some other ones, but I'm going to focus on those three today. Now, if you're missing some of the detection tools on either your endpoints or your networks, you don't have to run and look for a vendor to buy them. There are great open source solutions, both for network

intrusion detection and host intrusion detection. I'm sure you're familiar with them. The nice thing about them is that they have a great community behind them. Zeek comes with out-of-the-box tools that support up to layer seven of network analysis, and Suricata also has a very similar format, so if you're used to one of them you can move between the two pretty easily. On the endpoint side, there are fewer enterprises that I see implementing open source, just because it's much more risky and they would rather work with a vendor that has accountability for a piece of software they install on an endpoint. But we do see more and more customers using Wazuh in production. So if you haven't

heard of Wazuh, this is a project forked out of OSSEC which provides great endpoint detection. And again, the idea here is that they integrate nicely with the other tools that I mentioned earlier, both the Beats and later on Logstash, which is actually what I'm going to talk about next. So my company, empow, released an open-source version of Logstash parsers. The idea behind it is to solve the problem that I mentioned earlier of not using the same terminology, and at the same time to reduce some of the noise even before the events go into the SIEM. So Logstash, you can see some links here if you're not familiar with it, is a very

flexible framework that does not get you locked in to any vendor. It's open source and it has great parsing capabilities, you can look it up, it's called the grok filter, and it is very much compatible with the Elastic Common Schema, which is Elastic's version of standard field names, in order to be able to later on cross-correlate events coming in from multiple sources. Another thing you can do with Logstash is enrich some of the fields. A simple example would be, if you have an IP address, you can use Logstash to enrich the data with geolocation, so adding the location to the IP address. You can also enrich using reputation, so if

you have a URL you can enrich it and find out, even before it goes to the SIEM, whether this URL is malicious or not. So this is how it works, an example of Logstash. Each Logstash instance consists of pipelines, and you can communicate between pipelines, you can concatenate pipelines. Each pipeline has three parts. One is the input plugin, and Logstash supports multiple formats, whether it's syslog, CEF, files, what have you. Once you ingest the data, you can filter it; this is where you do the enrichment and the parsing. And finally you can either send it to a database or you can send it to

another Logstash if you want to chain a few Logstash instances together or create a pipeline-to-pipeline type of architecture. Okay, so not sure how many of you have heard of Sigma, but Sigma is a little bit different in that it is not open source code, it is an open source standard. The idea behind Sigma is to allow you to get, out of the box, a list of rules that will already help you find threats. So Sigma is basically helping you both to maintain the threat rules and also to avoid vendor lock-in, so if you're moving to another vendor, the rules in Sigma will still stay and will be able to work on any SIEM. So, how it works:

there is a Sigma rule creation process. There's a huge community that has already contributed a lot of rules that help you identify very common techniques like lateral movement, privilege escalation, initial intrusion and so on. You can also create your own rule if you have a specific rule that is not part of the community set. Then, depending on the SIEM that you use, you can convert the rule to your SIEM. You would be able to add custom fields that are relevant only to your company, and then you would be able to hook it into a search query and start sending alerts to incident response and security operations teams. If for any reason you want to move

to another SIEM, you simply need to use another converter, but you don't lose the original rules created and maintained in Sigma. Okay, so just kind of a recap: we spoke about how we're getting the logs shipped into Logstash, how Logstash can help us with parsing, and how Sigma can help us with creating those rules, all open source. And now we're looking at the other side of open source, which is the storage and search capabilities. One of the solutions you can look at, and the one we have experience with, is Elasticsearch, or the ELK stack. Elasticsearch is basically an open source engine for search and storage. It's not

originally meant to be a SIEM, but it is highly used in the security community as a SIEM for three main reasons. First, it's very scalable: you can really handle huge amounts of data and throughput using an Elasticsearch cluster. Second, it's very flexible: there's no specific schema that you're locked in to. It's called schema on ingest, which means that once the data is ingested it's automatically parsed into a JSON format that makes it very flexible for search. And lastly, it's very quick. One of the challenges with SIEMs today is that there's a lot of overhead when reading information, and the more data you add, the slower it gets to work with a SIEM.

You can look at queries taking 15 or 20 minutes; you can go get a coffee break and it's still looking for your data, and obviously this is not practical for a SOC and definitely not for incident response. Just mentioning a few other SIEMs that are open source and out there: one coming out of AlienVault, this is OSSIM. It's getting very good feedback from the field. I personally don't have much experience with it, but what I like about what I hear is that it also has asset discovery and vulnerability assessment in it, so you can really plug it into your environment and it will find out what your infrastructure looks like, and then obviously this helps with finding

relevant threats within your environment. I don't believe, from what I hear, it is as flexible as the Kibana user interface, which is part of the Elastic Stack. Then one more open source SIEM that is used out there, coming out of Apache, is Apache Metron. This is basically a combination of six open source projects that have one interface. What I like specifically about it is that, first, it supports the Elasticsearch database, so you're not losing the flexibility and the speed if you use Apache Metron with an Elasticsearch database, and second, they have a specific model for machine learning algorithms. If you are one of those teams that take a data science approach to your data, this is something that can

help. So this is what the Elastic Stack looks like. Just to kind of recap, we're looking at log carriers, all kinds of Beats, some for Windows, others for Linux, for network, etc., and then you can either send the data directly to Elasticsearch as raw data or you can send it through Logstash first for enhancement and parsing, and then eventually it ends up in Elasticsearch for search and reporting, and you can access the data via Kibana for visualization. Now, a lot of the SIEMs today are starting to move into being not only a log aggregator or a threat hunting solution but also an incident response solution, so I figured I'll add

also some of the open source tools that relate to incident response, and one which really stands out is TheHive project. Again, very smooth integration with Elasticsearch, so all those open source tools are playing nice together. The idea behind it, like any incident response ticketing system, is to be able to collaborate within the team and create tasks and enrichment using the integration it has both with Elasticsearch and with MISP, which I'll talk about next. So MISP, another open source solution, stands for Malware Information Sharing Platform, but it's much more than that. This is literally an open source solution for threat intelligence platforms. The idea behind it is to get

feeds from the community with indicators of compromise, malicious IPs, URLs, hash values, and then easily cross-correlate them with the indicators coming in from the SIEM. Then, if there's a match, obviously you want this one to get prioritized first in the list for your security analysts to work on. And the last open-source tool I want to speak about when it comes to incident response is Cortex. This is actually the second half of TheHive project. It supports the analysis of those IOCs. This is kind of the brain of the analysis part, between what's coming in from TheHive ticketing on one hand and what's coming in from your local actors on the

other hand. So I encourage all of you guys to look at TheHive project and see if it's relevant for you. I came up with this kind of final diagram that shows all, or at least some, of the pieces that I've spoken about. So if you want to build your SIEM totally from open source, it's definitely possible. You can look at log aggregators like Logstash that can help you with parsing the logs and sending them to Elasticsearch. If you're missing any of your detection tools, you would be able to use open source ones as well, in this example Wazuh, and then eventually this can be integrated into TheHive, generating tickets, and if there is any match with

known IOCs you can immediately trigger alerts. So we cannot really have a security talk without mentioning the MITRE ATT&CK framework. It's a great framework that a lot of SIEMs and other tools in the market are adjusting to, and one of the things you can do with your open source SIEM is do the same thing. You can use it for both red team and blue team exercises. I have here an example of Atomic Red Team, which is a Red Canary project that helps you test your existing environment with the tactics and techniques that are mentioned within the MITRE ATT&CK framework, and then you can identify where your gaps are, right, which of the tests passed and which failed.
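The gap analysis described here can be sketched in a few lines of Python. This is only an illustration and not something shown in the talk: the `test_results` list, its field names, and the detected/missed values are made up, and only the technique IDs are real ATT&CK identifiers.

```python
from collections import defaultdict

# Hypothetical outcome of running technique tests against the environment.
# "detected" is True when the SIEM raised an alert for the test.
test_results = [
    {"technique": "T1059", "tactic": "execution", "detected": True},
    {"technique": "T1053", "tactic": "persistence", "detected": False},
    {"technique": "T1021", "tactic": "lateral-movement", "detected": False},
    {"technique": "T1078", "tactic": "persistence", "detected": True},
]


def coverage_gaps(results):
    """Group the technique IDs that were not detected by their tactic."""
    gaps = defaultdict(list)
    for result in results:
        if not result["detected"]:
            gaps[result["tactic"]].append(result["technique"])
    return dict(gaps)


def coverage_ratio(results):
    """Return the fraction of tested techniques that were detected."""
    if not results:
        return 0.0
    detected = sum(1 for result in results if result["detected"])
    return detected / len(results)


print(coverage_gaps(test_results))
print(coverage_ratio(test_results))
```

The tactics listed by `coverage_gaps` are the "red spots" where detection rules or log sources would need attention first.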

But you can also use it within the SIEM and see, within your detections, where the detections are and which specific techniques they cover, whether they're more on the discovery side, persistence, command-and-control communication, what have you. This way you can continuously improve your security posture by adjusting your security tools to those red spots, the places where you're not doing the greatest job. Like I mentioned earlier, SIEMs are becoming more and more of a threat hunting tool. There is separate training out there that can help you with how to use Elasticsearch and Kibana for that purpose, but in addition to that you should take into consideration tools like Maltego, Cuckoo and other open source tools that

can help you, together with the SIEM, to figure out a little bit more about what the threats are. There are other proactive tools you can use, again from the data analytics side. There's a great tool from David Bianco, who is a great contributor to the security community; it's called the hunter. MITRE released a great tool that helps you with the process of analyzing attacks, again helping you understand which log file can help you identify which threats within the MITRE ATT&CK framework. And then I added here another link that has some great tools. It's called Awesome Machine Learning for Cyber Security, and that's exactly what it is, a really great bunch of tools over there. And if you want to

go through situational awareness, you can work more with tools like RITA, YARA and the Diamond Model. Again, we're short on time now, but if you are curious and want to learn more about them, feel free to reach out. So, what I didn't cover in this talk: obviously, talking about building your own SIEM cannot be a half-hour talk. There are a lot of other things to consider. I'm not even getting into the architecture, the storage calculation and the bandwidth usage within the network; all of those should be taken into account. But I think the bottom line is that you can build your own SIEM with open source tools, and there is great support coming in from the community, and

you can actually expand it to be not only your SIEM but also your detection tool and your threat hunting tool, with the ATT&CK framework. So I think I'm over my time. I just want to thank you, and I don't know if we have time for questions or not, but if we do I'll be happy to answer them. Yeah, so, Nir, thank you, we really appreciate you presenting today. I know there was a lot of good information about open source SIEM. So there are some questions in Slack [unclear]; most everybody's asking for a copy of the slides, so, you know, if you want to send a PDF and then maybe make them available to

everybody, that would be very helpful. Otherwise, we really appreciate your time today and you speaking at BSides Atlanta.
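As a rough illustration of the three pipeline stages described in the talk (input, filter, output), here is a minimal sketch written in Python rather than in Logstash configuration. It shows the two filter-stage ideas the speaker emphasizes: renaming vendor-specific fields to a common schema, and enriching events with asset context. The vendor field names in `FIELD_MAP`, the `ASSET_CONTEXT` inventory, and the `asset.*` keys are all hypothetical; only `source.ip` and `destination.ip` are actual Elastic Common Schema field names.

```python
import json

# Assumed vendor-specific field names mapped to ECS-style names.
FIELD_MAP = {
    "src": "source.ip",
    "SourceAddress": "source.ip",
    "dst": "destination.ip",
    "DestinationAddress": "destination.ip",
}

# Hypothetical asset inventory used for enrichment.
ASSET_CONTEXT = {
    "10.0.0.5": {"department": "finance", "criticality": "high"},
    "10.0.9.20": {"department": "qa", "criticality": "low"},
}


def parse(line):
    """Input stage: decode one JSON-formatted log line into a dict."""
    return json.loads(line)


def normalize(event):
    """Filter stage, part 1: rename vendor fields to the common schema."""
    return {FIELD_MAP.get(key, key): value for key, value in event.items()}


def enrich(event):
    """Filter stage, part 2: attach asset context for the destination host."""
    context = ASSET_CONTEXT.get(event.get("destination.ip"), {})
    enriched = dict(event)
    for key, value in context.items():
        enriched[f"asset.{key}"] = value
    return enriched


def run_pipeline(lines):
    """Output stage stand-in: return processed events instead of indexing them."""
    return [enrich(normalize(parse(line))) for line in lines]


# Two made-up events from different "vendors" describing the same source.
raw_logs = [
    '{"src": "203.0.113.7", "dst": "10.0.0.5", "action": "allow"}',
    '{"SourceAddress": "203.0.113.7", "DestinationAddress": "10.0.9.20"}',
]

for event in run_pipeline(raw_logs):
    print(event)
```

After normalization both events carry the same `source.ip` key, which is what makes cross-correlation in the SIEM possible, and the enrichment step lets an analyst rank the finance server above the QA host.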
