Collect All the Data; Protect All the Things

Name: Collect All the Data; Protect All the Things
Uploaded: 2019-03-18
Duration: 26 min 46 s
Description: Blue-team operations require collection and analysis of diverse data streams to detect advanced threats before they cause damage. This talk covers practical methods for gathering network, application, and host-based telemetry; correlating signals using behavioral analysis and machine learning; and s

BSidesSF · 2019 · 26:46 · 176 views · Published 2019-03

Speakers

Aaron Rosenmund

Tags

Category: Technical

Topic: Detection Engineering, Threat Intel

Team: Blue

Style: Talk

About this talk

Blue-team operations require collection and analysis of diverse data streams to detect advanced threats before they cause damage. This talk covers practical methods for gathering network, application, and host-based telemetry; correlating signals using behavioral analysis and machine learning; and surfacing actionable alerts through workflow management. All techniques and tools discussed are open source.

Original YouTube description

Blue teaming has not, up until this point, received the same applause and attention that red teaming has, but the tide is changing. The realization that the charge to "protect all the things, all the time" requires the collection and analysis of all the data is creating the conditions to "bring the sexy" to the blue team. This talk covers the application of different methods to collect, analyze, and correlate multiple types of data as well as the use of machine learning to generate behavioral anomalies that are incorporated into overall continuous monitoring capabilities. This is not a vendor talk, and with very few exceptions all methods and tools discussed are open source and free; the focus is on the application of concepts.

Transcript [en]

Oh yeah, there we go. Hey, thanks for the introduction. Yeah, collecting all the data and protecting all the things is not as exciting as drinking all the booze and hacking all the things, but someone has to be responsible, and so that falls to the blue team, right? So my name's Aaron Rosenmund. I work for Pluralsight developing cybersecurity training. Thank you. Yeah, we make courses and do research, and I work solely within incident response and security operations. Part-time I work for the Florida Air National Guard, where I work in defensive cyber operations, essentially on really small-manned SOCs that have to use a lot of automation, which is kind of what has created this brief for me, right?

So according to the instructions, you can do the questions on Slido, and there's the keyword there. And I'm gonna have to go a little fast on this brief because it was originally an hour-long brief and now it's 30 minutes, so if you want to, go ahead and download it from GitHub. Don't worry, you don't need to take notes, so you should be good. Cool. So we're here to talk about the problem. The problem is advanced threats. The easy stuff, like the script kiddies and the hacktivists and all that, can get caught because we already know what they're gonna do. We have signatures developed to catch that, right? But the advanced threats are burning zero days and using things that we don't

understand yet, and we have to develop methods to find that before it's too late. And really, too late is just that we have to catch it at all, because it's likely on all of our networks; we just don't know about it yet. And 146 days, I think, is the current discovery time, right? We need to get that down, and the way to do that is to collect and analyze all of the data. And that sounds like a lot to do, but we're gonna talk about how we do that. And once we have all that data and all that analysis, we have to develop some sort of method of workflow management so we can task our guys in our Security Operations

Center to be able to understand that data and make it actionable. So let's set the scene a little bit. This is definitely not a talk about Stuxnet, but Stuxnet is still a really good example of something that did not come over the network, right? It was just some dude who, we think, had it on, say, a USB, plugged it into a laptop, and then all of a sudden centrifuges start blowing up. I'm pretty sure that's a little simplified, but that's the idea, right? And imagine for a second that you're the guy in charge of security for this. So allegedly (just put "allegedly" in front of everything I'm saying) this has happened three times, and the third time it

happened, AC/DC's "Thunderstruck" played over the speakers. So you're the guy in charge of security at a facility like this. You walk in, grab some coffee, go talk to Jeff down on the floor, and all of a sudden you start hearing some American rock music playing over the speakers, and you're like, "Oh. Oh no." And then centrifuges start blowing up left to right, boom, boom, and you're like, "Oh man, that's really dramatic." And the first thing you think, or at least the first thing I would think, is like, "Oh, you guys, that was impressive," right? Then the second thing you're gonna think is, "I'm gonna start prepping my resume," and, "What did I miss?" And that's what we're going to talk

about here. Okay, so, quantum physics for security: what you missed was all the stuff you weren't looking at. So, you know, maybe you paid Palo Alto or Cisco a whole bunch of money and you got your network security set up, and maybe after the second time it happened you got an IDS going, and you're like, "Man, I am super good." But it didn't come in that way. You're kind of like a chicken nugget, right? You're crunchy on the outside, soft on the inside, and you're not looking at all the other stuff. And unlike quantum physics, what you weren't looking at wasn't undetermined; actually, the cat was dead. So there's a little quantum physics joke for you, all right? And the stuff you

need to be looking at, other than network, is everything else: the application data, the machine data, and the endpoint OS. So you can break this up however you want, but we're gonna kind of go over what all this means from my perspective, and how you can identify and then analyze data here, not just from a signature perspective but also from a perspective of behavior. So instead of developing tens of thousands of signatures to try to understand a whole bunch of trending attacks, you can instead make one behavioral script that's like, "Hey, if this looks bad, then it probably is bad," and I can use fewer CPU cycles to identify that. Cool. We're gonna start with the network, because it's probably

where every one of you has started, and if you have any security at all, it started at the network. You're that chicken nugget. Hopefully you can move some inspection of the network internally, so you're looking over your core, over your servers, or, if you're in the cloud, you're inspecting traffic that's happening internal to your cloud environment. And if you have not done that, here's some open source stuff you can use to start doing it. The first thing you do is collect the data, right? And you can do that with things like netsniff-ng and Google Stenographer, all right, and use PF_RING to kind of manage some of that memory. And the analysis for open source projects really

falls into two categories that are leading here, and it's Snort and its fork Suricata. The main difference is multi-threading. So it's really the most popular one, and there's a cool little project called Sguil that helps you visualize that, and here's what that looks like, and that's really cool. We have signatures. These signatures have rules that look at the hex data coming inside of a packet, and they want to match this very specific pattern, right? This is the updated Snort rule set that you can get with a Snort Oinkcode, and there's a few scans going on here. So just to set the frame, within this time frame we had some Nmap scans going, and

Nmap scans were scanning from 0 to 65535, so a total of 65,000-ish ports were hit, but the only ones detected were like 5, right? And that's kind of the standard here. And that was a TCP scan, but what was also happening was a UDP scan and a PowerShell internal scan. None of that was caught whatsoever, and the reason is because that rule, if you look at it, is saying that if there are five SYN/ACKs within a minute to a very specific port, then I want you to alert on that, but everything else got missed, right? Now what you also have is that it did detect that it was Nmap, and that seems really cool, like, oh man, you know, Snort figured out

that Nmap was scanning me, until you actually look at how it did it. What it did was say, "Hey, if you see some of the capital letter C, look deeper in that packet and see if there's a whole bunch more of the capital letter C, and if there are, then that's probably an Nmap scan." And that's really cool until you just, like, change the letter C to something else and it doesn't see anything at all, right? Or you use masscan or Amap, or you use Scapy and do whatever you want. That will bypass all this. And if I'm a smart attacker, which hopefully is what we're concerned about, these smart attackers, I'm

gonna run my attacks against this signature database and make sure it doesn't pop up. So the next step is to move into behavior analysis. Behavior analysis starts with the sessions. To do session analysis there's a couple of options. NetFlow v9 is kind of the standard; it's Cisco proprietary. You can get that from your switches if you have a license and you have something to actually ingest it, but you don't have to. You can also use something like Bro (it's now called Zeek, but I don't like change, so I'm leaving it that way), or you can use Moloch, and it helps you take those sessions and visualize them, because that's what us humans are really good at. We're

good at analyzing patterns visually, right? So within Bro, in these sessions, what we're really concerned about is looking at the IP 5-tuple, right? So it's the source and destination port and IP as well as the protocol. Bro does this as a scripting language: it takes that, turns it into a programmable object, and allows you to create scripts that analyze the behavior of the different attributes of that object. So let's look at that same data set with the scans and see what we can identify. So now we're looking at connection attempts as an object, and not just to any specific port, just in general, kind of internally to our environment. And what we're doing is kind of the same thing as saying, if there's

any connection attempts at all that are failed, right, so a TCP timeout or a reset was sent, because that's not normal application behavior. That's weird. An application that works inside your environment shouldn't be failing, all right? So that's what you're looking for here, and sure enough, we found them. And so that 10 102 4.3 is actually the PowerShell internal scanning, which I haven't found anything that will actually detect, and so I use that a lot on the red team. So now let's look at what we're good at with our big human brains: we can look at visual data and identify patterns. So Moloch is looking at some session data here, and it doesn't look that exciting

but that's because we're looking at sessions over time. Let's now change that visual data and take a look at the number of packets. So we have red going out, blue coming in. It's kind of cool, but there's nothing that really pops out at you; it's very consistent still. If we next look at the amount of data bytes going in and out, then we see something a little bit more concerning, right? So over time, data bytes going out are pretty low and data bytes coming in are high, and that's kind of normal, but you can quickly identify with your human eye that there's something weird going on there, and that is actually identified HTTP exfiltration. And so without having

to do offloading, we can use our big brains and identify some exfiltration that's encrypted and not be concerned that we're missing stuff that we're not decrypting. Cool. So here's another little capability. I reference the ELK stack a lot, and we'll go over that a little bit more later. This cool capability is called Graph. It uses graph methodology to look at records, and this is kind of the same session data from an incident response that was done, and this session data has connections that were interesting, right? So we're going through, finding IPs that we think might be related to an intrusion, and we kind of put that IP into the data set here and ask it to look for stuff that's related.
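The failed-connection check described a little earlier for Bro/Zeek session data can be sketched roughly as follows. This is a minimal Python illustration, not the Bro script shown in the talk: it assumes session records already parsed into dictionaries using Zeek conn.log field names (`id.orig_h`, `id.resp_h`, `id.resp_p`, `conn_state`), and the function name, threshold, and sample addresses are hypothetical.

```python
from collections import defaultdict

# Zeek conn_state values treated here as failed attempts:
# S0 = connection attempt seen with no reply, REJ = attempt rejected.
FAILED_STATES = {"S0", "REJ"}


def find_scanners(records, min_failed_targets=20):
    """Return {source_ip: number of distinct (host, port) targets that failed}.

    Only sources at or above min_failed_targets are returned. The threshold
    is an assumption and would need tuning against a real baseline.
    """
    failed_targets = defaultdict(set)
    for rec in records:
        if rec["conn_state"] in FAILED_STATES:
            target = (rec["id.resp_h"], rec["id.resp_p"])
            failed_targets[rec["id.orig_h"]].add(target)
    return {
        src: len(targets)
        for src, targets in failed_targets.items()
        if len(targets) >= min_failed_targets
    }


# Hypothetical sample: one host failing against 30 ports, one normal session.
sample = [
    {"id.orig_h": "192.0.2.10", "id.resp_h": "192.0.2.20",
     "id.resp_p": port, "conn_state": "REJ"}
    for port in range(1, 31)
]
sample.append({"id.orig_h": "192.0.2.11", "id.resp_h": "192.0.2.20",
               "id.resp_p": 443, "conn_state": "SF"})

print(find_scanners(sample))  # {'192.0.2.10': 30}
```

Because the check counts failures rather than matching packet contents, it does not depend on which scanning tool produced the traffic, which is the point made about signatures above.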

We kind of had an idea there were some different agents doing the HTTP exfiltration as well as the DNS command-and-control, and sure enough, this kind of pops up and validates that those IPs were connected, and we're kind of really on the right track. So it's just a cool capability to understand there. Cool. So the next step is machine data. So it got past the network; now we're gonna actually look at those devices that are on our network and what they're doing locally. What I'm talking about is Windows systems and Linux systems and those local logs, like security logs, auth logs, syslog, Cisco endpoints, SNMP traps, anything you can get. Your hypervisor also has logs; maybe importing that as

well, it's important, right? And you're looking for user activity that's local, out-of-the-ordinary service failures that can indicate, you know, some sort of anomaly that you can correlate with something else, and just strange hardware activity. And this is actually where we can get into some supply chain interdiction detection capability, right? So the first thing is looking at the auth log for Linux. So this was a web server, and this is SSH logging, right? So people are logging in using different usernames. The first thing you recognize is that there's a really high instance of failed logins, and that's interesting, but without even understanding the frequency of that, you can just look at who's logging in. Are

you allowing root to log in via SSH? You shouldn't be, but if you are, then fine. But also, do you have a whole bunch of usernames that are very similar to each other? Probably not, right? So with some of these usernames, if someone logs in with a username that actually is not on the box, then that's weird, and you need to kind of interrogate that. And then further, past this, once that brute force happened, you can look and see what happened on the box. So were there any new usernames? In this case there was, around the same time period, so now you have a little bit more of a clue to figure out that maybe that brute force was

successful and now they've made a new username, and if you look at the history, I'm sure you'll see that it was added to the sudo group and that the intrusion was successful. You can do the same thing with Windows security logs and import those, and what we've done next is import the domain security logs from an environment, right? And when we look at that, we can do things like enable the auditing for successes and failures for SMB logins on the domain, so let's say like a shared drive or something like that. And the first thing you recognize is that it's kind of weird: there's an increase in volume of just logon activity in general. That's great; we're kind of

recognizing that with our human brains again here. But what's really weird is when you look at successes versus failures of logins. It's not weird that someone logged in wrong, right, because I do that all the time. What's weird is when the ratio of successful logins to failed logins is different than it would be over time normally, and you do that by looking at large data sets of users logging in over time. And then we see that spike, so that blue is a very high increase in the number of failed logins, and that's bad. Yeah, so that's cool that we did it with our human brains, which are really good at pattern analysis, but what we

need to do is figure out how the computer can do that for us and then alert us on it. And here in Kibana we have a machine learning capability that actually is paid, but you can use Apache Spark to do kind of the same thing, right? "Hey, AI agrees that spike is bad" is pretty much what it's doing here. And it's not really AI, it's machine learning; it's still not even really machine learning, it's more like statistical regression analysis. So it's just saying that over time, your standard deviation of what normally happens with logins is different, right? But now you can drill down a little bit and do something called population analysis, where that population here is the

users, and you say, "Hey, within this population, which user is acting weird compared to the other users?" And here it caught Steve Rogers logging in, which is weird for a whole bunch of reasons, and the main reason is that a hundred-year-old man is using THC-Hydra, right? Cool. So the next step is we can use that same graph capability, and I think this is cool here because normally, you know, it's not really exciting to see someone use a graph capability with those connections and nodes on network traffic, but what is kind of cool is you can start mapping information from the security domain logs. You can map IPs to domain names and, even better, users to IPs

and domain names, right? So we're starting to see these connections in kind of a different way. We're visualizing this data in a way that can make certain things pop out, and that's interesting. So if you see a user logging into multiple devices, or an IP change on a domain controller or a domain name, then that's something that's weird and you can interrogate it further. So let's talk about telemetry streams. So all of your machines on your network run on electricity, and they have boards, and all boards have components, and all these components have data that you can analyze. So things like bandwidth use and fan speeds and IOPS and CPU usage, and really, voltage behavior is pretty

interesting as well, and that's kind of what this looks like. You can use Metricbeat and import this data from, you know, Linux or Windows boxes alike, and you get information about the CPU and memory usage and that kind of thing, and when you import that over your entire enterprise, you start to get a picture of what that should look like normally. So here, if you drill down to an individual device, you can see at a certain point that the RAM was at full usage and at the same time PowerShell was the top CPU and memory user, right? So maybe that's something interesting to look at. But the really exciting part is

when we start trying to figure out how to catch something like supply chain interdiction, because that's really hard to do. So here, this is a custom application that I built for a Pluralsight course, so don't make fun of me, I'm not a web developer, but the main point is that from my ESXi I was able to pull the wattage information over time, right? And that's great, but how do you use that to catch supply chain interdiction? Well, the first step is that you have to have quite a few devices that you're using, whether that's switches or servers or even, you know, endpoints like a Dell desktop, right? And then you need to pull that

voltage information, but the first thing you do is get it from different distributors, because what generally happens is a distributor will team up with some factory that will team up with an advanced threat, and they will then kind of conspire to sell you something that has a compromised motherboard, right? That compromised motherboard, with different components, is theoretically gonna use electricity differently, and that will be reflected in the way that voltage is consumed or the voltage displays. So if you can source that from different distributors, and then tag that information as it comes into your ingest, and then look for different voltage activity between distributors, you can start to identify anomalies. And that's great. It's still not really

identifying supply chain interdiction, but when you map that to odd network behavior, now you're starting to get a lot closer. So near and dear to my heart is endpoint analysis, but really it can be very simple, right? So one of the things you can do to get started with this is use Sysmon, right? Sysmon's pretty great. It essentially adds an extra log to your Windows boxes that you can pull different usable security information out of, and SwiftOnSecurity has a great config for that, which you can use to configure it to make it the most usable. And so Taylor Swift, in her off time, has made that for us, and, you know, that's really nice of

her to do. So the next thing I have here is PowerShell, and it works on Windows and Linux now, which is great. I love PowerShell, but we'll get into why this is a great tool for security in the next slide. I bring up osquery for an agent, an open source agent capability. You load osquery on a box and it turns the OS into a relational database, and you can query those relational databases. What's really cool about it is it works on Macs, Linux, and Windows. And something with an honorable mention is an NSA tool called GRASSMARLIN. You can download it; they've made it available on GitHub, and it's mainly used for SCADA, but what it actually does

is passively listen to the network and build an asset list of everything that it identifies on that network. Now, I bring up asset lists under endpoint analysis because the first thing you need to do is understand what assets are actually on your network, right? So, talking about PowerShell, we can use this in a bunch of different ways. The most important part about this is you don't have to be an admin on a box to use PowerShell. So you, as a user, can go back to your environment right now, run some scripts, and start to do some asset interrogation, start to do some security with PowerShell, without having any rights whatsoever on the network. You

can do things like file hash analysis, you can pull registry key values, you can do asset identification, you can do scans, and that makes it a really powerful tool, right? There are other tools as well. So I have the DCO there, and inside there's a script called power scan; we'll get to that next. But PSRecon and PowerForensics can also pull a whole bunch of usable forensic data from any device on your network. So that power scan tool essentially makes network connections, or tests network connections, to a subnet. So if you've done your asset and configuration management properly, the first thing that this does is identify what boxes are responding on that network. So if

something is responding that isn't on your asset list, easy kill, right? The next thing it does: we can identify here that there's a TTL difference within that subnet. So that's supposed to be mainly a Windows subnet, and now we have a 64 TTL. What does that mean? It means it's likely a Linux box, right? So someone actually live-booted Kali here, and we found some insider threat. This is a real network; I was actually just testing this to see if it would work, and then sure enough, we found this, right? The next thing you do is look for kind of different configurations, just on basic ports, and here, out of all the boxes that responded, it's got a 128 TTL

but HTTP is open, which is kind of odd, so that's a strange configuration. Now I bring up osquery (I just do like one example per section). I bring up osquery again here because it's agent-based, right, and if we're pulling data from that OS, it can lie to us. So if that OS is saying that it's not listening on a port, and then we scan it with the previous script and it is listening on a port, that likely means that there's a rootkit on there that's making the OS lie to us, and that's also an interesting metric we can use to identify a security anomaly. Great. So, application data. What is that? That is actually everything else. So everything

that we use to actually operate on the Internet is included in application logs and data, right? So I include DNS and DHCP in there. That's also Apache and SharePoint, any web stuff that you use. So the logs that come from those are separate from the logs that we normally get from machine data, and the same idea is going to apply: we're going to parse them with Logstash, we're going to throw them into Elastic, and we're going to visualize them with Kibana. Now let's focus on DNS. If you look at your DNS entries and you see this, right, there's some weird strings there, and we can identify that because we understand that those aren't normal words and this

shouldn't be domain names that are being queried. So how do you make a machine do that? Well, you can use an entropy calculation to calculate the randomness of those strings and make a value out of that. Now, I'm not saying that that's gonna immediately give you good information; you need to baseline your environment. But what you can do is, once you've baselined your environment for a number, say 4.3, you can say, well, anything 10% higher than that is probably something very random for our environment and very well could be command and control. And when you do that, you can get an alert that gives you only the DNS command and control that you're looking for. So this is a quick way to kind of

evaluate whether a DNS query is good or bad. So the next thing we're looking at is protocol metadata. That includes quite a few things. One of the notable entries here is GeoIP, right? Like, a user can't be in one location and simultaneously logged in from another location on the globe. But what I want to focus on is the JA3 hashes, right? So these JA3 hashes are going to look at encrypted traffic. When we look at encrypted traffic, we can't see into it, but what we can see is the initial TLS handshake, and when that handshake occurs, cipher suites are exchanged, and those are unique to the executables that are creating that connection. So the cipher suites are then turned into a hash, and

that's what this JA3 is, right? So JA3 here looks like a lot of unusable data, but when we look at that over three hours of 16 users' office activity, we can see that there's not really a lot of hashes created, right, and those users used a bunch of different applications. Even better, three of those were only used one time. This makes us really able to do a whitelist scenario where maybe 20 applications are used in the entire environment, and if anything else shows up that shouldn't be there, that's a whitelist violation. Cool. So you put it all together. This seems like a whole lot of stuff to look at, but really the idea

here is that we're going to take those alerts, and those alerts from all those new capabilities that we've just talked about are the new logs, right? We're going to take those logs and look for correlations across those, and then only interrogate stuff that matches multiple types of alerts. And you've probably seen a pattern here, right? The pattern is we take a whole bunch of data, we put it in a database back end, and then we display it with Kibana or something like that. And I'm not the only one who thinks that's a good idea: every single one of these projects does the exact same thing, right, and all of these are security-related projects. I'm going to focus on the Hive

project, because that's our workflow management system that can tie all this stuff together. When we do that, we can use the REST API to take all those custom alerts and put them into the Hive database. The Hive will then look for observables that match from different types of alerts, right? So every single one of these was associated with the same IP, and then it's gonna alert you to that. And so now you can assign your limited personnel to only look at things that show up in multiple alerts, and then you can task them accordingly, right, so that you're making the best use of your people. And you can look at those alerts by status, and you can also look at the types of

observables that are there to see if there's some sort of larger pattern. From a management perspective, you can have them open cases and see how many cases you're opening and closing over time. The next thing you really have to worry about is intelligence and how you include that. So the Hive uses something called Cortex to integrate with MISP, and there's 30 different threat intelligence sources it can use to kind of enrich the data that's found on those observables and see if they match any other databases for something that's malicious. So I gave you a whole bunch of stuff that you can do to fill gaps that you may have in your collection and analysis,

but I didn't tell you how to do any of it, and that's because I think you can read. And I'm working on a project called Protoss with some others, and what this does is take every single one of those open source capabilities and give you a step-by-step on how to tie them together, the configs you need to put in your switches so you can do proper SPANs, and how you can scale that from a laptop-sized capability that does all those things to an enterprise capability that scales out to as much data analysis as you need. That's it. Questions? I don't know how the question thing works.
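Going back to the DNS entropy idea from the application-data part of the talk, a minimal Python sketch might look like this. It assumes Shannon entropy over the characters of the leftmost label of the queried name, which is one common choice and not necessarily the calculation used in the talk; the function names are hypothetical, and the 10% margin mirrors the "anything 10% higher than baseline" rule of thumb mentioned there.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Shannon entropy of a string, in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def is_suspicious(query_name, baseline, margin=0.10):
    """Flag a DNS name whose first label is more random than the baseline.

    baseline is whatever entropy value was measured as normal for the
    environment; this sketch does not compute it.
    """
    first_label = query_name.split(".")[0]
    return shannon_entropy(first_label) > baseline * (1 + margin)


print(shannon_entropy("aaaa"))  # a single repeated character carries no information
print(shannon_entropy("abcd"))  # four equally likely characters: 2.0 bits
print(is_suspicious("abcdefgh.example.com", baseline=2.0))
```

The baseline matters more than the formula: the same entropy value can be ordinary in one environment and unusual in another, so the threshold is best taken from a period of known-good DNS logs.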

Yeah, yeah, are there any questions on the Slido thing? Okay. Yeah, hold on, let me see. Yeah, yeah, absolutely. So my perspective on that is, the alerts aren't what I'm prosecuting anymore. I'm only prosecuting things that are alerts from multiple types of domains, right? So if you looked at the example in the Hive project, right now, if the JA3 whitelist popped ("Hey, something's violating the whitelist"), I don't have anybody look at that, right? But if it's both violating the whitelist and my machine learning analysis is saying that there's an odd HTTPS exfiltration and there's a Snort alert on it, right, so now I'm like, I went

from like 60%, like, "Maybe that's something bad," to like 95%, "That's definitely something bad," right? And so instead of trying to worry about tuning every single tool out in my environment, I instead let them do everything they can and only look for things that match across different tools. Absolutely. Any other questions? Awesome. A little bit, a little bit louder, sorry.
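The "only look at things that match across different tools" approach from this answer can be sketched as a simple grouping step. This is an illustrative Python example only: it is not TheHive's API or data model, and the alert structure, source names, and addresses are hypothetical.

```python
from collections import defaultdict


def correlate(alerts, min_sources=2):
    """Return observables that were flagged by at least min_sources tools.

    Each alert is a dict with a 'source' (which tool raised it) and an
    'observable' (for example an IP address). Both keys are assumptions
    made for this sketch.
    """
    sources_by_observable = defaultdict(set)
    for alert in alerts:
        sources_by_observable[alert["observable"]].add(alert["source"])
    return {
        observable: sorted(sources)
        for observable, sources in sources_by_observable.items()
        if len(sources) >= min_sources
    }


# Hypothetical alerts: one host flagged by three tools, one by a single tool.
alerts = [
    {"source": "ja3_whitelist", "observable": "192.0.2.10"},
    {"source": "snort", "observable": "192.0.2.10"},
    {"source": "ml_exfil", "observable": "192.0.2.10"},
    {"source": "ja3_whitelist", "observable": "192.0.2.30"},
]

print(correlate(alerts))
# {'192.0.2.10': ['ja3_whitelist', 'ml_exfil', 'snort']}
```

Raising `min_sources` trades coverage for analyst time, which matches the reasoning above: a single noisy tool never pages anyone on its own, so individual tools need less tuning.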

Right, yeah, yeah, yeah. Volume, and then that JA3 stuff from Salesforce is amazing, right? So the initial TLS handshake is not encrypted, and those cipher suites are unique per executable, right? So when we looked at the unique JA3 hashes that were in that environment, those were actually associated with the DNS; it's actually a dnscat2 agent on that box, right? And it popped up as something that's not on every single machine, right? And so those correlate, so you didn't have to do any decryption at all to identify that, right? And if that matches also with some volume or some anomalous behavior where you're doing that population

analysis, right, so like which box is being weird compared to the other boxes with relevance to data bytes out and in, we're still not doing decryption, and then that starts to get you closer to being able to understand what's happening in those sessions. Yeah. Anybody else? Great, thank you guys very much. [Applause]
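For readers who want to see what a JA3 fingerprint is made of, here is a small Python sketch. As published by Salesforce, JA3 is the MD5 of a comma-separated string built from five ClientHello fields (TLS version, cipher suites, extensions, elliptic curves, and point formats), so it covers slightly more than the cipher suites mentioned in the talk. Parsing the handshake and filtering GREASE values are omitted here, and the function names and sample values are hypothetical.

```python
import hashlib


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 input string: fields joined by ',', values by '-'."""
    groups = (ciphers, extensions, curves, point_formats)
    fields = [str(version)] + ["-".join(str(v) for v in g) for g in groups]
    return ",".join(fields)


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """MD5 hex digest of the JA3 string (a fingerprint, not a security hash)."""
    text = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()


def whitelist_violations(observed_hashes, allowed_hashes):
    """Return observed fingerprints that are not on the whitelist."""
    return sorted(set(observed_hashes) - set(allowed_hashes))


# Hypothetical ClientHello values, already extracted from a packet capture.
known_app = ja3_hash(771, [4865, 4866], [0, 11], [29, 23], [0])
unknown_app = ja3_hash(771, [49195], [0], [23], [0])

print(ja3_string(771, [4865, 4866], [0, 11], [29, 23], [0]))
# 771,4865-4866,0-11,29-23,0
print(whitelist_violations([known_app, unknown_app], [known_app]) == [unknown_app])
# True
```

Since a small office may only produce a few dozen distinct fingerprints, a whitelist of known-good hashes plus an alert on anything new is cheap to maintain and needs no decryption.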
