
Machine Learning Fueled Cyber Threat Hunting

BSides Augusta · 2017 · 48:00 · 898 views · Published 2017-09
Speaker: Tim Crothers (@soinull)
Category: Technical
Style: Talk
About this talk
BSides Augusta 2017: Tim Crothers (@soinull), "Machine Learning Fueled Cyber Threat Hunting". Cyber threat hunting can be difficult to do well, but most organizations have come to realize how critical it can be for their overall detection and response programs. In this session Tim will release a new open source tool to aid your hunters in their efforts. We'll explore how machine learning can be used both to speed up your hunts and to help find things you might otherwise have missed. No expertise in machine learning is required for this session, just a desire to find bad actors who may be lurking in your organization. You'll walk away with a new tool plus a knowledge of what ML can and can't do to help you find evil (hint: it's not magic, despite what the security vendors say).
Transcript [en]

[Applause]

Today, the focus of today's [talk]... All right, so...

...but to also up the best...

...whereas our [unclear], you know...

All right.

So all of a sudden it becomes: decide which approach we want to use, supervised or unsupervised. I'll talk about that in a second. Then we've got [the data]; this is where you work the hardest. The hard bit with machine learning is not building the tool and using the tools. Then we've got the proper features, [and the rest is] straightforward, right? So those are the key steps. All right, so let's talk: if we have a signature or something, we could detect it, but...

...my [unclear] versus my red team features, right? Versus the bad guys.

That's where [unclear] machine learning [comes in], and where it does the powerful work: normal activity. So there's basically supervised, which just means that we direct the learning of the machine. This is generally best for solving very specific, discrete problems, right? [unclear] We need samples of bad and samples of good, okay? But it's harder to get [samples of] good than it is to get bad things, right? There are all of these free resources out there that have tons of samples of bad that we can use.
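A minimal sketch of the supervised idea described above: the model is trained on labeled samples of good and bad, then asked to label new traffic. It assumes hypothetical two-number feature vectors per HTTP request (URI length, request body length) and uses a nearest-centroid classifier purely as a simple stand-in; it is not the algorithm from the talk's tool.

```python
from math import dist
from statistics import fmean


def train_centroids(samples):
    """Supervised training: average the feature vectors for each label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: [fmean(column) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(centroids, features):
    """Predict the label whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical labeled data: (URI length, request body length) -> label.
training = [
    ((24, 0), "good"),
    ((31, 0), "good"),
    ((18, 120), "good"),
    ((410, 2048), "bad"),
    ((380, 4096), "bad"),
]
model = train_centroids(training)
print(classify(model, (27, 10)))     # good
print(classify(model, (400, 3000)))  # bad
```

Every prediction depends on having labeled examples of both classes, which is why the scarcity of clean "good" samples matters so much for this approach.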

This is detection, this is security. For general security you're going to run into problems [unclear]. So, to be clear about what we need: [unclear] smaller piles of [data] are perfectly fine [unclear].

...whereas [it's] not just me: we can't tell [that] something [is] bad.

Okay, so we parse all of the data and we compute the variance on all the different fields; anything with [enough variance is a] target to use as a feature. So that's how we start figuring out which things [matter], and that reduces [the number of] features. All right, so as I said, Python is a great option; that's what I prefer, but there are lots of other things that could be used. The reason Python is such a great option is because there are a lot of people that are huge [unclear], and so they've already got the algorithms [written]. And you see this [unclear] one, two, three pieces over here, right? We're going to need a parser, obviously; I use whatever parser [fits]. We put it into our datastore, and then [comes the] algorithm. For the algorithm, I don't know [which one you'll] go with; certainly random forests [unclear].
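The variance step might look like the minimal sketch below. It assumes each parsed log entry has already been turned into a dict of numeric values; `uri_len` is a hypothetical derived value (while `request_body_len` and `trans_depth` are Bro HTTP log columns), and the threshold is illustrative. scikit-learn's `VarianceThreshold` does a similar job on a feature matrix.

```python
from statistics import pvariance


def select_features(records, min_variance=0.0):
    """Keep the fields whose variance across all records exceeds min_variance.

    A field that never changes (variance 0) cannot help separate normal
    from abnormal, so it is dropped.
    """
    fields = list(records[0])
    return [
        name
        for name in fields
        if pvariance([record[name] for record in records]) > min_variance
    ]


# Hypothetical numeric values derived from parsed Bro HTTP log entries.
records = [
    {"uri_len": 24, "request_body_len": 0, "trans_depth": 1},
    {"uri_len": 31, "request_body_len": 0, "trans_depth": 1},
    {"uri_len": 410, "request_body_len": 2048, "trans_depth": 1},
]
print(select_features(records))  # ['uri_len', 'request_body_len']
```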

So, you know, there's the training mode and then the analysis mode. What I mean by this is [unclear], just for simplification [unclear]. It's just two different [pieces], just so you understand.
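The split between a training mode and an analysis mode can be sketched as two functions that share saved artifacts on disk, mirroring the trainer and assess programs the talk demonstrates, which save and reload a vectorizer and a model. This is a minimal sketch under stated assumptions: the "model" is a simple per-field z-score check standing in for a real algorithm, and the file names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path
from statistics import fmean, pstdev


def train(records, fields, model_dir):
    """Training mode: fit per-field statistics and save two artifacts."""
    stats = {}
    for name in fields:
        values = [record[name] for record in records]
        stats[name] = (fmean(values), pstdev(values))
    # Hypothetical file names standing in for the vectorizer and model files.
    (model_dir / "vectorizer.pkl").write_bytes(pickle.dumps(fields))
    (model_dir / "model.pkl").write_bytes(pickle.dumps(stats))


def assess(records, model_dir, threshold=3.0):
    """Analysis mode: reload the artifacts and flag records that look unusual."""
    fields = pickle.loads((model_dir / "vectorizer.pkl").read_bytes())
    stats = pickle.loads((model_dir / "model.pkl").read_bytes())
    findings = []
    for record in records:
        for name in fields:
            mean, std = stats[name]
            # Skip constant fields: a z-score is undefined when std == 0.
            if std and abs(record[name] - mean) / std > threshold:
                findings.append(record)
                break
    return findings


normal = [{"uri_len": n, "request_body_len": 0} for n in (20, 22, 25, 24, 21, 23)]
new = [{"uri_len": 22, "request_body_len": 0}, {"uri_len": 900, "request_body_len": 0}]
with tempfile.TemporaryDirectory() as tmp:
    train(normal, ["uri_len", "request_body_len"], Path(tmp))
    flagged = assess(new, Path(tmp))
print(flagged)  # [{'uri_len': 900, 'request_body_len': 0}]
```

Keeping the two modes separate means the slow training step runs once, while assessment can be rerun cheaply on each new batch of logs. Only unpickle files you created yourself; `pickle` can execute code when loading untrusted data.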

Okay, and then the analyzer loads the model, parses the data, and calls things out. So let's look at some of this rather than me just talking about it. All right, so I'm going to get this kicked off. I've got a trained model here; this is an isolation forest. First off, I'm loading my two model files, the vectorizer and the model; then I'm running it against a small test directory, which you can see there, and I'm going to put any output into a findings text file. The -v, of course, is just for verbose, so it tells me what it's doing. I'm going to start that now, because it takes a minute to load the models; the models are pretty big.

While that's running, I'm going to pop over here to show you the code. So here's the trainer code. Notice I'm bringing in the modules I need, and then there's my option setup, right? And then here's where the real work goes on, here on line 45. All right, bro -r for the win, right? You process your pcap with that, and that's what's in the files in those directories. Then I'm classifying it, which is lines 51 through 53, and then 56. Literally, this entire program is 59 lines long. Sorry, I said 53 earlier; I did put blank lines in between, just to be clear.

And then the assess program on the other side just does the opposite, right? It starts out here by loading the models; you can see that up there at line, let's see, 47. It loads my vectorizer file and my model file, and then it just iterates through all of the files and writes out the notices. Okay, does that make sense so far?

Oh, and we're done over here. Let me scroll back up so you can see what it did; as you can see, it punched through a fair number of files. All right, there we go. So it loaded the models, then it started on the directory. Here's the small test directory; let me open that up for you just to show you what it looks like. These are just log files output from Bro. Let me turn off line wrap and make this a little bigger so you can see it. These are just the Bro HTTP header logs, okay? Those are the files that I've both trained on and am working on. So it parsed through all of those and then wrote out to a file, let me scroll down here somewhere, there we go, and so we should have a results.txt here. [unclear] And now I'm going to turn [line wrap] back on.

So the line it called out is this. Yeah, that looks a little [unclear]; certainly somebody is, you know, doing a request against that with a really big [unclear]. That's the power of machine learning. It took a whole bunch of files, though; like I say, the biggest work is in getting that clean data and unclean data.

Let me show those directories here. Okay, so for the malware, this doesn't want to make it [bigger], but you get the idea; I think those are legible, right? Each one of those is distinct, and literally what I did was take a packet capture of the comms from the malware, run it through bro -r, and then edit it. So one of the key things here that you want to be aware of: if I open up one of these bad files, I actually went through every single one of these bad files, because there's a whole bunch of noise while the malware is running. But the harder part is the normal traffic here, and I want you to take note of the sizes of these log files. These were, I think, about an hour of pcap capture from some of our Bro sensors; that's a few terabytes.
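For reference, the files being assessed are Bro HTTP logs: tab-separated text whose `#fields` header line names the columns, produced from packet captures with `bro -r`. The sketch below parses that format and scores each request with a from-scratch, standard-library isolation forest, the same family of algorithm as the model in the demo. It is illustrative, not the talk's tool: the log excerpt is made up and trimmed to a few real http.log columns, the two features (URI length and request body length) are assumptions, and a production hunt would more likely use a library implementation such as scikit-learn's `IsolationForest`.

```python
import math
import random


def parse_bro_log(lines):
    """Yield one dict per entry of a tab-separated Bro log, keyed by #fields."""
    fields = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#") and fields:
            yield dict(zip(fields, line.split("\t")))


def featurize(entry):
    """Hypothetical features: URI length and request body length ('-' = unset)."""
    body = entry["request_body_len"]
    return (len(entry["uri"]), 0 if body == "-" else int(body))


def c(n):
    """Average path length of an unsuccessful binary-search-tree lookup."""
    if n > 2:
        return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n
    return 1.0 if n == 2 else 0.0


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random threshold."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo, hi = min(p[dim] for p in points), max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return ("node", dim, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, point, depth=0):
    """Depth at which the point lands in a leaf, plus an estimate for the leaf."""
    if tree[0] == "leaf":
        return depth + c(tree[1])
    _, dim, split, left, right = tree
    return path_length(left if point[dim] < split else right, point, depth + 1)


def isolation_forest(points, n_trees=100, sample_size=256, seed=0):
    """Build the forest from random subsamples (needs at least two points)."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(points, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    return trees, size


def anomaly_score(forest, point):
    """Score in (0, 1); values near 1 mean the point was isolated quickly."""
    trees, size = forest
    mean_path = sum(path_length(t, point) for t in trees) / len(trees)
    return 2.0 ** (-mean_path / c(size))


# A made-up, trimmed http.log excerpt (real logs carry many more columns).
LOG = ["#separator \\x09", "#fields\tts\tuid\tmethod\turi\trequest_body_len"]
LOG += [f"150000000{i}.0\tC{i}\tGET\t/p/{i * 13}\t0" for i in range(7)]
LOG += ["1500000009.0\tC9\tPOST\t/" + "A" * 900 + "\t5000"]

entries = list(parse_bro_log(LOG))
points = [featurize(e) for e in entries]
forest = isolation_forest(points, seed=42)
scores = [anomaly_score(forest, p) for p in points]
worst = max(range(len(entries)), key=scores.__getitem__)
print(entries[worst]["uid"])  # C9
```

The oversized POST is separated from everything else in roughly one random split, so its average path is short and its score is the highest, which is the same kind of "really big request" callout shown in the demo.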

...generation; [it's] a very, very powerful capability. And there's a bunch of tools out there at this point already, and each one has [strengths and weaknesses]. Ultimately, if you do that and you're still [having] problems, you probably have unclean [data]. [unclear] This one actually has two [modes]; it'll do supervised and unsupervised, so that's kind of fun. [unclear]

So this is... so...

...my process...

...with Google+.

...but [it's] really solid. Oh yeah, that's exactly right; that's where that really becomes important, because [the models] kind of fade over time, right? So now that I know this is bad, or this one [I flagged] as potentially bad is not actually bad, I'm going to run that back through and keep [feeding it] back.
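The feedback loop described here, where analyst verdicts on flagged items are run back through training so the model keeps up with changing traffic, can be sketched as below. It assumes a hypothetical labeled set of (URI length, request body length) examples and uses a 1-nearest-neighbour classifier only because "retraining" it is just appending examples; a real model would be refit on the updated set.

```python
from math import dist


def nearest_label(labeled, features):
    """1-nearest-neighbour: return the label of the closest known example."""
    return min(labeled, key=lambda item: dist(item[0], features))[1]


def fold_in_verdicts(labeled, verdicts):
    """Return a new training set with analyst verdicts appended."""
    return labeled + [
        (tuple(features), "bad" if malicious else "good")
        for features, malicious in verdicts
    ]


# Hypothetical (URI length, request body length) examples.
labeled = [((24, 0), "good"), ((30, 0), "good"), ((400, 4096), "bad")]
finding = (350, 3000)
print(nearest_label(labeled, finding))  # bad: flagged for triage

# Triage shows the finding was a legitimate bulk upload, so feed it back in.
labeled = fold_in_verdicts(labeled, [(finding, False)])
print(nearest_label(labeled, (355, 2990)))  # good
```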

So mine right now, in my experience, is... so...

Right, so, right, but mostly I'm using... so, like, that, probably...

...accuracy.

You know, there's a lot there. I haven't [got a] specific one; [catch] me right after.

[Applause]