
GT - Scheming with Machines - Will Pearce

BSides Las Vegas · 57:28 · 86 views · Published 2019-10 · Watch on YouTube ↗
About this talk
GT - Scheming with Machines - Will Pearce. Ground Truth track, BSidesLV 2019, Tuscany Hotel, Aug 07, 2019
Transcript [en]

Please welcome Will Pearce. Hey. Already coughing. Can you guys hear me okay? Cool. Well, welcome to my talk. This is technically the first talk I've given. Excuse me. Scheming with Machines: we're going to talk about the offensive use cases of machine learning. We're not going to talk about adversarial machine learning; we're talking about using ML to support offensive teams. So who am I? I'm Will Pearce. I work for Silent Break Security; we're a small security consulting firm in Lehi, Utah. We're kind of known for custom malware dev; we write all of our own malware. We teach two courses at Black Hat, Dark Side Ops 1 and Dark Side Ops 2. They both focus on malware dev, and Dark Side Ops 2 focuses on research.

Has anyone taken the Dark Side Ops courses, out of curiosity? No? They're two days of training each, so if I go fast, I'm sorry. My primary function is as an operator: I phish people, I write reports. My job is effectively to breach organizations, then write a report and tell them how I did it. I'm not a data scientist. The second piece of my work is obviously research, and this is the research piece that I've been putting together for the last year, eighteen months. There's a lot of excitement around ML. I come from kind of a finance background, so, anyone use Excel without a mouse in here? Any keyboard monkeys? Yeah, so that's

my claim to fame: I can use Excel without a mouse. If you have a very boring party, that's my party trick. And then we obviously do training, so we teach Dark Side Ops 1 and 2, we write training, things like that. So our agenda for today: we're going to talk generally about ML in infosec, and it's going to be from my perspective as an operator, whatever that means, covering the products that we see, the hype that we see, and what I actually see on networks versus what vendors are claiming, which may or may not be true. I'm also going to talk about offensive

tooling. I have three case studies that we're going to go through that I've developed. If you're here for any groundbreaking ML research, you're not going to find it, but what you will find, I think, is some interesting use cases from a layperson's view, at least from someone who's trying to harness ML to become more efficient. So, as we know with vendors: old dogs, new tricks. From my research, what I know is that ML is not magic, it's math, no matter what anybody tells you. How many in here are ML people, data scientists? How many red teamers, pen testers, blue teamers? Oh, that's a good even mix. So we all

know it's math, not magic. I think a lot of the products aren't quite built yet relative to where the marketing engine is going, and so we're starting to look at them more critically. It has huge implications for the red side, just in terms of the sheer detection capabilities and the sheer amount of data it can go through, so I think it's extremely important for offensive teams to at least take a look at it. My current thought is that machine learning might go the way of application whitelisting. If you remember, a few years ago everyone was like, oh wait, you can just block powershell.exe and no one

can run it, and then the LOLBins project came out and they were like, oh, as it turns out there's a ton of stuff that can execute things. So it's going to be interesting to see how vendors try to implement this; they just have an impossible task with an enormous amount of data, so it's interesting to listen to the talks. Vendor claims are generally, you know, overblown. Then there are the data requirements: most of the organizations that we get into or consult with have insufficient logging and alerting. As we know, ML requires some amount of data, some of it consistent data. Obviously there are techniques that I'm not aware of, or can't speak intelligently to, to

fill in datasets, but I do know that ML requires data, and I know that organizations don't always have the logging required. ML systems are also a significant engineering challenge: there are just a ton more moving parts. There are models to keep updated, there's model drift, there's any number of things, and they still suffer from the same problems that current SIEMs suffer from: false positives, poor system implementation, open shares on the network. If you're keeping your model in an open share, you know, we could steal it. A lot of times we'll find code or products written in .NET, which we can then

pretty easily reverse, because it's not a fully compiled language, and it's the same for models. I think the first neural network I found on an engagement was a business app, legitimately sitting in an open share, so that model was just there for me to take. And if you know anything about adversarial machine learning, you know that opens the door to any number of things. Then we get to data scientists running the SIEM. What ends up happening is it becomes very expensive to have a full-time employee run your SOC or SIEM or monitor alerts, so people went SOC-as-a-service.

And what I know as an operator is that SOC-as-a-service is terrible, because it's just an average of everything. You're just a client; they have an SLA with you, and probably the best one we've come up against called our client 24 hours later, and it was too late. So if you're going to have data scientists running your SIEM, you're going to have a bad time; we've seen it before. Now, if you look at the tutorials that are out there, I like the "baby's first" ones, because, you know, it's sort of the DEF CON thing. From a malware perspective, from our perspective on the red side, we see all these

tutorials: oh, we can detect malware, we have all these PE files sitting statically in this folder, and we can read in all the bytes. Well, we don't primarily drop PE files; our execution happens in memory. You're not going to be able to gather enough samples that way; you can view them in memory, but a static PE file, I think, is going to fall short as a detection in a real environment. Phishing: I think Proofpoint is a great example of this. Proofpoint has a pretty good product, but it ignores vectors like LinkedIn or Twitter. So instead of going through a

difficult, ML-enabled mail system, I'm just going to hit you up on Twitter with a document, or hit you up on LinkedIn with a document. Then I'm pushing the malware past the mail filter to the web gateway, and the mail filters aren't going to catch it. This is something we already do because of increased detection. And then on the network: we see claims like, okay, we can detect malicious DNS traffic with machine learning. Well, if you have a pair of eyes you can detect malicious DNS traffic; malicious DNS traffic looks extremely malicious. When you're trying to shove as much data into a DNS packet as you possibly can to

get that transfer back and forth, you don't need machine learning to detect it, and you might be introducing additional complexity that's just unnecessary. You're going from something that's certain to something that's less certain, something that generalizes versus something you know is true. But from our side, what we do see coming is sequence analysis of function calls. There's a paper out from a couple of guys at Sandia about detecting malware using Win32 API calls. For me, if we're living in memory and we rely on those API calls, that's something we should pay attention to.
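The point that blatantly malicious DNS tunneling is detectable with plain heuristics, no ML required, can be sketched in a few lines. This is an illustrative sketch, not tooling from the talk; the length and entropy thresholds are invented for the example.

```python
import math

def label_entropy(label: str) -> float:
    """Shannon entropy (bits per character) of a DNS label; encoded data scores high."""
    if not label:
        return 0.0
    freqs = [label.count(c) / len(label) for c in set(label)]
    return -sum(p * math.log2(p) for p in freqs)

def looks_like_tunnel(qname: str, max_len: int = 30, max_ent: float = 3.5) -> bool:
    """Flag a query whose leftmost label is long and high-entropy --
    the 'shove as much data into a DNS packet as you can' pattern."""
    label = qname.split(".")[0]
    return len(label) > max_len and label_entropy(label) > max_ent

looks_like_tunnel("mail.example.com")                               # False
looks_like_tunnel("q7v2kx9mz0r4bp8ws3ty6ncj1hd5gfau.evil.example")  # True
```

A pair of eyes applies roughly the same two questions; the heuristic just never sleeps.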

From my side, on phishing: we have a much harder time phishing than we used to. We used to be able to send five or six emails and get maybe 50% clicks and shells, but now we're seeing context-aware spam filtering that's just much more intelligent. Anything with a link is getting picked up; anything with a link is getting visited, and the payloads get pulled down. It's an additional headache that our side of the industry is still kind of reeling from, but obviously things like going to LinkedIn make it a little easier. We're just having to spread out versus, you know, only email.

And then on the network: we need information to perform our jobs, and that requires querying any number of services or servers. If we're on a host, say Jim in HR, and we're querying some SMB server over in the accounting department, even if you don't have host isolation, that's going to look anomalous; Jim shouldn't be doing that. So you could put those detections in place; you could implement a machine learning model to detect it, or you could implement host isolation, in which case the traffic just doesn't exist anyway. So there's obviously give and take. Our perspective, or my perspective, on ML, at least in the

short term: I think offensive teams actually benefit. Our data sets are smaller, our expectations are lower, because, let's be honest, none of my team expect anything useful to come out of ML, but that's on them, because it does work sometimes. Effectively, my job is to support decision makers: I want to bring information to them so that they can make the decisions they need to. And as I slowly pick away at that, eventually I'd like to get to a point where we have some semi-intelligent operation going on. And generally, from the red side, we need to be able to create

better tools. Later in the presentation I'll talk about pushing models client-side. If we could ship malware with ML models in it and just never have comms, that would be awesome. Even more, rather than reaching out to communicate on an external domain, if it could just collect information on the internal network and shovel it back out whenever we need it to, that's going to be preferable. If we can trust it on the internal network not to destroy anything, that's preferable for us. And ultimately we just want to be better operators. Every operator has like ten commands they run, and if I could get to

command five and, rather than going through my normal six through ten, go straight to ten because of some information I got in the previous commands, that makes me a better operator and more efficient at my job. I actually wrestled with the title. I said "new kid on the block," and it's not technically true, but it's a new kid on our block, especially on the offensive side; there's not a lot of research out there. There are some pretty sweet projects, and actually, if you go way back, there are some really cool projects, but I just want to separate ourselves from adversarial ML. Adversarial ML is obviously, let's make

this model think this dog is a cat; you know, classification bypasses. It's more than that, but basically we only care about classification bypasses, because we're cavemen. What I want to talk about is offensive ML: using ML to support offensive operations. There's some awesome research out there: the timing attacks, DeepExploit, the big language model; they said they didn't release their big model, but we found the smaller models are just as good at generating phishing emails. MarkovObfuscate C2 is pretty awesome; it basically obfuscates C2 traffic using Markov models. That's pretty crazy; that one was at DEF CON a couple of years ago. And more is coming out; DeepWordBug is awesome too. I've started to aggregate these projects in a GitHub readme, which I'll post a link to, so I'm starting to collect them. When we're talking ML, we have, I think, a unique set of challenges on our side. Our data sets are very sparse: we might have like 30 projects in the pipeline, each with logs 5,000 lines long, so by traditional ML standards we don't have the data to do things. My epochs are higher than they need to be, and, you know, overfitting is, in my view, for

us not a problem, just because our use cases are much more focused. We're not necessarily trying to generalize; we actually might be trying to be very specific. In that way I think of overfitting as a tolerance, if you imagine it like an engineering term: how close to that tolerance do we want to get? Again, I don't know; I have a professor liaison who's been helping me, and it basically goes: I think this is possible. Yeah, I think that might be possible. Okay, sweet, should I go do it? Yeah, go try it, see what happens. Transferability: we are in a lot of

different networks, and we obviously can't store data from one network for the purposes of other networks. A lot of that data is very sensitive: Social Security numbers, krbtgt hashes. We just can't keep it for the sake of machine learning, so we have to find a way to make our models both useful and agnostic to each network, which I think is a potentially unique challenge. And then sharing data: as offensive ML becomes a thing, we can't share data the way we can share other TTPs. I can tell you, oh, you should go look at InstallUtil because you're going to be able to get execution past application whitelisting, but I can't give you a data set, because it

might contain some krbtgt hash, or if you have enough domain knowledge you might be able to reverse it and get something out, if you're clever enough. That could potentially just come from my lack of understanding, but there's a book called Adversarial Machine Learning, the one with the blue cover, and it's pretty great. Cool. So when I start to think about the problems that we face, the ones that I want to solve: ML on the blue side is speeding up, and people are just behind on it, so I think unless we on the red side start looking at it, we're going to be left behind. It's a bit like, you

know, the frog in boiling water: it dies before it realizes it's too hot. So my hope, if you're red teamers, is that you start looking at it, because it is useful beyond offensive ML. Organizations are going to be implementing it, so you're going to need some basic understanding of it if you want to operate in networks in the future, assuming it takes off. As I go through thinking about solving a problem, I have this empirical analysis: I have experience as an operator that leads me to conclusions, so I know what an expected output should be for a

given input. That's kind of what I do as a human. I have my own anecdotes as well, but those could be slightly skewed: in one network I might have been able to do something, and in another network I might have done the same thing and it didn't work, and I might attribute the failure to something incorrect. From then on, going into the future, I'm going to have some false assumption about why it didn't work until that's corrected, and that correction comes through my ops. Then there's statistical analysis: how many commands did I run, what was the length

of my commands, that kind of thing. I think the blended approach is really useful; we'll get into some statistics later. The first case study we're going to look at is classifying the sandbox. I had a blog post out a while ago, let's see, a year now. I titled it part one, and I knew if I put part one I'd have to put out a part two, and I didn't; this talk you're looking at is part two, so hopefully there'll be an update. So, playing the sandbox. You know, part of

our job is to package up a Word macro, Excel, whatever it is, and try to get some user to click on it so we can get execution. And part of that is, basically, any email with an attachment these days is getting detonated in a sandbox. Sandboxes are kind of outsourced analysts, in a way; their job is effectively to make some determination about the safety of that payload. Traditionally it's been, okay, let's change the strings in it, let's link it instead of attach it, just try to get it past the mail filter. But they're kind of dumb machines, very automated, and I wouldn't say they do

a great job; they're very easy to spot as a human. So playtime is over. Why would we care about being in the sandbox? Well, we write our own malware, so it represents a significant portion of IP. If a piece of malware gets compromised, that represents a significant amount of time we're going to have to spend rewriting it. I mean, Slingshot is our main tool; we've probably been devving on it for seven years. And actually, we don't send in Slingshot for that reason; we have a stage-one tool that does that, a smaller piece of malware that's responsible for it. But we just want to

protect our payloads. We just don't want to give them away for free; we don't want to be spraying the internet; we don't want to be accidentally detonating them in someone's personal account. So we need to make sure we protect our payloads, even from sandboxes. Alert fatigue: I saw some of you raise your hands for the blue side; you guys are familiar with alert fatigue, right? A little-known fact: we put image pingbacks in our macros. There's an inserted field code, and once you open the document, it'll go out to a remote web server and try to pull down an image. We just put in an

arbitrary URL, and it has a webhook, so when it gets hit, we just get a text. That's all well and good when you open your email: we just get one, and I go, sweet, that's a shell, or at least I got someone to open it. But in the sandbox world they like to share, so you get one, and then you get two, and five, and twenty, and I don't want to be up at 3:00 a.m. responding to sandbox alerts. If I can push that job to ML and just say, you deal with it overnight, from 6:00 to 6:00, then, you know,

that's worth the research for me, because I get more sleep. It's pretty easy to detect a sandbox. Does anyone have experience with sandboxes, maybe looking at their process lists? If you look at the process list on a sandbox, or on any host that you don't use very much, even a Windows 7 VM versus a real user's box, there's going to be a very clear difference. I'd say human operators like our team can detect a sandbox 99.9 percent of the time, but it gets laborious to do, so we want to break it down. We get a process list back for

OPSEC purposes. I said we use a stage-one tool: we get a process list back, and it's kind of a look before you leap. If in the process list we see, say, Cylance on the box, that means a handful of commands are off the table for us. Before even interacting with the target, we have some expectation of which commands we can run versus not, so we're not on the host generating unnecessary alerts just for the sake of not having a process list to begin with. That's why we're looking at the process list in the first place. So there are clear differences. The other

difference, obviously: users are going to have things like Chrome and Word and Excel; sandboxes aren't going to have any of that. It's just going to be very basic, the payloads typically get run as admin, it's a Windows 7 VM, so they just look very different. The chart up here is just a little PCA thing that I did. So, the classification piece, by the numbers: to select features, I chose process count, user count, and then a process-to-user count ratio, and obviously a label. Admittedly they're terrible features, but it illustrates the fact that you don't have to use any NLP to break down the numbers.
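Those three features can be computed from a posted-back process list in a few lines. A minimal sketch, not the talk's actual code; `sandbox_features` and the sample process lists are hypothetical.

```python
def sandbox_features(proclist):
    """proclist: iterable of (process_name, username) pairs, as a stage-one
    tool might post back. Returns the three features from the talk:
    process count, distinct user count, and their ratio."""
    processes = [p for p, _ in proclist]
    users = {u for _, u in proclist}
    n_proc, n_users = len(processes), len(users)
    return {
        "process_count": n_proc,
        "user_count": n_users,
        "proc_user_ratio": n_proc / max(n_users, 1),
    }

# A busy user workstation versus a bare sandbox image
victim = sandbox_features([("chrome.exe", "jim"), ("winword.exe", "jim"),
                           ("excel.exe", "jim"), ("svchost.exe", "SYSTEM"),
                           ("explorer.exe", "jim")])
sandbox = sandbox_features([("svchost.exe", "SYSTEM"), ("sample.exe", "admin")])
```

The point is that the representation is just counts and ratios; nothing about it requires NLP or anything complicated.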

You can just use features: the pure process count, the user count, any number of ratios. It doesn't have to be this really complicated thing; it's just data representations, however you want to represent it. And obviously I know there are techniques to pull out the best features, but for our purposes this has worked pretty well. You're not just limited to these features, either; a lot of people have their own checks that they want to make. The process list is ours, but some people like recently used files, is a keyboard enabled, what

version is it, is there a username attached to it. Sandboxes often have usernames like Bob, or admin, John, ADMIN-PC; they just don't try very hard. You can attach whatever you want, and these become representations of your sandbox, your data set. And if any ML people hear me say anything incorrect, just talk to me afterwards and correct me; that would be extremely appreciated. So, looking at decision trees: this is actually the second model we built, because obviously when you learn ML you're like, I want to do the cool stuff,

so what we did is we actually went to neural networks first, and then we were like, okay, that's all right, and then we went back to decision trees: okay, this is actually way better, because one, it's simpler, and I can explain it to my operators. I can be like, hey, you didn't lose your shell because of some black-box weirdness; you lost it because of X. I like to think of it as twenty questions for an algorithm. That's easy to explain to operators, who I've already compared to cavemen, but yeah, you're familiar. Effectively you just have a root node, it makes some determination,

true or false, and it just goes down the tree until you get to some conclusion, the classification. And the code is super simple. Even if you're not mathematical, and I'm admittedly terrible at math, as long as you understand the concepts and the code, what goes in and what comes out, on balance you can get it; it's extremely accessible. It's not like the math you did in high school, where you're just solving problems. I've kind of, I would say, maybe not fallen in love with math again, because I never was necessarily a lover of math.
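The twenty-questions structure can be written out as nothing more than nested if-statements. This is a hand-rolled stand-in for a learned tree, using the features above; the split values are invented for illustration, not trained thresholds.

```python
def is_sandbox(features: dict) -> bool:
    """Twenty questions for an algorithm: each node asks one true/false
    question and follows a branch until it reaches a leaf (the class).
    Thresholds are made up for the example, not learned from data."""
    if features["process_count"] < 25:       # root node: bare image?
        if features["user_count"] < 3:       # only SYSTEM plus one admin?
            return True                      # leaf: sandbox
        return False                         # leaf: small but multi-user host
    return False                             # leaf: busy host, likely a victim

is_sandbox({"process_count": 12, "user_count": 2})    # True
is_sandbox({"process_count": 80, "user_count": 6})    # False
```

This is also why it's explainable to operators: you can read the exact question that decided the classification straight off the branch.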

But the fact that it's more about concepts and expressing ideas in mathematical notation is kind of cool. You don't have to be deeply mathematical to implement these things; you can do it, though obviously if you want to get into the details you'll have to learn them. Okay, neural networks: we have our inputs, we get our output, and there are the weights; the stronger the weight, the stronger the signal on the output, and we make a prediction from that. It's kind of basic ML stuff.
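The weights-to-signal idea is small enough to show as a single neuron's forward pass. A toy sketch with untrained, made-up weights, not the talk's model:

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs squashed through a sigmoid: the stronger
    a weight, the more that input drives the output toward 1."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-z))

# Toy example: the first input is heavily weighted, the second ignored here
score = neuron([1.0, 0.0], [4.0, -2.0], -2.0)   # sigmoid(2.0), about 0.88
```

A network is just layers of these, with the weights learned instead of hard-coded.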

I'm going to shy away from explaining the models, because I know there are some ML people in here; Google it if you want, because I'll just do myself a disservice if I try to explain it in front of professors. So I'll show you where it's from and talk to the operations piece. Again, this is the code for it, and it's super simple. There are libraries; I love Keras, and it has a couple of backends. I've been moving more into the PyTorch area, just because it seems to be more academic and maybe a little better supported. And then there's ML.NET for client-side ML. Microsoft actually said they use ML.NET in their

Windows Defender product, so it's going to be supported for the foreseeable future. It would be awesome to be able to use ML.NET in your malware and ship it client-side: same thing Microsoft uses, it's going to be supported, and I don't see why you shouldn't or couldn't. We have some Microsoft people in here? Anyway, the code's super simple. So we have the models. Once we built them, I have a server that I'll release; I called it deepdrop, and effectively all it does is serve these two trained models. Then there's a macro that goes with it, and all it does is post back the

process list. On the server, you effectively just parse the process list as you normally would, except instead of doing a lot of them you're doing one; you gather your features and make a prediction on those features. If you're using a neural network, obviously you're going to have to scale them to whatever is appropriate. After that, if the prediction is good, we drop our malware; otherwise we don't. It's very simple; it would be tantamount to one more singular check that you're shipping off to a server. This is very basic dropper code here: there's always going to be some drop-or-not decision, and you can also push it client-side.
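The server-side branch described here can be sketched as one function. `deepdrop_response` is a hypothetical name for illustration, and `model` is assumed to be any callable mapping a feature dict to a sandbox probability; this is the shape of the decision, not the released deepdrop code.

```python
def deepdrop_response(features: dict, model, threshold: float = 0.5) -> bytes:
    """Score the posted-back features; serve the real payload only when the
    requester does not look like a sandbox, otherwise serve nothing."""
    p_sandbox = model(features)
    if p_sandbox >= threshold:
        return b""                      # looks like a sandbox: give nothing away
    return b"<real payload bytes>"      # looks like a victim: drop the malware

# Usage with a stand-in model (a real one would be the trained tree or network)
deepdrop_response({"process_count": 12, "user_count": 2}, lambda f: 0.9)   # b""
```

The macro then does whatever it gets back, so the protection lives entirely on the server.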

But it's not all nice. You have model drift: overnight, all the sandboxes could change. They could all just say, we're done with Windows 7, we're moving to Windows 10, and having looked at it, the Windows 10 process list is much larger and already much closer to a regular user host. Once we start seeing, for example, stock Windows 10, our model gets a little fuzzy again; it gets harder to make the determination. Network defenses: we've noticed a few vendors' sandboxes have stopped

reaching out to the internet, and if we don't get our information back, we can't build models; we kind of need that feedback. And then data collection: for phishing campaigns you're going to need a separate process, because you can't collect data while you're phishing; those two pieces of code need to be separate. In a production phish, a production macro, you want to keep it small and tidy, as small as possible, but collecting data adds code weight. So you need a secondary effort, some sort of secondary phishing campaign, which could quite

easily be automated, where you're just going out and sending things to VirusTotal, or to clients where you can get execution in the sandbox, and that would be separate. That feeds your model, and then your prod VBA, your prod macros, rely on that model to make the determination. Adversarial inputs: we've kind of come full circle here. The features are posted back in a URL parameter, for example in a GET request, and parsed out on the server, so a clever analyst could change those inputs fairly

arbitrarily and trick our deepdrop into deploying our malware and giving it up. For example, with that top line there: if we were to send this in a URL, someone could arbitrarily intercept it, change those numbers, and force our dropper to drop. So you're going to have to have some check on the other side. And what's funny is that I'm now dealing with the same problems the blue side is dealing with; it's two sides of the same coin. Let's see. The other thing I mentioned is client-side models.
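One possible "check on the other side" (my example, not something the talk specifies) is to have the macro tag the feature string with an HMAC under a per-campaign secret, so the server refuses features an analyst has rewritten in transit. The key and query-string format here are hypothetical.

```python
import hmac
import hashlib

KEY = b"per-campaign secret baked into the macro"  # hypothetical secret

def sign(features_qs: str) -> str:
    """Tag the feature string the macro posts back, e.g. 'pc=23&uc=2'."""
    return hmac.new(KEY, features_qs.encode(), hashlib.sha256).hexdigest()

def verify(features_qs: str, tag: str) -> bool:
    """Server side: only score features that carry a valid tag."""
    return hmac.compare_digest(sign(features_qs), tag)
```

It's not bulletproof (the key ships inside the macro, so a determined analyst can extract it), but it raises the cost of casually forging "I'm a victim" feature vectors.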

Excel is built for data: you can build a neural network in Excel, you can build a decision tree in Excel. So you can package up your model and send it with the Excel file; it's as if you've sent some sort of operator intelligence along with it. It never has to reach out to the internet, and it has everything it needs client-side to make a decision about whether or not it's safe. That's kind of awesome. There are some caveats: I haven't heard of anyone doing this, so your macro is going to stick out like a sore thumb, and you lose the ability to troubleshoot. In our macros, you know, I mentioned the pingback, so we know when a document's been opened; we have a macro pingback, we get a process list back, so we know the macro has been executed. But if you push it client-side and you never get anything back, you can't troubleshoot. Say we get that macro pingback and we see a process list; we can go, oh, this product's on the box, we'll just try a different payload. But if we never get that process list back, we can't make any assumptions about what's on the box, or even about anything that happened. So it costs you

something, effectively. It's going to cost you something to send it client-side, and we kind of need that information to make intelligent decisions on our side about the future of the op. So, let's see, now I have a demo of it; let's see how this goes.

You guys can't see that? Okay, so obviously we have this terrible output. Okay. So this is deepdrop. I'll release it on GitHub, and it's super simple, if you guys have questions. It has the two pre-trained models, and I'll release the data set with it as well; it has maybe 200 entries. I can't release the full process lists, because I have client information in them, so you'll just get the features that I've selected. Now I'm just starting it; it's super simple, it's just going to load the models. And then I have a macro that I'll release with it as well. So here we load our neural

networks, we load our routes; it's just a Flask implementation, super simple. Then I have a macro that I run, and in the background it'll run that, and we're not going to drop the payload. I have it on a little VM just running in my lab. Super simple; it's really not that difficult. You guys probably do much cooler stuff between the hours of 6:00 and 8:00 than deepdrop.

Any questions about that one? Very simple. Again, if you're here for groundbreaking ML stuff, you're not going to find it, but this is how we're thinking about implementing ML on the offensive side. So, command recommendations: this is case study two. We went from this really easy, awesome thing to something that's actually extremely complicated. When I first got into this I thought, sweet, command recommendations, that would be awesome: I'd just ask for any command, it would tell me, and I wouldn't have to think anymore. But that's not how it turned out, obviously. Old habits: we have existing knowledge that we want to take

advantage of. I have a model in my brain that says I need to run these ten commands based on this information, and there's other stuff. We want to take advantage of the fact that we have a team of six, seven operators, so I have logs for all of them, and we have parsing for all of it, so I get everything; I can see everybody's session logs, for good or for bad. Some of them are pretty dirty, not dirty in terms of content, but their OPSEC is terrible sometimes. So, sequences of commands: we know there are models that can analyze sequences and give us probabilities, so

you know we can say you know based on all these logs based on lies commands run based on this sequence of these first ten commands the length command is probably gonna be X or LS dirt whatever it is but what's more interesting actually when you look at this is where we get into the statistical side is you can start to see patterns in an OP so once we get initial access there's like a grouping of commands that we initially get and then as the OP progresses like when we're when we're when were privileged then there's a different set of commands and so it's been really interesting to see sort of the even though we know it and we do it it's been

interesting to see the those subtle hints of transition throughout an OP so these are just a few ups that I pulled and it's just a simple graph but effectively number of commands in each color represents a different command and so we get you know a pretty colorful distribution but obviously orange and I think this is our power show we have an in-memory power show that we run I think is our power shell and then the other one is get UID but so there's some basic metrics from our from our ops so further is for the last rolling 12 months and obviously we don't keep data around so some are deleted or some just aren't useful anymore but six of us you know we

ran on the 60,000 commands among us across 30 ops and we average you know almost 2,000 commands so aside from just the ml stuff like looking at this data from an operation standpoint is extremely useful so you know how long do you think it takes you to type a command two seconds the second half second we okay so now you can multiply this by 2k and you get some reasonable assumption amount of time that you're gonna spend on and up so our project manager loves this I'm told about this so don't tell me we have 99 possible commands in our rat we ran only 84 of them so from a malware dev standpoint or from OPSEC standpoint

we could probably drop ten commands, lose all that code, and make our binary smaller. So not only does this kind of analysis help us shore up our ops, it helps us find bugs in code faster, it helps us find where operators are having issues faster, and it helps us remove unnecessary things that just get built into products. If you think of your malware as just another piece of software, this kind of analysis can be extremely useful. Then you also get stats: the most commands in an op is obviously an outlier, and I know which client that was; or the fewest commands, where somebody started up a session, exited really quickly, and nothing useful was actually done; or the longest command, where someone uploaded a DLL and that whole DLL or shellcode gets put in the log. So it's not necessarily pretty, but I think going through this process helps you take the next step toward ML. It helps you think about your data, it helps you think about structuring it. I think it's an excellent first step toward some sort of offensive ML. I don't think red teams spend enough time inside their data.

These are our top commands, and if you look at the total here, these 15 commands are almost ninety percent of everything we run. But I already said we have 99 commands, so, like, what, right? We could probably cut 50% of our commands and not notice a difference in ops, and that has a significant impact on our detection rate, potentially: on the number of APIs, on the code that gets loaded into memory. So it's extremely beneficial. PowerShell is obviously a bit of an interesting one because it's effectively just a wrapper for other commands, but of the PowerShell commands, I think 85% were Get-DomainUser, Get-NetLocalGroup, and Get-DomainComputer, and then variations of those and some other ones. So it has been pretty enlightening to go through this.

But getting to the data is extremely painful; the parsing is really difficult. Currently we don't have a database built into our product, we just have logs, so I have to go through unstructured text. Our regex is like 80 characters and took me a ridiculously long time. To get to the data you have to extract the commands, and you have to somehow deal with the arguments, which I have not figured out yet, as you'll see in the demo. The arguments were just impossible. There are like six ways you

can run Get-DomainUser, and you could write regexes for everything, but if you have 99 commands, that is a significant amount of work, so I have to think about whether it's really worth the effort. Or maybe you guys can tell me: does NLP have an answer to this? No? Is regex really the answer to this problem? But once we have it all, we can start to model the user and build out particular profiles or whatever it is, though we've only touched that a little bit. So, sequences: based on the previous sequence of commands, we're just going to get some sort of distribution, some probability for the next command, and then we're going to run that one. So, recurrent neural networks, and long short-term memory. This is the one I'm not going to explain, so you can go Google these, but what I do know is that there's an input sequence, and some probability distribution comes out the other side that gives me the next command based on the analysis of the previous ones. So, demo.

The project I found was actually just textgenrnn. It has an interactive mode, and it seems pretty clean; well, I shouldn't say that, it's decent. Here we're loading all of our GPUs, and the interactive mode gives us choices, so you can see it just goes one by one. If you think of getuid, getuid gets the user on the box, and that's usually one of the first commands we run if we don't know who we are, if it's not ps. You can see that after getuid it gives us options; these are the next ten commands that we run most of the time. I think we run into issues because we only have 99 commands, so there are only 99 possibilities, and if you don't, for example, train it enough, you'll get loops where it's psh, getuid, psh, getuid, and it just doesn't go anywhere else. So it's been somewhat painful to deal with, though actually not as painful as the arguments, but it's a start.
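As a rough stand-in for what the textgenrnn interactive mode is doing, here is a tiny frequency-based next-command model built from session logs. The log data below is invented for illustration, and a real RNN/LSTM conditions on the whole sequence rather than just the previous command, but the output has the same shape: a probability distribution over the next command.

```python
from collections import Counter, defaultdict

# Invented session logs; real training data would be parsed operator logs.
sessions = [
    ["getuid", "ps", "psh", "getuid", "download"],
    ["getuid", "psh", "ps", "screenshot"],
    ["ps", "getuid", "psh", "download"],
]

# Count bigram transitions: previous command -> next command.
transitions = defaultdict(Counter)
for session in sessions:
    for prev_cmd, next_cmd in zip(session, session[1:]):
        transitions[prev_cmd][next_cmd] += 1

def next_command_probs(prev_cmd):
    """Probability distribution over the next command, given the previous one."""
    counts = transitions[prev_cmd]
    total = sum(counts.values())
    return {cmd: n / total for cmd, n in counts.items()}
```

With only 99 possible commands, even this crude model captures the "after getuid, probably psh" pattern; the RNN's advantage is conditioning on longer history instead of a single previous command.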

Do you guys have any favorite projects for RNNs or LSTMs? [audience comment] Yeah, that was the challenge. Right, so you could do that, but we have our own shell, we use the cmd2 library, so my next step is actually to look at their parser and see how it does it. We have some modifications there, but effectively I just need to dig deeper into it. I was hoping there'd be a more elegant solution than regex, but that's where I'm at currently. This is kind of where we started: at a high, where it's very easy to implement some binary classification, but it very quickly devolves into this very challenging, very architected solution where you need multiple datasets and regexes that you're keeping offline. So suddenly, for someone like me who is interested in ML but is primarily an operator, and if I think about my teammates, they just don't have the patience for it. But it's almost like, if they don't learn the basics of ML, in three years they're not going to be able to op on a network, depending on the adoption rate of ML in networks.

So hopefully with this talk, and red teamers, you're obviously here because you're interested, you get the idea: you need to start looking at it. You need to start thinking about your commands, you need to start thinking about the processes you inject into, you need to start thinking about the sequences of commands, you need to start thinking about the sequences of APIs that your commands call: like injecting arbitrary API calls, between CreateRemoteThread and VirtualAlloc just put a ton of different APIs, you know? And I know Microsoft has these monotonic models; supposedly, and I'm not clear on the details, but what I've heard is that they prevent you from adding good features too. But then we're just back to where we started, where we're removing malicious features, or we're trying to doppelganger explorer.exe. It almost feels like AV is starting over, like traditional AV is standing still and we're just in this run-up phase until people have figured ML out on the defensive side. What I think is going to happen, and my experience has been, is that it's not as easy as it seems. There are a lot of nuances to data, and you guys already know this, but there are a lot of nuances that at scale become

almost impossible. I do not envy anybody who's trying to solve that challenge. I wish I had saved this tweet; there's a guy, I forget his name, but I think he worked at some AV vendor, and he said, I've been doing this for six years and the best thing we've come up with is network alerts. If that tells me anything, it's about the data that they have: companies paid vendors to take in all their logs, and the vendors threw them in a pile with no structure to it, and I think now they're realizing their mistake and they're having to go back through and structure it properly. But I still don't know how they're going to deliver a consistent product when all of their clients have different levels of logging. Do you have models for different clients? Do you aggregate everything? What do you do? I think probably the only company who gets away with it is Microsoft, and that's just because of their scale, and they own the OS. But if you think about the AMSI stuff, where AV vendors can have hooks into AMSI, even that provides you limited visibility.

And the Cylance research, did you guys see that one, where they had that Cylance "bypass"? I put it in air quotes because you're still not going to get that past AV. Cylance did that whitelisting as a sort of backdoor to their own model, because even legitimate EDRs now have a really difficult time deciding what is legitimate execution and what isn't. If you're living in explorer.exe you can basically do what you want, because explorer is an extremely noisy process. But if you take that away and put it in an ML model, it becomes, I don't know, not impossible, there are super smart people, but it becomes an additional challenge that you're going to have to solve, and I kind of feel like we're going to be in the same place in five years, who knows.

Okay, so the challenges: obviously the arguments, and assuming a human expert. Oh, thank you, sorry, some panic from the back there. So, assuming a human expert: when I'm on an op and something's not working, I have to troubleshoot it, and that's going to increase my command counts and throw those sequences off, so implicitly I'm going to have bad data in my dataset. There could be a sequence I get into where the model is telling me to troubleshoot something, but it doesn't know any better, so it's kind of

difficult. You also get just dumb commands, fat fingers, misspellings, all the classic dirty-data stuff. But as I did my research, I found some really old papers from the 80s and 90s. As Windows researchers we really love subsystems, and actually one of my prized possessions is a copy that the Microsoft Library sent me, that I bought for three bucks off Amazon, so old that Microsoft was getting rid of it from their library. So you can go look up "Predicting UNIX Command Lines," and in this paper they basically say, you know, it turns out it's really difficult, and there's no real conclusion to it. So it's obviously not a new problem, and I felt better about myself when I read that paper: oh, cool, they couldn't figure it out either.

Okay, so now we're getting to the last case study, and it's the reinforcement learning piece. This is the golden goose, what everyone wants right now, this auto-hack. All these adversary simulation products that bring in the MITRE ATT&CK framework are trying to build some sort of intelligence into it, and they're having a really difficult time. It's actually really interesting to see how different organizations are dealing with it. I think the MITRE guys use preconditions and postconditions and things like that, and I'm not different, that's effectively what I'm trying to do, but my solution is probably to step a little further back and start with something a little more simple. Reinforcement learning is like cotton candy: on the surface it could be extremely awesome, but when you dig down into its basic tenets, it's actually not that useful for us. For example, we can't have an agent learn on the network; we need it to learn in real time. We don't have the luxury of letting it go out and fill its Q-table with all the right values and then go, oh, this is where all the good stuff is. No, we don't have that luxury. If you're not familiar with Q-learning, effectively, for a given action in a given state, the environment returns a new state and a reward, and I just fill out the Q-table. There are other methods of reinforcement learning, but this is the one I started with. So we kind of have this

state-action table here. If you're not familiar: an ls against C$ on Windows, if you're an administrator you'll get a directory listing back, and if you're not, you'll get access denied. So this is a good check for admin. Effectively we have this state, are we admin on host 1 or on host 2 in a target network; we have some action, we're going to ls from our malware against the hosts in that network; it's going to return some output, either we get a directory listing or we don't; and then we just give it some reward. It's obviously not optimal for a number of reasons, primarily that we can't learn offline, we need to learn on the fly, we don't have the luxury of going back. The video got deleted for this one, but effectively this is it running, simulating a false result, no directory listing. Actually, you know what, I'll live life on the edge a little bit.
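The loop just described, pick an action, run the ls, score the output, update the table, can be sketched like this. The environment, hostnames, rewards, and learning rate are all toy values for illustration, not the real tool; in particular, a real agent could not query the environment at will like this, which is exactly the luxury the talk says operators don't have.

```python
import random

random.seed(0)

# Toy environment: which hosts we are actually admin on (unknown to the agent).
ADMIN_HOSTS = {"host2", "host4"}
HOSTS = ["host1", "host2", "host3", "host4"]

def ls_c_dollar(host):
    """Simulate an ls against \\\\host\\C$: +1 for a listing, -1 for access denied."""
    return 1.0 if host in ADMIN_HOSTS else -1.0

# Q-table: one value per (state, action). The state here is trivial and the
# action is "which host to try", so it collapses to one value per host.
q_table = {host: 0.0 for host in HOSTS}
alpha = 0.5  # learning rate

for _ in range(20):
    host = random.choice(HOSTS)  # default (random) action selection
    reward = ls_c_dollar(host)
    # Tabular Q-update with no future-value term (single-step episodes).
    q_table[host] += alpha * (reward - q_table[host])
```

After a few episodes the sampled admin hosts drift toward +1 and the rest toward -1, which is the whole point of the table; the catch, as the talk notes, is that each of those episodes is a noisy command on a real network.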

Cool, okay. So this is Slingshot, our tool; this is like my little dev environment. We have a scripting engine, so I just have a little script that initializes everything. We could probably let it run for a hundred steps, but let's not live life on the edge that much; we'll do 20. All right, this is going to create the network, it's just hard-coded, effectively just doing something right now. It's going to run through these commands, start scoring them, and then we can look at them. There we go, we got a process list, or sorry, directory output, and it's going to continue. As it goes it's just updating the scores, and at the end, because we don't have a ton of time, there we go: it tells you the step, and step zero is obviously just our localhost. So yeah, it's fairly simple, but again, we don't have that offline-learning luxury. So my solution so far, lost my mouse, my solution so far has been to use fuzzy logic and fuzzy strings. So when I have creds, say I have creds to a SQL admin account, where might that account be an admin? Any guesses? I heard SQL server, yeah, exactly.
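That intuition, matching a credential name against host names, can be sketched with plain edit distance. The hostnames below are invented for illustration; the talk's approach runs the same idea over the users and hosts pulled down from the target network.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,         # deletion
                            curr[j - 1] + 1,     # insertion
                            prev[j - 1] + cost)) # substitution
        prev = curr
    return prev[-1]

def rank_targets(cred_name, hosts):
    """Rank hosts by name similarity to the credential, closest first."""
    return sorted(hosts, key=lambda h: levenshtein(cred_name.lower(), h.lower()))

# Hypothetical network: a sqladmin credential scores closest to SQL-named hosts.
hosts = ["SQL01", "FILESRV", "DC01", "SQLADMIN-WS"]
ranked = rank_targets("sqladmin", hosts)
```

It's not a guarantee of local admin, just a prior: people label machines after their purpose, so the string score is a cheap way to order which host to try first.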

And actually that was a slide later, but the professor that I interact with, she said, hey, have you tried Levenshtein distance, just add a string metric. And I thought, this is perfect, this is exactly how I go about it: if I've password sprayed and I have creds, okay, I'm a SQL admin, these are SQL servers. It's not a guarantee that they're local admin, it's just more likely that they're going to be local admin on a SQL server than on, like, a SIEM server, just due to human nature: people label things as such. So that's what we do. If you look at the degrees-of-truth thing, we operate in degrees of truth. Networks are unknown, but they're discrete: we don't know what's in there, but we know they're not infinite. So we have to go through and play with it, but effectively we just have these scores, and they're different across the board for each network, and then this feeds into our Q-learning algorithm. So there's the degrees-of-truth piece, I love that, and then the string metrics piece; Levenshtein distance has been awesome. All we do is collect the network information and run the fuzzy string metrics on a combination of users and hosts, so I pull down users and hosts side by side, and then I select the next action based on the string metric score, which is just how I'd do it as an operator.

So, two things. If you're going to put some sort of AI in a network, my two caveats have been: it has to be better than random, and a default implementation of RL action selection is random, we can't have that, so it needs to be better than random; and it needs to be better than an if statement, which would just be starting at host 1 and going down. If you can satisfy those two, you have a chance at being more efficient than, or as efficient as, an operator. I'm sorry, I already did the demo, but there are some challenges. Like I said, networks are unknown, but they are discrete, so that's pretty awesome. But, cheater cheater pumpkin eater: pre-filling your Q-table, I don't know if that's an ML faux pas yet, I don't know if you can pre-fill it with memories or whatever that is, but this has been my current solution, and this has been probably my most successful, other than the process list

stuff; this one has gotten me DA every time. I usually have a threshold of like 15, and I'll just start at one and go down, Get-NetLocalGroup all the way down, and at least one of them will hit. I've yet to convince my colleagues that this works, though, so they're less enthusiastic. So that's it. Lastly, I want to say a big thank you to Nancy Fulda at BYU, Brigham Young; she's the one that put me on to Levenshtein distance and she's been really helpful guiding my efforts, so a big thank you to her, and obviously my colleague Nick as well for reviewing my terrible code.

And that's it, thank you for coming. [Applause]