A Journey Through MITRE Evaluation - Alex Davies

BSides Cymru Wales · 2019 · 31:34 · 116 views · Published 2019-10

Yeah, okay, cool. My name's Alex, I'm a security researcher with the team at F-Secure, and today I'm gonna be talking a bit about MITRE and the MITRE evaluation, and the MITRE ATT&CK framework as well, which you guys have all heard of before. I'm gonna keep it quite brief on the framework side of things. I think a lot of people have talked about it before, but we'll talk about, I guess, some of the pros and cons of it, and also a bit of a cheat sheet for how you can use it in your organization. For the second part of the talk I'm gonna be talking about the evaluation. This is essentially an EDR agent assessment.

EDR stands for endpoint detection and response, so essentially an endpoint agent you put on your computers to record data and detect attacks. What MITRE have done is they've basically evaluated all the EDR agents and produced some results you can publicly see, and that's what I want to look at today: how you can analyze those results, but also talk a bit about F-Secure, because we've actually just gone through this evaluation ourselves, and the lessons we've learned doing it. Just a quick introduction to myself: like I said, I'm a security researcher with F-Secure. I spend my days basically looking into malicious activity and attacks, and how we translate that to defensive approaches,

just kind of interesting work, and one of my major projects this year was the MITRE evaluation, essentially. And lastly, I'd also like to say I'm actually from Cardiff as well. I live in London, I've been there for a while, but I'm a Welsh boy at heart, so I want to say a big thank you to the BSides organizers for putting on this awesome conference and putting security on the map in Wales. It's really good to see. All right, so I'm gonna jump straight in with things today: the ATT&CK framework. Who here has heard of the ATT&CK framework, or used the ATT&CK framework? Okay, cool, awesome. So yeah, I mean, this is a framework that's

been around for, I think, a few years now, and it's slowly gained in popularity. More and more people have started using it in their organisations, which is cool to see, and they're using it as well to benchmark themselves against attack techniques, so it's awesome to see that. For anyone who doesn't know, this is basically an encyclopedia of techniques that attackers can use, and it covers the whole kill chain, from payload delivery to persistence to lateral movement to exfiltration, the whole thing. Really, really cool, it's awesome. But there's a "but" coming here, right: it's big, it's really big. It's around 338 techniques that I counted the other day,

there's a lot there to deal with, and actually for security teams that's maybe a bit too much. A question I think organizations often ask themselves is: which should we focus on? Where do we start with this? There's so many things there. And I think that's a hard thing to answer at the moment, because MITRE don't give any information about prevalence, for example. They don't tell you which techniques are actually used in the real world more often than others, and they don't tell you which are easier or harder to detect. None of that's really incorporated here. So I've kind of put together a cheat sheet

that you can use, right. This is a lot smaller than the ATT&CK framework. I think there's maybe ten techniques on it that I thought were the most interesting, and these are the techniques that we've seen being used in real-world breaches again and again. And look, these are things that are quite easy to detect. If you're using Sysmon, or you're using an EDR agent, or you've just got a bit of PowerShell and you just want to enumerate services, you can do that, right. It's not too hard. And the thing is, I think a lot of companies are in this position where they're struggling with detection in the modern era, and I think

with security, often it's about going back to the basics, right. We all kind of know that you nail the basics, like patching, access control, user management, all that stuff on the preventative side. The same applies on detection as well: start with the basics. Just find the basic persistence mechanisms, find process activity that's maybe suspicious. You know, mshta generally doesn't run, so look for that, and you can detect attacks quite easily. Now, at F-Secure we don't have the luxury of just being able to focus on five or ten things. We've got to try and cover as much as possible. We want to have as much coverage as we can. You know, we're

experts in this, we're a security company. And this kind of leads nicely, I guess, into the ATT&CK evaluation, because the evaluation, like I said before, is a security testing process aimed at EDR agents. So just like you guys might have seen AV tests, right, where you see the different AV agents all compared against each other, and you know one's better than another because it can detect more samples or whatnot, basically the same didn't really exist for EDR. And the main reason why is because EDR is a lot more complex than AV. AV is about detecting individual files, right: is it good, is it bad? Simple. With EDR it's more about behaviors. It's not just about file

analysis, it's about behavior analysis. Maybe someone's running some PowerShell, maybe someone is spinning up rundll32 with a weird DLL that you weren't expecting, right. These are the behaviors that real-world attackers are actually exploiting and using, and it's actually EDR that fills that nice little gap to help you detect this. But in terms of testing, and knowing what actually is a good EDR agent, you didn't really have anything to tell you in the security community until MITRE came along. MITRE said: hey, we're gonna do an evaluation, we're gonna get all the vendors on board, we're going to compare them, and we're actually gonna use a real-world APT group to model this

activity as well. So what does this process look like? Well, it's quite straightforward. You've basically got MITRE taking on the role of the attacker, right. They're running the attack techniques based on the MITRE ATT&CK framework. And then you've got us, F-Secure or any other vendor, taking on the role of the defender. Here's just a quick example. This is kind of a made-up test case, but you've just got cmd /c hostname to enumerate the host name on the local system. That'll be the kind of thing that MITRE would run, and then from the vendor perspective we would show them what we could detect. So here maybe you've got some

telemetry: you can see, hey, we show the process name, we've got the arguments for that process. But also we've got some kind of alert that pointed us in the direction of this, saying, hey, this is weird enumeration activity, maybe it's worth looking into. And finally they publish all the results on the website, again free to access. You guys can go and look at them right now. You can see there's a lot of major vendors out there today. A few things to mention: in round one there were twelve vendors, and in round two it goes up to 21. We're just coming to the end of round one, by the way. Round two is starting later this year, with results

published next year. One other subtle thing was just that in round one there's what they call the initial cohort, these were the very first vendors to take part in the evaluation, and then rolling admissions, and these were people who came afterwards. In round two it's slightly different: they're just doing one intake, and I believe that's already closed now, so apart from the 21 vendors it's already kind of locked in. So we started our journey down the MITRE evaluation path earlier this year, and it started with an internal thing. We spoke to management, we put the idea forward, got approval, got budget, got resource, got it all kind of

signed off, and put our application forward to MITRE, who then accepted us onto the program, and we kind of started our internal preparation, basically. Now, this took a few months, and you can see there's quite a few steps involved, from understanding what the testing was, to actually implementing changes, whether that's code changes or infrastructure changes, and finally actually getting the results and providing feedback on those results to MITRE. For the next few slides I just want to step into these elements one by one and give you guys a bit of insight into what happened behind the scenes. So we started off just by looking at the [inaudible]. Now, a lot of

what I'm talking about today, and this bit here, is from the MITRE website. They are very open about the testing process, which is awesome. They talk about the techniques they're going to use, they talk about the groups they're going to focus on, and that's awesome because it actually gives us a good starting point. So we already knew before we went into the testing that Cobalt Strike and Empire were the frameworks that we were going to be tested against, basically. We knew it was APT3 as well. Again, they're pretty well documented, and there's a lot of information publicly about how that group operates. We also had some previous results. So remember that initial cohort versus

rolling admissions thing: because we were part of the rolling admissions, we were able to see some of the results that were already published. You can see here, this is Carbon Black, this is Cobalt Strike, you can see net group "Domain Admins", and you can see they had telemetry, they had enrichment for that as well, and they have some screenshots. So they did okay on this one. But this actually was just a real help for us, because it gave us a starting point prior to testing. So the next step for us was to jump into the lab, right. We've got a virtual environment, basically, so we had Cobalt Strike, we had Empire, and we had some test VMs

with our endpoint agent installed, and we could actually run through a whole load of different test cases and see what we could see. Do things work as expected? So here, for example, if we ran ipconfig through Cobalt Strike, you'd end up seeing some ipconfig telemetry. Cool, that's kind of what we expected. But we needed to do this for all the test cases, and there was a whole load of different things in here, many, many different test cases, and we needed to make sure we had telemetry and also some kind of detection or enrichment for every single step. Now, at F-Secure we've basically got detection rules in

place, and we have automated test cases for each of those detection rules, as well as monitoring of telemetry as it comes in. So this makes sure that essentially everything works as expected, and this is all automated, which is cool for us. But at the same time, MITRE was kind of a special project, right, and so we wanted that additional level of assurance. So I went the old-fashioned way: I got Excel out, I started writing down the test cases one by one, and I started just going through them, checking: do we have telemetry? Do we have some kind of detection in place? And this is kind of

again the real-world results list I had. The main reason for this, just to re-emphasize, was that we've got a complex system at the end of the day, right. You've got the endpoint agent, which collects data. You've got the middleware, which has the analytics and data processing. Then you've got some front ends that allow you to access the data. So you can see it's a complex system, and just relying on automation, it's gonna be tricky to measure end to end whether everything works as expected. So again, doing it the old-fashioned way actually worked really well for us. You'll notice on this sheet here most of it's green, which is really good. You'll notice

there's also some red, right, and the red ones are things that didn't work as expected, or things we just didn't have coverage for at all. This is what I want to talk about on the next few slides. So I wanted to start with a simple example: whoami. All you guys know whoami, right. It just tells you who you are as a logged-in user. Simple, right? In this test case, same as before, we got some execution information, so we saw the process event as before, and we also had "enum whoami". This is like a tag, basically. This is what MITRE are calling an enrichment,

so basically some way to give you extra context about what this thing is and what it might be doing. So we had that, which is kind of cool already. If you look at 12:31, this test case, it was whoami, but this time it had some extra parameters. Now, before this testing process I knew what whoami did, you know, it can list the regular user, right, but I didn't know it had all these other options. Maybe you guys knew, maybe you didn't. But this is all just part of the testing and research process. If you do a /? after whoami you can

see there's all these other options you can send it. In this MITRE test they use /all and /fo. So with /all, if you run it, you see what's on the screen here: you don't just get the user, you get the SID of the user, you get group information, you get privilege information as well. It's actually pretty powerful. And /fo is just the output format, list. So in this instance we could, again, just do a bit of simple research, take this extra stuff, and build a new signature that's actually more targeted. Because whoami happens all the time in organizations. It's not a very high-fidelity signature. You can't just

look at all whoami instances, for example, in IR or in a security team. So actually having something a bit more specific, focused on someone using /all, doing that extra step of enumeration, that's a bit more suspicious. And again, I want to emphasize here, this isn't anything too complex. I'm not trying to show off here. This is just basic stuff. But it's kind of showing, I guess, the research process behind the scenes that we went through: just trying to understand how a tool works, then making sure we had telemetry, improving our detection, and building on that system overall. So, the next one, 3.A.1. In this case you've got

Cobalt Strike, and there's a bypass UAC module they have that's based on token duplication. Running this in the lab environment, we had some interesting results. Hopefully you guys can kind of see that: you've got a parent and child process relationship there. On the left you've got svchost, on the right you've got PowerShell with an encoded command, and if you decode that you end up with this string here. Now, there's a few interesting things from a detection perspective here. svchost to PowerShell, that's unusual. You generally don't see that running in organizations. svchost is like a service container, so you'd usually see it loading DLLs, not so much spawning new

processes, especially not PowerShell. And likewise the decoded command down here: this is a very common download cradle you'd see a lot of tools using, so that's not unusual. What is unusual, though, is that end part, the 127, right. Because usually you'd expect a cradle to reach out to a remote system. Why is this reaching out to the local system? Why would you retrieve anything from there? You've already got a shell on the box, so why would you need to run this again? So at this stage I could go off and use Procmon and stuff, you could reverse it, do some deeper

dynamic analysis. That would be one option. I just read the Cobalt Strike blog instead, right. It was right there. You can see on the blog they have this little description of the technique. I think it was James Forshaw who originally did the technique, and FuzzySecurity weaponized it. And you can see the explanation here: this module does run a PowerShell one-liner to run a payload stager. So this all kind of makes sense now. They literally say here, we actually do run a stager, and it's this little line of PowerShell. Cool. And now that we've got that additional understanding, we can actually write a rule or signature to actually

catch this. It's worth mentioning this is actually quite unusual for Cobalt Strike. It's actually a very good framework and the operational security is normally pretty tight. This is one of those few exceptions where it's very noisy. So if you're an offensive guy, don't use this module, or use the runasadmin command, which is slightly different. But for defenders it's a great, easy thing to spot. Next, 12.E.1. This is a really interesting test case, actually, because this is essentially a PowerShell module to help you enumerate system information: users, groups, services, antivirus, patch levels. It does it all in one handy PowerShell script, which is pretty cool for attackers. For defenders,

though, it's quite interesting, especially in MITRE. You might be able to see here, guys, that the detection type was "None" for everyone, and these are big names. These aren't little vendors. You've got your Carbon Black, you've got your Cybereason, you've got your CrowdStrike: all None. So what's going on here? How do all these big guys miss this really noisy activity? Well, if you actually look at the Empire code and how it works, one of the first things Empire does when you run the initial payload is it tries to disable PowerShell logging, and there's a number of different techniques it uses there. I've put

one of them on the board here, where they try and basically set EnableScriptBlockLogging to zero, and that just stops the logging occurring. So if you're a vendor who's relying on PowerShell logs to do your detection, well, you've just lost the data and you can't do detection. And that's basically what's happened here, and you'll notice if you look at the results in MITRE that many vendors suffer from the same issue. So what do we do? Do we just accept this? No, of course we don't, right. We're defenders, we've gotta step it up here. So obviously we started looking at this, trying to research it and figure out: is there some way we can

work around this? And it's interesting: I pulled just one of the lines out of that PowerShell script, and you can see at the top here Get-WmiObject Win32_Share, just to enumerate share drives. Now, the interesting bit here: obviously this is PowerShell, so we can't do anything with this because we haven't got the logs. But what about that first bit, Get-WmiObject? Well, that's not PowerShell, that's calling WMI, right. So if you think about it, if we can actually get WMI data from somewhere, we may be able to detect this. And that is actually

what you can do. You can see here, this is actually WMI output from ETW, Event Tracing for Windows. This is a data source within Windows from which you can collect real-time telemetry, and here we're pulling in WMI data in real time. You'll see right in the middle, select * from Win32_Share, so basically we've found our command. It's slightly different from what's at the top, sure, but the main point is we've managed to locate it, and this is exactly what we can use for our telemetry and for our detection, and then we can build on top of it. So we've kind of moved around the disabling by using a different data source instead, which I think is pretty cool.
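The idea described above, falling back to WMI query telemetry when PowerShell logging has been switched off, can be sketched roughly as follows. This is only an illustration under stated assumptions: the event dictionaries, the field names (`query`, `client_pid`), the tag name, and the watch list of WMI classes are all hypothetical and are not F-Secure's schema or detection logic. Collecting the events from ETW is out of scope here; the sketch starts from events that have already been captured.

```python
import re

# Hypothetical watch list: WMI classes that are often queried during host enumeration.
WATCHED_CLASSES = {"win32_share", "win32_service", "win32_useraccount", "antivirusproduct"}

# Extract the class name from a WQL "SELECT ... FROM <class>" statement.
WQL_FROM = re.compile(r"\bfrom\s+([A-Za-z0-9_]+)", re.IGNORECASE)


def wmi_class(query):
    """Return the lower-cased WMI class named in a WQL query, or None if absent."""
    match = WQL_FROM.search(query)
    return match.group(1).lower() if match else None


def flag_enumeration(events):
    """Return a tagged copy of every event whose query targets a watched class."""
    flagged = []
    for event in events:
        cls = wmi_class(event.get("query", ""))
        if cls in WATCHED_CLASSES:
            flagged.append({**event, "tag": "wmi-enumeration", "wmi_class": cls})
    return flagged


# Made-up sample events in the assumed shape.
sample_events = [
    {"client_pid": 4321, "query": "select * from Win32_Share"},
    {"client_pid": 4321, "query": "SELECT Name FROM Win32_Process WHERE ProcessId = 4"},
]
hits = flag_enumeration(sample_events)
```

Because the rule keys on the WQL query rather than on PowerShell script block logs, it would still fire when the script itself has disabled that logging. The trade-off is that legitimate management tools also issue WMI queries, so a watch list like this would need tuning against real traffic.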

Now, I guess one of the big questions here is: how did we do? I'm from F-Secure, we went through this evaluation, and you probably just wanna see, did we win, did we lose, how did we do? Well, it's kind of cool: at the moment I've got the draft results, and they're kind of interesting, but we're also under NDA with MITRE and the results haven't been released yet, and I don't want to get in trouble with the MITRE legal team or the F-Secure legal team. So unfortunately I can't talk about our results today, apologies. The good news, though, is that a whole load of other vendors have already released their results, so I wanted to take a little

look at them today. Okay, time for a game of "I've got a graph for that". So first up, by the way, these were all made by the vendors themselves, so that might explain some of this. First up we had CrowdStrike: they had a big red bar that was bigger than all the other vendors' for detection. We had Carbon Black: they went for a blue bar, and it was bigger than all the other vendors', without painful delays, so that's quite a good thing. We had Carbon Black: they were saying how you don't need people, you just install Carbon Black and it sort of does the work for you. Subsequently we had Cybereason. They didn't really understand what they were

supposed to do here, because they made their graph look the same as everyone else's. And last of all we had FireEye coming in as well. Same thing, they didn't quite have a big enough gap, so again I think they could have stepped up their graphing here a little bit. But you can see here, really, these are all based on real data. I'd like to emphasize that they've not made anything up here, but what they've done is carefully pick and choose various configurations of different data points to make it seem like maybe they're better than their competitors. Now, when marketing teams get their hands on this data, things go to a whole new level, right.

So Carbon Black, they outperformed all other EDR solutions. Or CrowdStrike, they were the most effective EDR solution. Cybereason, they had the best coverage. And, not to leave FireEye out as well, apparently they were the most effective solution. Now again, you guys can all probably smell the [ __ ], right. They can't all be the best solution. And it's really frustrating for me. I'm a technical security guy, and it's frustrating to see this kind of marketing coming out in 2019. All you guys are smart enough to know that this is kind of made-up sort of stuff, and it's just a sneaky approach to try and sell more products, right. So how can you

actually interpret the results? Forget about the vendors and what they're telling you: can you take the data yourselves and analyze it? Absolutely, yeah, you can. Remember, all this data is publicly available. It's on the website in MITRE's tabular form, I'll show some screenshots of that today, and you can also get JSON files and analyze them too. But I think there's three really important questions to ask here. First: is telemetry available? That's a big one. If you haven't got telemetry, you can't detect stuff, you can't catch the bad guys. So that's the first thing to look at if you're going through all the test

cases: make sure they have telemetry. And by the way, most vendors do for most test cases. Process data, network data, file data: most people have these things, so it's actually not too big a deal in the MITRE evaluation. It becomes slightly more of an issue when you look at more interesting things like WMI data or .NET data, which we at F-Secure collect, which is pretty cool, but not all vendors do. Second: is suspicious activity emphasized in some way? There's a lot of back-and-forth, and I'll cover this on the next slide, about how the detections work in MITRE round one. It's a bit confusing, to be honest, so I like to think of it more

in terms of: are you drawn to the suspicious activity? It doesn't necessarily have to have an alert, it doesn't necessarily have to give a certain amount of information, just as long as you're drawn to it and it's emphasized in some way. I think that's what's important, to me at least, and ideally you also have some additional context. And lastly: is activity correlated? For anyone who's done real-world investigations or real-world operations, you'll know how important this last one is. Being able to tell that, okay, there was this suspicious process execution here and there's this network connection over here, and they were related: that's really awesome if you can link those two up,

because it saves time and it saves effort. If the tool doesn't do it, you have to do it manually, right. So correlation is not directly measured in the MITRE evaluation, but you can kind of derive it a little bit based on some of the tainting stuff, and also just from some of the screenshots. Now, if you do want to analyze those JSON files, Forrester analyst Josh Zelonis has published a script that's on GitHub at this link. I definitely recommend checking it out. It's a cool script. The one caveat with this script is that I personally don't really like the way the scoring's been done in it, and my recommendation would probably be to take the script yourself

and tweak that scoring a little bit, and make it work for you and your organization, based on some of the things I'm talking about today. I also want to mention just quickly the sort of tagging from MITRE as well. You'll see in round one: enrichment, general behavior, specific behavior. These are the detection types they use, and what's unfortunate, I guess, in round one is that they decided how to award these things based on how the data is shown in the UI, right. You can see this from the results themselves, but it's quite confusing to know quite what's going on here. And also, a

really important point is that these things have gone in round two. So if you're stressing about round one, like, should I care about enrichment or specific behaviors, which is more important? Remember, this is the first time MITRE have ever done this, and they've actually learned a lot from it, and they've changed some stuff based off feedback and based off what people have said. So in round two, instead, you've got general, tactic, and technique, which are based more on the types of information you get with the alert, as opposed to how it's shown in the UI, whether it's just a tag or a full alert. One

last thing as well, about delays and tainting. A lot of people put down delays and say, if there's delays you can't use that tool, or whatever. I disagree, personally. Again, look at the time to discover for breaches: it's still at weeks to months. If a tool takes maybe an extra half hour or an hour, I honestly don't think that's that big an issue in context. And likewise with the tainting modifiers: you can look at these as good or bad. Tainting indicates correlation, which is a good thing, but also, if tainting was required to detect something, it can be a bad thing. So it depends how you want to analyze the results

here; tainting is almost maybe a neutral one, depending on how you look at it. So one of the most interesting things when you actually look at the results is when you compare how different vendors did. When you've got easy stuff, here's sc query, I know it doesn't come out very well here, but you can see pretty much everyone, when you're enumerating services on a local box, everyone had this data. Everyone collects process data, and everyone actually had, I think, detections or some kind of enrichment for it, because it's pretty obvious enumeration. And it's interesting because likewise, for hard test cases that are focused on Windows API activity, this was

Cobalt Strike, and they have a direct Windows API call to enumerate running processes, right, and you'll see that again no one really detected it. I think Palo Alto, down the end, had an enrichment for it, but in general no one really did. So it's interesting, because I think these results show that many vendors are very similar: they have the same data sets, they have the same types of detection. And actually, if you look at the top five vendors, and I'm not gonna name names today, but some of the major vendors out there who a lot of people use, I'd argue they're very, very similar, in all honesty. And that's actually what

I'd maybe put as one of my key takeaways today: I think a lot of the EDR vendors out there right now are very similar, especially if you're talking, again, the top five. I won't name them, but feel free to ask me afterwards. But I think the big thing to take away as well for round one is, like I said before, it was the first time MITRE had ever done this. It was a great first step, and it actually had some really good results and provides great visibility, but it's not perfect. In particular there are things like noise, for example, that are just not accounted for at all. Any real-world blue teamers or detection guys in the crowd will know noise is like

the single biggest issue, right. If there's too much noise, you just can't see the data that matters. Likewise workflow and response: these things just aren't accounted for in round one at all. Again, in the real world these are huge things, super important. If you can't retrieve a file off a box to do an investigation, that really blocks your investigation and makes life a lot harder. Coverage as well: it kind of falls outside the actual EDR assessment, but again, in the real world, what's going to really screw up your detection approach as an organization is that maybe you forgot to install EDR in this part of the network, or you can't for whatever reason. And likewise, I think, people. For

me, I always talk about people when I present. I think people are a huge aspect. All you guys in the audience, myself, we are the ones who are driving these tools, we're the ones who are leading and owning the investigation. What actually determines whether an investigation succeeds or not is all down to those people, and making sure that they're in charge here. One of the other things I want to mention as well was just Endgame. They put together a really good blog post, and actually I think it does a better job than some of the things in this talk of putting round one into context. You know,

there's a lot of limitations to round one. Like I said, I love it, and I think MITRE have done a very good job, but there's just a few limitations there you've got to be aware of. I think a lot of people have started going down this path of being very dogmatic: if you're not in the MITRE evaluation, or you don't have a certain score in the MITRE evaluation, I don't want to use your product. And again, I think that's going maybe a little bit too far. We've got to take things in context here: use the results, but use them in a clever way, right. So the last thing

I guess I'd like to leave you guys with today is round two. Like I said, round one has literally just come to an end, and I'm hoping our results will be out in the next few weeks. And round two is actually starting, like I said, at the tail end of this year, and they're switching things up. APT3 was the group in round one, and this time round it's APT29. There'll be some crossover from round one: PowerShell, WMI, and I'm sure some of the same legitimate binaries in Windows will be abused as well. But you'll also notice that bit there: custom compiled binaries. That's gonna be a tricky part for a lot of vendors, because compiled

binaries mean direct Windows API calls, and if there's direct Windows API calls, it means that very often you won't see anything in process data, network data, or file data, depending on how the tool is actually doing its data collection, right. So I'm really excited to see how things pan out. Hopefully our results will also be out for round one. But yeah, that's it for me. Hopefully you guys found it useful. We do have a few minutes for questions, if anybody has any.

Thanks very much for that. One thing that you mentioned quite a bit was correlation of data from different types of sources. I work a lot on the industrial cyber side, and one of the things that we're quite interested in is bringing data in from industrial plants to correlate and co-analyze with some of the IT security information. Have you seen any interest in that with the sort of MITRE work that you've done?

So yeah, I mean, correlation of data, and also just taking in data from a variety of sources, is super important, right. The big issue, I guess, with what you're saying there is that this talk,

and MITRE in general, and EDR in general, is focused on the endpoint. So if I've got an agent on this Windows endpoint, that's awesome: I've got the data, I can see all the things. If I've got a machine over here that's maybe processing oil or whatever, I can't install an EDR agent on that, maybe. Who knows, maybe you could. That's the challenge, I think. Right now things have got better with EDR, because there's a lot of data there that's very valuable, but if you start to maybe pull in ICS data, or maybe cloud service data, and you want to start tying it all together, that's gonna get complex fast, and actually it'll

add a lot of overhead to your team. So it's something that we aim for, but be aware it's gonna take a bit of effort to get there.

Is there any plan in round two to deal with noise? Can you speak up a little? Sorry: is there any plan in round two to deal with noise? Because it's fantastic that you're able to detect a PowerShell encoded command or whatever, but if you detect that and it triggers everywhere, then it's totally useless data to me. Is noise going to be included in round two? That's what I'm asking. Sorry, along with? Is noise, the noisiness of the detections, how frequently they

trigger, is that going to be in round two's view of detections?

So yeah, I mean, noise is a huge thing. Like I said, I've been doing detection for years, and noise is one of the biggest issues we encounter, because in many ways detection is a big data problem. You can use humans to look in a smart way and find the bad stuff, you can use rules to speed it up, you can use some ML to try and find the anomalies, right, but the whole point is you're just searching through a sea of noise. And what's interesting, actually, thinking about the future here, and maybe to answer your question, is that attackers over time are digging in deeper

and, oh, what's the word, they're trying to blend in, I guess, more and more, right. So we've seen that happen. Previously, you know, I talked about binaries, right. I don't really see attackers using binaries all that much in the future. They'll use DLLs, right, custom DLLs you load into legitimate processes. So straight away, when you're trying to analyze process executions, well, you can't. It's a DLL running in a thread in an already legitimate process. So again, trying to blend into that noise, that's what I think attackers will do. And obviously that PowerShell example is super noisy, you're right, you'd easily spot that. But likewise we've seen

attackers move away from PowerShell to .NET, and probably into other things, to evade that kind of simplistic detection. So, round two: I don't know, that's a good question. I don't think it is. That's actually one of the biggest drawbacks of round two: noise wasn't in round one and it won't be in round two. As far as I know, in the virtual lab environment they're using, there's not going to be any simulated noise. So yeah, unfortunately it won't solve that big gotcha.

Thanks again, Alex. We're out of time, but you're sticking around, right? That's right. So I think we actually have a 20 minute break,

so if you want to ask questions directly to Alex, he's still here, or later on. See you in about 20 minutes.