Effective Adversary Emulation

Name: Effective Adversary Emulation
Uploaded: 2023-10-01
Duration: 32 min 33 s

BSides CT · 2023 · 32:33 · 271 views · Published 2023-10

Speakers

Jeremy Mill

Tags

Category: Technical

Team: Red

Style: Talk

Mentioned in this talk

Frameworks

Covenant, Sliver

About this talk

So you've built an amazing suite of security tools that provide defense in depth. But, have you actually tested them? This talk describes an effective method for adversary emulation designed for small and medium sized teams. Learn how to build a plan, execute it safely, and how to evaluate the results. Jeremy also gave a talk at BSides CT 2019 on Reversing and Bypassing DRM/HSM Dongles - https://www.youtube.com/watch?v=Jggq-PvJsfI

Transcript [en]

Last but definitely not least, I'd like to welcome back Jeremy Mill. He was a speaker at one of our previous BSides, and he's going to be speaking on effective adversary emulation. Thank you, Jeremy.

Awesome, thank you. So I guess we're bookending the day with people who started speaking at BSides in 2019, bringing it back to where we started. I'm doing effective adversary emulation. This is specifically aimed at small and medium-sized businesses. It's not aimed at "I'm a Fortune 500 and I have a red team with 20 members." It's much more focused on teams that are running lean.

Who am I? I'm Jeremy Mill; you can find me on Mastodon and on GitHub. My background: I started in military SIGINT, then I was a sysadmin, then a full-stack software developer, then I was in AppSec, then IoT pen testing, and then I ran a security team at Puppet and ran a CNA. What I'm doing now is managing the security team at Varo Bank. What makes me tick? I enjoy making bad guys frustrated. That's my goal, and it's one of the reasons I like working in fintech: there are a lot of real bad guys actively trying to steal people's money all the time.

Why this talk? Most of the people in this room don't test. You might say that you do. You might tell your regulators that you do, if you have them. You might say in your SOC 2, "yeah, we did a pen test." But that's not the same thing as rigorously testing. And when we do test, we're genuinely not good at it: when it comes to digging into why tests succeed or fail, root cause analysis, and rigorous testing, we're bad. We're bad at it because a lot of the testing resources that exist for people on small and medium-sized teams, the vast majority of organizations, are way too complicated.

One of the reasons they're too complicated is that they're focused on emulating attackers exactly. They say, "we're going to emulate FIN7," down to exactly which C2 they're using, exactly how they delivered it, exactly how they moved laterally. That doesn't even make sense, because when they attack your org, they're going to do something different. So exact adversary emulation, especially the material being released by MITRE, doesn't make sense. Or we treat our test as pass/fail. I'm looking at everybody in this room who did their yearly pen test, didn't get owned to domain admin, said "yay, we've succeeded," and moved on with their lives. That's not good enough; we have to do better.

The last thing we do really often is test tools individually. We might just test our EDR, see that we caught something, and say, "cool, our EDR is working." But that's not how we deploy things. It's not even the model we think in, because we talk about things like defense in depth. So, why test it all? We talk about defense in depth: we've got network security tools, a bunch of Palo Altos running.

We've got a Cloudflare WAF, or an AWS WAF, or Fastly, something controlling ingress. We've got zero trust deployed around our network. We've got a fancy EDR or an XDR. We've got a SIEM, and we're aggregating all of our logs from all of our systems. We've got a "WEO" tool; I don't know what that is, but not one of you would be surprised if somebody outside was selling it. We need to know that these tools work together and that they actually work, because we're expecting them to defend in depth. If we don't test them that way, we have absolutely no idea whether that's the case.

And we need to do it regularly, because the TTPs attackers were using a month ago are not the ones they're using today. We just learned a ton from MGM; we have a whole bunch of new TTPs we're concerned about. A year ago, would anybody here have been concerned about a deepfaked audio call to their help desk? Maybe, but you might be the only one in the room, and that's awesome. For the vast majority of us, it might not even have been on our radar, or seemed like something achievable for the attackers we were facing. But here's the real reason why.

Every time I've done this, and I've done it in several different orgs and several different contexts, I have been shocked. Not a little shocked; "uh-oh, we have to make immediate changes right now" shocked, by basic tests that reflect the attacks we're facing today. It's better to find out now. The best time to test was yesterday; the next best time is now. If we don't test, and we don't find out where we're not doing well, we might never know until all of our stuff is encrypted and ransomed. And then we're going to find out that our backups fail.

Because we also didn't test our backups, and then we're really screwed.

So, I've convinced you. Well, with my seven-step solution, you too can test! We're going to scope. We're going to research our TTPs. We're going to design an effective test. We're going to weight the elements of that test, so that it has some rigor and gives us some kind of feedback loop. We're going to talk about executing the test and scoring it against those weights. We're going to take the results and turn them into a plan. And we're going to talk about actually doing something with it, because just putting it on the backlog isn't going to save you.

Scope. This is an objectively, incredibly simplified version of what you might have. Say you're a company that produces software you run as SaaS. You've got users on the internet and you've got email coming in (maybe those two should be on top of each other; it doesn't matter for these purposes). You've got a bunch of endpoints, used by everyone from your software developers to your accountants to the administrative assistants who schedule for your CEO. They're talking to some kind of user auth system: Okta, Azure AD, Entra, or whatever they renamed it to, to avoid lawsuits. Your developers talk to a CI/CD system or to the cloud provider, because you're hosting some stuff. Then you've got cloud egress from your CI or from Kubernetes, and your ingress coming in through that WAF we talked about, for those SaaS solutions.

If you try to build one test that tests all of the security controls in that scope, you'll be testing for the next two months. You'll probably be testing so long that the results don't matter, and even if they do, your backlog is going to explode and it's useless. It's just too big.

It doesn't make sense: it's too long a timeline, especially if you don't have a really large security team. So what we might want to do is break it up. "We're going to test our cloud infrastructure": we emulate something like a Log4j-style attack, where one of our servers, or a container running in Kubernetes, is the one that gets popped. Can we detect that exploit coming in? Can we detect C2 comms going out, entirely within the scope and context of our cloud infrastructure? Or we might want to do the opposite and test a malvertising-type situation, starting from a user clicking on an EXE. Where do we go from there, and what do we detect?

Scoping is really, really important. Too big is a huge problem; too small, and we lose too many of the tools in our security program to effectively evaluate our defense in depth. It's organization-specific, and you'll see me hit on this a few times: you should know where your crown jewels are. What does your business have that's important? Is it a giant database of PII? Is it your customers' money? Is it your customers' API tokens, because a compromise of your system would let an attacker get into theirs? What are your crown jewels? Spend your time on those things, because they're vastly more important than anything else you could do.

The next step is to evaluate your TTPs, and you're going to have to decide what kind of hacker you are. Maybe you're the typical black-hoodie hacker. Maybe you're this guy; I can only assume this is the guy who hacked MGM, since he stands way too close to his screens. Or maybe you're working for the NSA, I can only assume, because you've got a much nicer desk and a lot more funding than everyone else.

The TTPs you pick in this step are the result of your scope. You're not going to be researching Istio control-plane bypasses if you're testing user email exploits. You're going to spend your time researching how the various attackers who target organizations like yours work, within the scope of your test. Again, come back to those crown jewels: aim it at the critical information, and at the critical people and processes, wherever they live. For this step you're keeping it high level; it's just general research. For me, step two never really finishes. I watch this space all the time. I have a notebook that's literally just filled with "oh, that's a cool GitHub repo," or a threat intel report that told me "hey, this is new, this is interesting," or "I've seen this in five reports now from DFIR people; maybe that's something I want to pay attention to and revisit when I do this." Then I take all of those TTPs and bring them into a plan, and I use something like MITRE ATT&CK to keep myself organized by tactic, starting with how they do recon.
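To make "organized with ATT&CK" a bit more concrete, here is a minimal sketch of a plan keyed by tactic, using steps modeled on the developer/npm design example from this talk. The `PlanStep` structure is hypothetical (the talk doesn't prescribe a format), and the technique IDs are my own mapping, so check them against the ATT&CK site before relying on them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanStep:
    """One planned action in the emulation, tagged with its ATT&CK tactic and technique."""
    tactic: str        # ATT&CK tactic name, e.g. "Persistence"
    technique_id: str  # ATT&CK technique ID (my mapping; verify on attack.mitre.org)
    action: str        # what we will actually do during the test


# Hypothetical plan modeled on the developer/npm scenario from this talk.
PLAN = [
    PlanStep("Initial Access", "T1195.001", "Developer installs a backdoored npm package"),
    PlanStep("Command and Control", "T1105", "Package pulls a stager from a paste site"),
    PlanStep("Persistence", "T1053.005", "Stager registers a scheduled task"),
    PlanStep("Privilege Escalation", "T1068", "Bring-your-own-vulnerable-driver escalation"),
    PlanStep("Credential Access", "T1539", "Steal an Okta session cookie from the browser"),
    PlanStep("Defense Evasion", "T1550.004", "Replay the cookie from another location"),
]


def group_by_tactic(steps: list[PlanStep]) -> dict[str, list[PlanStep]]:
    """Group plan steps by tactic, preserving plan order within each tactic."""
    grouped = defaultdict(list)
    for step in steps:
        grouped[step.tactic].append(step)
    return dict(grouped)


if __name__ == "__main__":
    for tactic, steps in group_by_tactic(PLAN).items():
        print(tactic, [s.technique_id for s in steps])
```

Grouping by tactic makes gaps easy to spot at a glance: this plan exercises nothing under Discovery or Exfiltration, which is fine if that's a deliberate scoping decision.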

How are they doing initial execution? Is it a Word doc with a link inside it, or a macro, or an encrypted zip file? However it is, I use that to build a cohesive plan. But you have to be able to execute it. One of the reasons people don't test is that things get too complicated, so we're only going to use things we can find and can actually use. That means a lot of open-source tools, for things like EDR evasion and for the basics of what we're going to try to do.

This step includes actually getting those things and building them, and that's a really important part. If you build a wonderful plan but you can't pull it off, or you spend all your time on Stack Overflow because something didn't work the way you thought it would, then this test also didn't work. So make sure that's a key part of this.

Here's an example of a high-level design. We've got a developer on Windows with local admin privileges who downloads a backdoored npm package, which, if you're paying attention at all, is not out of the ordinary.

That package downloads a stager from somewhere, maybe GitHub, maybe Pastebin. It persists with a scheduled task. The stager downloads some Sliver shellcode and executes it. We escalate with a bring-your-own-vulnerable-driver attack. We use CursedChrome to steal an Okta cookie out of a session, and then we use that stolen cookie from an alternate location to log into the cloud provider. That's where our scope stops. So we've got a tightly bound scope and a limited number of tools to bring into our engagement. This is something we can actually pull off with the right number of engineer hours.

Same thing here: I like to at least draw some pictures. One reason is that I also have a boss I need to go tell what I'm about to do. One of the things I test through this is your MSSP, if you have one: they should probably alert on this too, since they're supposed to be watching. If they don't, you get to have a fun and really difficult conversation, but one that's important to have now, under controlled circumstances. You're also probably going to need to tell someone in the C-suite that you're doing it, because they really don't like surprises about critical-level alerts, especially if those alerts get misrouted. So build yourself a nice flow diagram as part of this design, so you know exactly what you're doing and you're covering all of the steps.

This is where the rigor starts to come in: step four. We've got an excellent design, and we've got a plan, but not everything is as important as everything else. Take the event for a new scheduled task: new scheduled tasks happen all the time, so that's not a critical alert. Most of the time, it might be something we want the SOC to take a look at if they have time, but if they're really busy, they won't. That's way less important than Sliver actually running, its C2 comms, and it performing actions on the computer. That's way, way more important, and we need to treat it that way.

So we're going to break down every one of our steps into a tuple of what happened and the tool that should have detected it, and we're going to assign a weight to that. Why? Because that's what we'll use to prioritize based on how well we did. In its most basic form, this is just an overall weight: did the EDR detect the new executable it's never seen running? That's it: EDR, new executable run.

In a more advanced form, as you get better at this, it can break out a little more. Now it's: did the EDR see it? It's a stager; it's malicious code (heck, it's malicious code we wrote for the purpose of this test). Did the EDR see it and alert on it? Did we get an actual alert raised to somebody who has to go click through and find it? And did our automation do its job? Did we automatically stop it, kill it, isolate the box, because this was actually something really malicious? We can get more advanced with this as we go.

Here's an example of that, using the basic way of doing it. The dev downloads the npm package: we want the firewall to see that, but it's not super important, so we give it a 2. The attacker uses the Okta cookie from Romania: if my user suddenly travels from San Francisco to Romania in the span of five minutes and starts logging in and performing actions, I really, really want to know, and my tools had better detect it. That's "uh-oh, emergency" levels of action that my team should be taking.
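In its basic form, the weighting step is just a list of (event, tool, weight) tuples. A minimal sketch, assuming the 0 to 10 scale: the weights of 2 and 10 come from the examples in the talk, while the scheduled-task and C2 weights are hypothetical placeholders. The `Expectation` name and the `validate` helper are illustrative, not part of any tool.

```python
from dataclasses import dataclass

SCALE_MAX = 10  # the talk uses 0-10; 0-5 works too, as long as it's consistent


@dataclass(frozen=True)
class Expectation:
    """(what happened, which tool should have caught it, how much we care)."""
    event: str
    tool: str
    weight: int


# Weights 2 and 10 mirror the talk's examples; the middle two are placeholders.
EXPECTATIONS = [
    Expectation("Dev downloads the npm package", "Firewall", 2),
    Expectation("Stager creates a new scheduled task", "EDR", 3),
    Expectation("Sliver C2 traffic leaves the network", "Firewall", 8),
    Expectation("Okta cookie replayed from Romania", "Okta/SIEM", 10),
]


def validate(expectations: list[Expectation], scale_max: int = SCALE_MAX) -> list[Expectation]:
    """Reject weights outside the agreed scale so scores stay comparable across runs."""
    for e in expectations:
        if not 0 <= e.weight <= scale_max:
            raise ValueError(f"{e.event!r}: weight {e.weight} is outside 0..{scale_max}")
    return expectations
```

For the more advanced form, one option is to replace the single `weight` with separate weights for "seen", "alerted", and "automatically contained", so each detection stage is scored on its own.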

So that's a 10. What the weight scale is, is subjective. I use 0 to 10; I know other teams, after I've left, have adjusted it to 0 to 5, and they find that better. It doesn't really matter, as long as you're consistent.

Cool, so we've got a test plan and we have all our tools. It's time to do it. This is the super fun part, especially if you work on the blue or purple-ish side of the house: you get to actually be a hacker for a little while, and that's really fun. As we do it, we stand up our C2 server, and we send our email with a link to an executable, or we have that simulated user run npm install with the malicious package. For each step, we mark down a score against its weight, based on how effective our controls were. Blocked and alerted? Yay, you get full marks for that one. Logged only? We really wanted that one blocked, so we're only going to give it a one out of six.

Here's a really important difference between this and a pen test: suppose that somewhere in our exploit chain the defenses succeed, which is to say our EDR blocks it. (I keep coming back to the EDR because it's an easy example.)
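One way to keep per-step scoring repeatable between runs is to fix the credit for each outcome up front. A minimal sketch: full marks for "blocked and alerted" comes from the talk, and treating "logged only" as 1/6 of the weight generalizes the talk's one-out-of-six example (that generalization is mine). The 0.5 for alert-without-block and the `Outcome`/`CREDIT` names are assumptions to tune for your own program.

```python
from enum import Enum


class Outcome(Enum):
    BLOCKED_AND_ALERTED = "blocked and alerted"
    ALERTED_ONLY = "alerted only"
    LOGGED_ONLY = "logged only"
    MISSED = "missed"


# Fraction of a step's weight awarded per outcome (hypothetical table).
CREDIT = {
    Outcome.BLOCKED_AND_ALERTED: 1.0,
    Outcome.ALERTED_ONLY: 0.5,
    Outcome.LOGGED_ONLY: 1 / 6,
    Outcome.MISSED: 0.0,
}


def score(weight: int, outcome: Outcome) -> int:
    """Points earned for one step; Python's round() sends exact .5 ties to the even integer."""
    return round(weight * CREDIT[outcome])
```

Writing the table down before the test starts means nobody is tempted to re-argue a score after seeing the result.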

That doesn't mean we're done; we didn't win. That's only one of the things we're testing today. What we need to do is mark it as a success, meaning we give it full marks, allow-list it in the EDR as if it hadn't caught anything, and continue with the test. We weren't just trying to test the email gateway and the EDR; we were also trying to test the SIEM and Okta, and whether we can detect C2 communications going outward at our firewall. If we stop the test there, we don't get to test any of that other stuff.

We also want to allow only minor deviations. It's really tempting to say, "ah, but what if we did it this way instead?" Don't do that, because one of the things we need to be able to do is repeat the test; that's where the rigor comes from. If we make changes, we can then evaluate whether or not we were successful. And any changes we make in the middle of testing? Guess what, we're not going to write them down, no matter how much we say we will or how careful we try to be. It's also not what we got approval from our C-suite to go and do. So, only allow minor deviations.
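Since changes made mid-test tend not to get written down, it can help to make recording them part of running the test. A minimal sketch of a run log that records minor deviations and refuses major ones; `EmulationRun` and `Deviation` are hypothetical names, and deciding what counts as "minor" stays a human judgment passed in by the tester.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deviation:
    step: str
    change: str


@dataclass
class EmulationRun:
    """Log for one execution of the plan; deviations are recorded as they happen."""
    name: str
    deviations: list[Deviation] = field(default_factory=list)

    def deviate(self, step: str, change: str, minor: bool) -> None:
        # "Minor" is the tester's call (e.g. ROT13 -> XOR encoding: yes;
        # swapping C2 frameworks: no). Major changes break repeatability and
        # exceed what was approved, so refuse them and re-plan instead.
        if not minor:
            raise RuntimeError(f"major deviation at {step!r}: {change}; re-plan instead")
        self.deviations.append(Deviation(step, change))
```

The recorded deviations then become part of the plan for the rerun, so the second execution matches the first.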

That is to say: we did some encoding, and we want to go from ROT13 to XOR? Go for it. We want to change our entire C2 framework and the way we're deploying it? Probably not, not for this one.

Then we have to plan, because unfortunately the fun part's over. The testing is the part I find really entertaining, but planning is the part that gets us paid, which is great; I'm a huge fan of that part. In this step, we take all of our scores, compare them to the weights, and calculate weight-score deltas. We perform some root cause analysis to determine why certain steps failed and what we need to do to make them not fail. And we use those deltas to prioritize: if it was a 10-weight task and it only scored a 1, we have a delta of 9, and that's our priority. Prioritization is done, and our backlog grooming is in a really good place. Then we turn that into tickets, so we track it and it doesn't stay inside a single Excel file on a share drive somewhere. We actually put it where it needs to go, maybe even in an epic if we're feeling fancy, so we can track whether or not it got done.

This is the same one I showed you before, but with some hypothetical calculated deltas. I tend to use a low/medium/high/critical scale, because that's how my Jira boards are set up. Anything highlighted in green there is informational. Maybe there's something there; maybe we can tighten it up, maybe it's super low-hanging fruit that's quick to fix. Cool, we can pull that in at the end of a sprint with some extra time. But what I need to be doing is looking at Okta: why it didn't log, or why that log didn't make its way up into the SIEM, why that detection didn't fire, and why my MSSP never called me about it. We got all of that just by being rigorous with our test definitions.

Then the last thing. I know what all of your backlogs look like, because we're all in the same boat. We don't have as many people as we want, nor as much time as we want, and everything's moving super fast. But we have to use that prioritization, and we have to fix it.
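The delta calculation and prioritization described in this step can be sketched in a few lines. The step names and scores below are illustrative (the Okta row reproduces the talk's weight-10, scored-1, delta-9 example), and the `severity` thresholds are hypothetical; pick cut-offs that match your own Jira priorities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    step: str
    weight: int  # how much we cared (0-10)
    score: int   # what the controls actually earned (0..weight)

    @property
    def delta(self) -> int:
        """Weight minus score: how far the controls fell short on this step."""
        return self.weight - self.score


def severity(delta: int) -> str:
    """Map a delta to a ticket priority. Thresholds are hypothetical."""
    if delta >= 8:
        return "critical"
    if delta >= 5:
        return "high"
    if delta >= 3:
        return "medium"
    if delta >= 1:
        return "low"
    return "informational"


def prioritize(results: list[Result]) -> list[tuple[str, int, str]]:
    """Largest delta first, with a suggested priority for each ticket."""
    ranked = sorted(results, key=lambda r: r.delta, reverse=True)
    return [(r.step, r.delta, severity(r.delta)) for r in ranked]


if __name__ == "__main__":
    results = [
        Result("Firewall sees npm download", weight=2, score=2),
        Result("Okta flags cookie replay from Romania", weight=10, score=1),
        Result("EDR sees new scheduled task", weight=3, score=1),
    ]
    for row in prioritize(results):
        print(row)
```

Sorting by delta rather than by raw score is the point: a step that scored 1 out of 2 matters far less than one that scored 1 out of 10.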

Lastly, just making a change isn't the same thing as fixing something. If you make a change and don't rerun the test (which goes back to why we only allow minor deviations), you have absolutely no idea whether you fixed it; you just made a change and hoped. Rerunning the test needs to be part of the definition of done for these tickets. One thing I find really useful, just getting on my soapbox while I'm talking about steps six and seven and the planning stages, is measuring how much of your team's time goes to proactive work versus reactive work, and having every ticket record which one it is.
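The two bookkeeping rules in this step (a ticket is only done after the test is rerun, and every ticket records whether it's proactive or reactive) can be sketched as follows. The `Ticket` fields are hypothetical; in practice they'd likely be custom fields on your tracker. The "more than half" threshold is the talk's rule of thumb, measured here by hours.

```python
from dataclasses import dataclass


@dataclass
class Ticket:
    title: str
    kind: str     # "proactive" or "reactive", recorded on every ticket
    hours: float  # time the team spent on it
    change_made: bool = False
    retest_passed: bool = False  # did a rerun of the emulation step succeed?


def is_done(ticket: Ticket) -> bool:
    """A change without a passing rerun of the test is a change, not a fix."""
    return ticket.change_made and ticket.retest_passed


def reactive_share(tickets: list[Ticket]) -> float:
    """Fraction of team hours spent on reactive work (0.0 if nothing is logged)."""
    total = sum(t.hours for t in tickets)
    if total == 0:
        return 0.0
    return sum(t.hours for t in tickets if t.kind == "reactive") / total


def falling_behind(tickets: list[Ticket]) -> bool:
    """Rule of thumb from the talk: more than half the time on reactive work."""
    return reactive_share(tickets) > 0.5
```

Measuring by hours rather than ticket count keeps one long incident from being hidden behind several small proactive tickets.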

If more than half your time is spent on reactive work, you have a great measurement to bring to the rest of management: "We're falling behind, and we won't ever catch up, because we're spending more time on this. Something needs to change." Versus: "This is proactive work; this isn't in response to a failure." You're not doing an active investigation or incident response; you're changing things before something terrible happens, and that is always better for your team to be working on.

And then, finally, do it again. Pick a different scope; maybe move somewhere in the middle. Maybe we tested the purple box last time, and now we take a hypothetical green box from endpoints to our CI system and a little bit of k8s. Can we attack Artifactory and get a backdoored package in there, whether through an attacker or a developer who's decided to break bad? Continually redo this process, feed the results back in, and make your security program a better and stronger place.

Let's say I've convinced you, and you're like, "yeah, I'm going to do this." First of all, awesome, I'm stoked. When you do your first one, start small. And I mean take what you think is a small test, then make it smaller, and then probably reduce your scope a little bit more. Because, like I said, every single time I've done this I've been shocked, and you're probably going to have more work cut out for you than you think. There's no penalty to starting too small the first time; there is a penalty to starting too big. So if you're going to fail in one direction, absolutely fail on the side that lets you succeed and have action items coming out of it.

Don't get too fancy. There's a lot of fancy stuff attackers do in the real world, and that's super cool: process hollowing, module stomping, dynamic call stacks. All that stuff is awesome. I love nerding out about it, I love writing it, I love doing so much with it. But chances are you don't need it, because the basics work really well. By the basics I mean netcat, reverse SSH, Python reverse shells when you're on a macOS system, basic Sliver executables compiled by just running generate, and emailing them to a user. The basics work way better than you think they do, and that's why we're doing this. So a good first test is scoped down even further than the design I introduced earlier: an email with a link to an EXE that a user downloads.

It executes, it persists, and it steals some files. That's it. All of that can happen with nothing but one Sliver C2 server and an email, and I bet it will cover three or four of your tools and give you actionable items to use in your next annual or quarterly planning.

For me, these numbers are taken from my most recent test. I find a lot of people talk about stuff like this without talking about how hard it is and how much time it takes, and that's a disservice, because otherwise you're left in the dark, guessing for yourself. That's not fair; you should learn from experience. I use mostly open-source tools, and collecting those tools took roughly three to six engineer hours, so about one day, though in truth that's split over a long time. Like I said, I've got that notebook. I love to lurk in places like the BloodHound Gang red team channel. (That's a Slack; sorry, that word is missing from the slide.) Their Slack is open, and the red team channel is great. The Covenant C2 folks are in there, the Bishop Fox Sliver folks are in there, and there are a bunch of people just doing the job. It's worth looking to them, because they're also learning from the bad guys.

Some other speakers have talked about that, too. I wrote some tools, because that tickles the fun part of my brain and I enjoy it. I wrote a stager in Rust that, at least against Windows Defender and its EDR, completely avoids detection. It was blocked by some of the other tools we have, which is great, but all of these things are sort of hit or miss. And I used a sample backdoored Python library, which downloads that stager and then calls it. The Sliver infra setup took roughly six hours the first time, because I had no idea what I was doing. Now I could do it in an hour, probably less. I've actually Terraformed most of it, so I can literally just click go and have all my test infrastructure up and ready for the common tools I use.

For the testing itself, if I ran this test myself, I'd probably spend about five hours, making sure I do it thoroughly and allowing myself time for some minor deviations. With two engineers, that doesn't get better, it gets worse, as I'm sure all of you know. But I've been doing this for a hot minute, and I've got both senior- and junior-level engineers on my team. That extra three hours, or even 16 hours, is worth it, because I've spread that knowledge to them and made them better at what they do.

I'm going to keep losing my screen. And then I've got some resources here. Sliver and Covenant are tools I really like to use. The real recommendation for all of these is whatever the bad guys are using, going back to our TTP research. AppleScript phishing is super easy, because you can literally just pop up a window, and users love to type their passwords into stuff that pops up. CursedChrome and Cursed Edge are awesome, because everything is a SaaS tool nowadays, so you might as well just steal some cookies and become them. And for persistence, again, whatever they're doing these days, because I'm not keeping up with all the new fancy stuff. I know a new macOS persistence technique was dropped on Twitter, or X, or whatever it is now, yesterday, so my next test may give that a try, and I'll find out whether or not my EDR is keeping up.

So anyway, that's my team's methodology for adversary emulation. It doesn't try to get too fancy around a single threat actor. It tries to be much more rigorous about our definitions of how important things are, and we use it on a regular loop to improve our security process and find things out now, before the bad guys do. Any questions?

What software do you use for your design stage?

Excel, because I'm a manager and that's where 90% of my time lives now. I wish there was a better answer to that question; if somebody has a better one, I'm all yours, hit me up and we'll talk. Oh, for the pictures I use draw.io, because it's free, it hooks up to Google Workspace with a good privacy policy, and it hooks into Atlassian, so if you're in that world it's easy.

So you've developed these test cases, and you've got tests that you've executed. Do you have a method for rolling that back into your pipelines through automation? Do you have tools you use to integrate that kind of stuff?

We have a lot of automation tools; on my team, we try to automate as many things as we can. But largely, the feedback from this is changes to existing tools. That might take the form of a pull request or merge request that ends up changing the config of those tools, or it's just going to be a Jira ticket and an engineer physically picking it up and making the changes. In its most extreme form, it's being able to point to a tool and say, "this tool actually really sucks; we can't continue to pay for it," and that's a good excuse to go and purchase something else.

Do you continuously, manually run these test cases, or do you automate the suite?

This test in particular, this methodology, is very manual, because this is what I consider an integration test. I think it's a really interesting idea to automate some of this, and I think there are some vendors that promise to do that, but I can't think of a way right now that would be worth the time to automate it on any kind of routine basis, weekly or monthly.

What are some tests you would run on a network that's supposedly blocking all outbound requests?

It's a great question. The question was what I would do on a network that's supposedly blocking all outbound connections. I would probably test that differently. If my premise is "is a host on this network actually blocked from all outbound traffic?", I'd design the test by looking through every protocol and every situation I can. What I mean by that is: is DNS actually being blocked? Is upstream DNS resolution being blocked? Are there different resolvers that behave differently? Is ICMP actually being blocked? I'd design it differently from this sort of enterprise tool-stack test, but it's a really interesting question. I'd love to go design it and actually do it; I've never had that actual premise. It'd be fun.

Awesome. If you have any more questions for me, I'm around; I'm going to step over to the after party. Thank you all, I appreciate it.
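The protocol-by-protocol approach in that last answer could start with something like the sketch below, which covers the TCP and DNS parts only. It assumes you run it from a host inside the "blocked" network against an external test endpoint you control; the port list is a hypothetical starting point. ICMP isn't covered, because sending it from Python needs raw sockets (usually root) or shelling out to `ping`. Note that `dns_resolves` goes through the host's configured resolver, so a cached or locally defined name can resolve even when upstream resolution is blocked; a fresh name under a domain you control is a better probe.

```python
import socket
from collections.abc import Iterable

# Hypothetical starting list of ports to probe; extend it for your network.
DEFAULT_PORTS = (22, 53, 80, 443, 8080)


def tcp_egress_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds, i.e. egress is NOT blocked."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


def dns_resolves(name: str) -> bool:
    """True if the host's configured resolver returns an answer for name."""
    try:
        socket.getaddrinfo(name, None)
        return True
    except socket.gaierror:
        return False


def egress_report(host: str, ports: Iterable[int] = DEFAULT_PORTS,
                  timeout: float = 3.0) -> dict[int, bool]:
    """Map each port to whether an outbound TCP connection to host got through."""
    return {port: tcp_egress_open(host, port, timeout) for port in ports}
```

On a network that really blocks all outbound traffic, every entry in `egress_report(...)` should be `False`; any `True` is a finding worth a ticket.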
