
All right, welcome to our talk. This is going to be about containerizing red team infrastructure. Real quick, here's the agenda: we'll do a brief introduction, talk about some common industry practices around red team infrastructure, and then talk about RedSwarm, which is our creation for automating and building out red team infrastructure. We'll cover Terraform, Ansible, and Docker Swarm; hit on Traefik and routing, which is what we're using on the back end; discuss monitoring for your infrastructure; touch on a piece we're calling Sojourn; and wrap up with some closing thoughts and questions.

So, I'm Dan Astor. I'm a principal scientist at Security Risk Advisors. My background is in pentesting and red teaming; I've been doing that for the past ten years. I enjoy breaking things, and I do like metal, but I can't compete with John.

Hi, my name is John Callahan. I've only been doing red teaming for a couple of years; I primarily come from an AppSec, cloud security, and container security background, so I've worked a lot more with the engineering side of the house than the offensive side. That fed a lot of where my thought process on this comes from. And yes, I'm a big metal head; if you want to talk about it, come find us afterwards.

So, just from a show of hands: how many folks in here do red teaming, or deal with red team infrastructure at all? Okay, good. Four people, awesome. For the rest of you, let's level-set some terminology that we'll use throughout.
The first piece is a C2 server. A C2 server is pretty crucial for red teams; it's what we use for command and control, which is what the C2 stands for. It's basically a system or platform that lets you send and receive communication with some kind of compromised endpoint. A C2 channel is the mechanism used to relay that information back and forth. Maybe it calls out over HTTP, HTTPS, WebSockets, DNS, you name it; it's the means by which you communicate across the network. Next is an implant: the piece of code that gets executed on an endpoint to communicate over a C2 channel back to your C2 server. And the last one, which isn't as widely known, is a redirector. You never want your infrastructure just sitting out on the internet, and you never want a C2 server directly exposed. There are tons of bots that scan for exactly that and then publicly shame you; you end up on ban lists, things like that. So you always want to protect yourself, and one way to do that is a redirector. You can think of it as a reverse proxy that sits in front of your infrastructure. It can do whatever you want, but you're never exposing your C2 server directly; you have some intermediary in between to filter and route the traffic.
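To make the redirector idea concrete, here is a minimal sketch of what one might look like as an nginx reverse proxy. This is purely illustrative; the hostnames, paths, and addresses are placeholders, not anything from the talk's actual deployment:

    server {
        listen 443 ssl;
        server_name cdn.example.com;                          # throwaway domain the implant calls
        ssl_certificate     /etc/nginx/certs/fullchain.pem;   # placeholder cert paths
        ssl_certificate_key /etc/nginx/certs/privkey.pem;

        location /api/ {
            proxy_pass https://10.0.0.5;                      # forward matching traffic to the hidden C2
        }
        location / {
            return 302 https://www.example.com/;              # everyone else gets bounced to a decoy
        }
    }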
So when we look at red team infrastructure, this is a pretty well-known example deployment. You've got your target environment on the right-hand side: the compromised endpoints and the users you're targeting. A lot of red team infrastructure is built out of single-use systems, so you'll have multiple redirectors set up and multiple standalone boxes, whether it's a phishing server, some type of C2 server, or a web server. At the end of it, you may have somewhere between 10 and 15 standalone systems out there just being used as part of your infrastructure. Another example is where you have your C2 servers with multiple redirectors in front of them; you want different domains and different providers for routing. Again, there are still a lot of standalone systems here; you can count a handful of different boxes in play, and it's not necessarily efficient.

Several years ago, when Terraform started getting pretty hot with red teaming (not as hot as on the DevOps side, but hot), a lot of people went all-in on Terraform as a means to deploy infrastructure rapidly. They'd put together some Terraform plan that spins up all these different systems, then use something like Ansible to go through and actually configure the applications running on them: setting up something like nginx for a redirector, or deploying Cobalt Strike and configuring it for the environment. They'd go through that, perform the engagement, and at the end run terraform destroy to tear the infrastructure down.
That's the previous deployment strategy that's been talked about pretty widely on the red team infrastructure side, but we can do better. Some of the issues with this type of deployment: because you're deploying a lot of single-use systems, there's a cost associated with each of them, and if you're standing them up for a long period, say several months for an engagement, you either bill that to the client or accept it as part of your operating costs. You also have to deal with the maintenance of each of those servers: configuring, updating, and making sure each individual system isn't exposed in some way. If you're working with Ansible, you have to do a lot of OS-level configuration, and whether or not you're deploying golden images as part of your Terraform run, you may have long build times where that repeated configuration gets applied to every single endpoint. There's some inflexibility in the architecture; you're stuck with individual systems and the question of how to route them. And then you still have to deal with certificate and domain management.
Whether you're using Ansible for that or some other automated strategy for configuring certificates, minting them, and keeping them updated, those are all things you have to think about. What we're going to talk about here are some strategies and newer software you can use to make this a lot easier.

None of this would be possible without calling out some notable prior work on red team infrastructure. The original Red Team Infrastructure Wiki from bluscreenofjeff was kind of the go-to standard for how to deploy and structure everything. There are some pretty good posts about using Terraform for deployments along with Ansible. There was an interesting project called Red Baron for deploying all of your infrastructure using Terraform, a pretty cool project, but there's not a lot on the container side. There is a project that came out recently called WarHorse that does use containers; however, it's still a lot of single-use systems, using Ansible to put a container on each box, so you're still spinning up a lot of instances and building on the endpoint. And the one thing I will say: byt3bl33d3r had a pretty interesting wiki page, or blog post, that talked about using things like Docker Swarm. It was the first time I'd ever come across the idea. He looked at some of the CIA's strategies from the Vault 7 leaks, at how they deployed infrastructure, and showed how things can be done differently.

Cool. So with all that, that's the history of how things have taken shape over the last five years or so in terms of popular red teaming methodology.
I came at this with more of an engineering background, working very much on the modern side of the fence, where containerization is everything and container orchestration is everything. Just as a brief refresher on containers: the big takeaway is that they let you run things in isolation. That means if you want to run multiple things on one server, you don't have to worry much about them interfering with each other, and that's a huge boon when you're trying to deploy a bunch of things that might fight over the same ports. If you want multiple things listening on port 443 on one server, it's a pain in the ass; with containers it's a lot easier, at least if you've got a load balancer or something similar in front of them.

With that in mind, the way I think about this, again from an engineering perspective, is that you're trying to separate your infrastructure from your services. Your infrastructure is your hardware layer, or more often than not your virtual hardware layer: your EC2 instances versus your bare-metal boxes. It's the stuff that very rarely changes. You're not adding or removing servers all that much; they're a fixed asset in a lot of ways, and their configuration is very fixed too. Once you stand up a server, you don't really want to change it; if you need to increase your disk size, you're in for a rough 30 minutes to an hour. It's just a pain. Services, on the other side, your application layer, are far more flexible and constantly changing. I'm constantly working on application-layer things: making changes, improving them, fixing bugs. I have a very rapid development cycle, and I don't want the hardware to get in the way of my application deployments; I don't want those things coupled together. I want to keep them as decoupled as possible, and this is where container orchestration comes in.
On the left side, with our infrastructure, the goal is to deploy it with Terraform and then configure it with Ansible. Our Ansible usage is very limited. Back in the day, about a year ago when I first started doing this, I was adamant: no Ansible, I want nothing to do with it, I hate working with it, it's a pain in my ass. I've since seen the light; there is a time and place for it. In this case, once our EC2 instances come online, we use Ansible to configure our Docker Swarm, which is what we're using as our container orchestration engine.
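As a rough sketch of that Ansible step (assuming the community.docker collection and hypothetical host-group names; this is not the talk's actual playbook), the Swarm bring-up can be as small as:

    - hosts: manager
      tasks:
        - name: Initialize the swarm on the manager node
          community.docker.docker_swarm:
            state: present
            advertise_addr: "{{ ansible_default_ipv4.address }}"
          register: swarm_info

    - hosts: workers
      tasks:
        - name: Join each worker to the swarm
          community.docker.docker_swarm:
            state: join
            join_token: "{{ hostvars[groups['manager'][0]].swarm_info.swarm_facts.JoinTokens.Worker }}"
            remote_addrs: ["{{ hostvars[groups['manager'][0]].ansible_default_ipv4.address }}:2377"]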
On the right we have our services: the actual C2 frameworks, the phishing endpoints, the litany of things we may have running inside our setup, packaged as Docker containers and deployed within our Docker Swarm. You'll see me use Docker Swarm and Docker Compose somewhat interchangeably throughout. If you're not familiar, the terminology is terrible: a Docker Compose file is the manifest itself, and Compose is also the name of the CLI tool you can use to stand those files up locally. Docker Swarm is the clustered version of that, the analog to Kubernetes, where you've got multiple nodes and you say, "I want to run a service; you worry about where it runs, I just need it to run on this cluster."

Okay, so the primary goal of going down this research project, this rabbit hole of redoing our infrastructure: just because someone does it one way, and that's been the way everyone does it when you look at the public repos on standing up infrastructure and how it's taught in trainings, doesn't mean it's the only way. The way we wanted to build this is pretty straightforward. All you have to do is set up your variables: the C2 domains you want to use, the usernames you want created, the certificates you want minted, the number of systems to deploy, or the number of Cloudflare or CloudFront instances. That all gets done ahead of time in a single variable file. We have a .tfvars file, a Terraform variables file; you just go in and set each of the fields to the values you want.
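As an illustration of the shape of that file (the variable names here are hypothetical, not the talk's actual schema):

    # example.tfvars -- names are illustrative
    c2_domains        = ["foo.at", "updates-cdn.example"]
    phishing_domains  = ["portal-login.example"]
    operator_users    = ["dan", "john"]
    worker_count      = 2
    cloudfront_count  = 2
    acme_email        = "ops@example.com"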
After that, all you have to do is run terraform apply, after reviewing your plan. That spins up all the infrastructure for you: whatever services we want in AWS, Cloudflare, Azure, wherever. It then puts together an inventory file for Ansible, with the IP addresses grouped into sections based on each system's role. You run the Ansible playbook; it goes through, initializes the Swarm, and pretty much runs docker stack deploy to bring each of the instances up. From there you can either run that deploy command manually for each stack you want, or just let Ansible deploy them all at once.

So what this looks like in a typical workflow: we run terraform apply; it connects out to Cloudflare to configure the DNS records, stands up the EC2 instances we want along with any other infrastructure we need, and then builds out all of the configuration files. Terraform takes all of that variable output we didn't know ahead of time, the IP addresses, service names, and DNS records, and populates it into each of the Swarm configuration files, those Docker Compose files John mentioned. It builds all of that dynamically, along with the inventory. After that, the operator just runs the Ansible playbook, which configures each of those nodes, sets up Docker on them, initializes the Swarm, and then takes all of those Swarm configs and deploys each of the services. You really only have to run two commands; then the operator just waits for the stack deploys to finish and tests the services to make sure everything's functional.
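So the whole run boils down to roughly this (the playbook and inventory file names here are illustrative):

    terraform apply                           # stand up cloud infra and DNS, render the swarm
                                              # configs, and emit an Ansible inventory
    ansible-playbook -i inventory site.yml    # install Docker, init the swarm, docker stack deploy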
It's pretty straightforward when you actually go run it, but a lot happens on the back end, because we're offloading so much to Terraform, Ansible, and Docker Swarm. So that's what it looks like all in one run, and this is what the example deployment looks like. In the past we had those ten-odd individual systems; now we slim that down to three. We have a manager, two worker nodes, and all the services running on top of them, load balanced. Then we have all the different redirectors; in this case we're using a bunch of services like Cloudflare, CloudFront, Fastly, and Azure CDN to push the traffic through and route it, and we use something we'll talk about in a second, called Traefik, to do a lot of the load balancing, redirecting, and checking of the traffic.

Okay, so when we look at the Docker images, we have them broken up into a couple of categories. The first is a private container registry for internally maintained containers. These are going to be things like your C2 server, which we're obviously not going to just push out to Docker Hub, plus things like the specific logging infrastructure we want for containers. We have these wired up to ECR, and everything's built on a pipeline, so we have fresh container images stored there and available to pull. They're version-pinned, so we know exactly what versions we're running, but all of this is housed in-house. Then there are the other services, meta-services like Traefik or Portainer, which we'll talk about. These are public images maintained by trusted third-party providers: the folks who manage and create Traefik also build and publish their own Docker images to Docker Hub. We have those version-pinned too, and we review them to make sure nothing's changed in a way that's going to break us. We're fine pulling those down externally; you could also build them yourself and host them in ECR if you needed to, but for those we're fine grabbing them from the public repos.
How many red teamers does it take to get through a PowerPoint? Ridiculous. So, this is an example of a Dockerfile, if you've never seen one before; it's basically the definition for a container image. This particular example is for Evilginx2 (version 3 is out now, but this example predates it). All it does is clone the repo, run the make script inside it, and configure a start shell script; that's about it. The start script itself just launches Evilginx inside a screen session. The reason you have to do that is that Evilginx was not designed to be containerized, and that's one of the pitfalls of working with this design: you're trying to take existing tools that were never meant to run this way and shim them into a containerized environment. Sometimes you have to do some weird, hacky stuff, but most of the time it's pretty clean. Evilginx is one of the few exceptions where things get weird; for the most part, the Dockerfiles are relatively straightforward.
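As a rough reconstruction of what a Dockerfile like that might look like (a sketch, not the actual file from the slide; the Go version, build output path, and start script contents are assumptions):

    FROM golang:1.19 AS build
    RUN git clone https://github.com/kgretzky/evilginx2 /opt/evilginx2
    WORKDIR /opt/evilginx2
    RUN make                                  # build the evilginx binary

    FROM debian:bullseye-slim
    RUN apt-get update && apt-get install -y screen ca-certificates \
        && rm -rf /var/lib/apt/lists/*
    COPY --from=build /opt/evilginx2 /opt/evilginx2   # phishlet/redirector setup omitted
    COPY start.sh /start.sh                   # wraps evilginx in a detached screen session
    CMD ["/start.sh"]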
So, to come back to Docker Swarm: this is our container orchestration engine, and again, if that phrase doesn't mean anything to you, think Kubernetes, but not as complicated and complex. Its whole job is to decouple the application from the underlying infrastructure. My application doesn't care where it runs; it only cares about what it needs. It needs half a vCPU and 500 MB of RAM, and it doesn't care where it gets them. Using Docker Swarm we can completely decouple those things. And look, I love Kubernetes, and I hate it every time I see it in production. It's just overkill for this kind of project; there's way too much complexity involved, and like Dan said a couple of slides ago, the key here is easy, turnkey use, both in deployment and in modification. I don't want someone who wants to tinker with this setup to have to go learn Kubernetes and everything involved with it. I love it personally; I'm not going to expect everyone else to go do that. Docker Swarm is much, much simpler. You can read a Compose file for the first time and understand what's happening. It's very much like Python in that way: if you understand English, you're going to understand what's going on.

The way Docker Swarm works is through what are called stacks. A stack is a collection of services, and a service is a collection of containers. Basically, I create a stack for every kind of service I have in the environment. At the bottom here you can see the example deploy command for Portainer; we'll get to what Portainer actually is in a second. I've grouped everything necessary for the Portainer service into a single file: the containers it needs to run (there are actually two sets of them), all the networking configuration, any filesystem configuration, etc. All of that is bundled into a single file in one spot; you run a stack deploy and you're done. That's it, easy peasy; there's really not much more you need to do.
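The deploy command itself is just this (stack and file names as on the slide; the file contents vary per stack):

    # create or update everything defined in portainer.yml as the "portainer" stack
    docker stack deploy -c portainer.yml portainer

    # then sanity-check what came up
    docker stack services portainer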
So, looking at this, we can see the heart of the Swarm. On the left we have our Swarm infrastructure, and the numbers on the right match up with the numbers on the left. For example, number one in the top right-hand corner: HTTPS traffic hitting the foo.at domain on the /ph route first hits our redirector, which in a lot of cases is just CloudFront. CloudFront has a rule that says: any traffic you get, push it back to our RedSwarm infrastructure. Sitting in front of that is Traefik. Traefik says, "I got a request over HTTPS for the foo.at domain with the /ph path; I have an internal rule that says anything matching those criteria gets routed to the phishing server." Using those rule-based mappings, I can map external requests and domains to my internal services, and it all happens programmatically, again using Terraform to define a lot of this. You define a bunch of variables up front and it generates all of this for you. There's a bit of legwork up front, like making the templates, but otherwise it's all programmatic. All I say is, "I have a phishing domain here, this is my phishing service; any request for this host gets routed to my phishing service."

Two, three, and four are much the same: more traffic coming through the redirectors from our target environment and hitting our internal stack. Again, we use those redirectors to obscure the RedSwarm infrastructure itself; we never want to directly expose our infrastructure to our target environment. Below that, we have the same thing for our operational traffic. We as red team operators use this exact same stack: when I'm interacting with the Cobalt Strike multiplayer server or the Sliver server, I'm doing so by routing through Traefik and hitting those domains. We have domains set up for our internal stuff that we don't expose, foo.bar.com, and all I do is connect to that. Traefik is smart enough to say, "anything hitting cobaltstrike.foo.bar.com, route it to the Cobalt Strike server." It makes it very easy for us to split up each component and keep them isolated and decoupled while they still work together in a programmatic fashion. And that's the full slide.
Okay, Portainer. This is something I've mentioned a few times now. I'm throwing a lot at you, especially if you've never worked with this before, never worked on a red team or seen what this infrastructure looks like, or never worked with containerization and container orchestration. Portainer is something we picked to make a lot of this easier. I'm very comfortable working on the command line with raw manifest files, because that's what I've spent a lot of time doing, but for someone walking into this for the first time it can be very daunting; there's a lot to learn. Portainer makes it easy: you basically register it as a service inside your cluster, and it gives you a nice web UI that displays everything running inside of it. It's fantastic for debugging specifically.

Just as an example of what it looks like: this is a test cluster I stood up. Going back to those stacks I was talking about before, Portainer gives you a nice list of all the stacks inside your environment, so if I'm having trouble with one of my stacks, I can just jump in and start looking around to see what's going wrong. For example, I've clicked on the Traefik stack here, and we can see two services running, each with one container. At the very bottom (you probably can't see this, because it's at the very bottom of the screen) I've selected one of the containers, and I have a couple of options for interacting with it directly. In this particular case I just want to open the logs, and, again at the bottom of the screen, you can see the log output from the container saying "configuration loaded from flags." So if I deploy a new stack, a new set of services and containers, it makes it very easy for me to jump in, look around to see what could be going wrong, and take a look at the log output. You can also drop into a shell very easily. You can do all of this by just running the Docker CLI, and that's perfectly fine, but if you're rapidly jumping around, especially on some of the more complex setups, like the service called Sojourn I'll get into at the end of this, I might be looking at logs from three or four different things at the same time, trying to figure out where the breakdown might be, especially when I'm proof-of-concepting something new and haven't ironed out the kinks yet. Portainer makes it very easy to jump in and get a high-level view of what's going on in your cluster.
I love this slide. Traefik, another thing we've talked about a lot. Traefik is a load balancer; at the end of the day, that's all it does. It can be run in a wide variety of configurations, and it just so happens to also support Docker Swarm, and support it pretty well from my perspective. In this particular case we use Traefik as our ingress, and it routes your C2 traffic, your phishing traffic, and all your canary traffic too, whatever you may have. It's primarily an HTTP proxy, and I'll say that up front: if you have very weird, non-standard, or complex TCP and UDP requirements, you might end up bumping your head against Traefik, but it works well enough. The way it works, ultimately, is that you configure a set of services and a set of routers, and you map rules onto those to control how traffic coming in gets routed to services on the back end. Within Docker specifically, you do that via container labels, and you'll see that on the next slide.

The nice thing about Traefik (and this works in other environments too, not just Docker Swarm) is that it's actually smart enough to monitor the Docker socket. It watches your cluster for any container that comes online, and if one matches a rule, it automatically starts getting served traffic. So once I deploy Traefik, I never have to touch it again: it sits there, monitors containers as they come online and go offline, and dynamically sets up all the routes between itself and everything else. If Traefik is listening on worker node one and I've got a service on worker node two, I can hit worker node one, and Traefik is smart enough to say, "hey, this lives over on node two, let me route it over there." It makes things very flexible; I don't have to care about my infrastructure anymore. I let Traefik handle that for me and completely decouple those things. All I know is that I've got a service running and I can hit it if I match rules X, Y, and Z. The other really nice thing is that it automatically issues and renews Let's Encrypt certificates. As long as you've got 80 and 443 (or at least 80) open to the world, it can do its HTTP challenge and you've got certificates ready to go, nice and easy, and it'll automatically renew them as well. Very easy from that perspective.
So this is a daunting slide, but we'll jump into it: this is what the labels look like in a Docker Compose file for configuring an externally facing service. First, at the very top, we've got the enable and constraint labels. traefik.enable just says, "I want Traefik to actively monitor this service." The second line sets a restriction. Within Docker Swarm you can create virtual networks and give them names, so in this case I created a virtual network called "public," and the only way something gets exposed through Traefik is if I explicitly put it on the public network. It's just a control against mistakes: if something's not on the public network, it can't get exposed.

The second part is where I set up the router. The router is what sits in front; it's what consumes the traffic from the internet, or technically from outside the cluster. The very first line, entrypoints, says this is an HTTPS entrypoint: listen on 443 and honor TLS. Same with the line below it, which explicitly enables TLS. The third and fourth highlighted lines configure Let's Encrypt: certresolver=le, where "le" is Let's Encrypt, tells Traefik to mint TLS certificates for this service using the Let's Encrypt resolver. And the entrypoint itself is a named port. Within my Traefik configuration, which isn't shown here but is part of configuring the Traefik service itself, I have a named entrypoint called operator-ingress-tcp that's explicitly mapped to a port. What is it for Sliver? I think we have something like 50001 for our multiplayer server.

Above that we have the routers themselves, which is where the actual routing rules come into play. On the very first line, for the actual C2 traffic, our rule matches against a host, a path prefix, and a header. It's kind of SOP for us: when we expose our C2 services, beyond all the other controls in place, we make sure we at least have what we call a magic header, which is usually just an Authorization HTTP header with a set value, and that's one of the ways we prevent scanning activity from getting through. The way this is set up, if a request comes in and doesn't match that host, that path prefix, and those headers, Traefik will not route it back to Sliver. That way we know the only things hitting that Sliver container are legitimate C2 traffic.

The second one, below "multiplayer connect," goes back to where TCP and UDP get a little wonky. Your TCP rules in Traefik are very limited: you can only use the HostSNI rule, and here it has to be a wildcard. Basically, anything that comes in on TCP on the matching port, which you can see down here, 31337, gets routed through. Because this is our operator endpoint, though, we have it IP-restricted: locked down to our VPN and that's it. So while we can't use the rules to filter out bot scanning on that port, we just use traditional firewall IP restrictions, EC2 security groups. And at the top you can see the services themselves; that's the back end that maps to the container itself. Sliver is listening on port 443 and expecting TLS traffic. Once we're inside the cluster we don't really need TLS, but that's what Sliver uses by default and that's just how we've set it up in this particular case.
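Pulling that together, here is roughly what those labels look like in the Compose file. This is a hedged reconstruction from the slide description, with the image name, router names, and header value as placeholders:

    services:
      sliver:
        image: registry.example.com/sliver:latest   # placeholder image
        networks: [public]
        deploy:
          labels:
            - "traefik.enable=true"
            - "traefik.docker.network=public"
            # HTTPS router for C2 traffic: host + path prefix + magic header
            - "traefik.http.routers.c2.entrypoints=https"
            - "traefik.http.routers.c2.tls=true"
            - "traefik.http.routers.c2.tls.certresolver=le"
            - "traefik.http.routers.c2.rule=Host(`foo.at`) && PathPrefix(`/api`) && Headers(`Authorization`, `SECRET-VALUE`)"
            - "traefik.http.services.c2.loadbalancer.server.port=443"
            - "traefik.http.services.c2.loadbalancer.server.scheme=https"
            # TCP router for the operator/multiplayer port: HostSNI must be wildcard
            - "traefik.tcp.routers.multiplayer.entrypoints=operator-ingress-tcp"
            - "traefik.tcp.routers.multiplayer.rule=HostSNI(`*`)"
            - "traefik.tcp.services.multiplayer.loadbalancer.server.port=31337"

    networks:
      public:
        external: true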
All right, one of my favorite topics: monitoring. When it comes to your infrastructure, it's good to have some type of monitoring in place. You want to know when things are going wrong, and you want to know when things are going right. This breaks down into a few areas. First, your Traefik logs: you want to know what HTTP traffic is coming through, especially if you need to debug things. Is the C2 traffic coming through? Is something not matching? We have a means to monitor that, because, I will say, trying to debug Traefik without some kind of monitoring in place is a complete pain in the ass. Then you also want to monitor your system health: is my cluster running out of disk space? Should I be concerned about CPU utilization, with one of my containers taking way more CPU than it should? Maybe I have some kind of memory leak. These are all things we want to know, and you can put them into some type of monitoring setup.

The other thing we want is telemetry from the services themselves, those containers John mentioned, where we're pulling logs out. We want things like the phishing logs: if someone successfully clicks a phish, enters their credentials, and we get a session token, or someone executes something, we want to be alerted to it, so we want logs and monitoring in place for that. We also want to know what our operators are doing within the C2 platforms themselves. Maybe they're running commands they shouldn't be running, maybe they're doing dangerous activities on an endpoint, maybe someone needs a talking-to. Maybe someone's doing things they shouldn't. We want to be able to monitor what our folks are doing to make sure they're following procedures, and it's just good to have an activity log in case someone comes back and says, "this server went down at this time, was it you?" We can definitely say it wasn't.

The last one is canary logs. Think of canaries and honey tokens from the defensive side; we do the same thing offensively. We'll set canaries within the payloads themselves, so if a payload gets executed in a sandbox, we'll know. If someone does a lookup on a domain we're not actually using, but that we've planted as a canary, it will never resolve to anything unless someone is looking through the payload, pulling out the domains it reaches out to, doing lookups, and interrogating the infrastructure. We'll get an alert saying someone is analyzing the payloads and looking at the strings in them. These are all things we want to be able to know about.
When it comes to logging and getting the metrics out, there are a couple of different schools of thought, and we've got real-world experience doing both. The first is using something like Prometheus and Loki to collect those metrics natively within the Docker Swarm setup itself. It's very lightweight, it integrates with the stack, it keeps the setup turnkey, and because it all lives inside the Swarm, log ingestion is natively supported. The one downside of Prometheus and Loki is that they're not as flexible, especially when you're working with them in some type of pipeline; getting more bespoke logs out of a C2 server or something like that may take extra work. The opposite school of thought is something like the red team SIEM that Evan and I worked on in the past, or something like RedELK, where you're spinning up an entire SIEM platform, in this case ELK, ingesting all the logs, formatting them, and putting them into a structure you can search on and build dashboards with. The problem with that, from our perspective, is that it's very resource-intensive. ELK is, in a way, pretty overkill for what we're looking for: there's a lot of functionality you can use it for, but in our case it's just very complex and very resource-heavy for what we're trying to do. The plus side of something like ELK is that log parsing is native to the platform, because it's a SIEM; it's one of the built-in functionalities, and it has a really strong log-parsing tool in Logstash.
So this is just an example of one of the Grafana dashboards we have; I have a couple here, and these are just the out-of-the-box ones that are freely available to anyone. This particular one is Node Exporter. One of the services we have stood up is a container that's deployed globally across every node, and it just exports a bunch of metrics for the machine itself: CPU utilization, RAM utilization, disk utilization, all that good stuff.
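That "deployed globally" bit is a one-liner in Swarm terms. A trimmed sketch (the mounts and flags are the usual Node Exporter conventions, but treat the version pin and specifics as illustrative):

    services:
      node-exporter:
        image: prom/node-exporter:v1.6.1        # pin whatever version you've reviewed
        deploy:
          mode: global                          # schedule exactly one instance per swarm node
        volumes:
          - /proc:/host/proc:ro
          - /sys:/host/sys:ro
        command:
          - "--path.procfs=/host/proc"
          - "--path.sysfs=/host/sys"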
Some of those metrics are pretty important to watch: some C2 frameworks go a little hog wild on CPU, and things get weird when it starts to peg all the way, so it's just a good thing to keep an eye on and set up alerts for as well. Another dashboard is for cAdvisor, which is very similar except that instead of working at the node level it works at the container level, so we get container-level metrics and can see which specific containers are using particular resources, in this case CPU. And the final one is for Traefik, so we can see all the traffic metrics coming in, including real-time HTTP traffic. This one is particularly interesting, because when we work red teams, a lot of the time we'll talk to the client and say, "hey, we exfiltrated a lot of data and you didn't see it." So we grab screenshots; a lot of the time we would use CloudFront, and the Grafana dashboard is another great place to pull those metrics from: "look, we just shoveled 12 gigabytes of data out of your network over the last 24 hours. Did anything fire?" Stuff like that can be really interesting too. The thing to note with a lot of these dashboards is that they're out of the box; they have not been tuned, so there's a lot of work we still need to do on the Grafana front, but they work pretty well. I don't know how many of you have actually worked with both Kibana and Grafana. Kibana is so ugly; you can tell it's two generations old. I think Grafana is really pretty, so we keep using it.

The last three dashboards were for Prometheus, which we're not going to dig deep on. Prometheus is a metrics engine, a time-series database: it collects statistics for a thing over a given period of time, so at this particular timestamp, these particular metrics had these values. Loki is a log-ingestion engine; that's where you consume more traditional logs, things like access logs from Apache or nginx. In this particular case we're consuming Traefik logs, so we can see Traefik's access logs if we want to pull them from the actual file logs. Here I'm just doing a search on ACME. Sometimes my Let's Encrypt stuff will break (it's my fault, I've broken something), and one of the things I'll do is jump in here, do a search for "acme," and see whether Traefik is even registering an ACME request and trying to mint certificates, or whether something else is going wrong.
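That search is just a LogQL query in Grafana, something along these lines, though the exact label names depend on how you ship logs into Loki (this one assumes the Loki Docker logging driver's swarm_service label):

    {swarm_service="traefik_traefik"} |~ "(?i)acme"

The |~ operator is a regex line filter, and the (?i) prefix makes the match case-insensitive, so it catches "acme", "ACME", and friends.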
That way, instead of having to shell into the machine, jump onto the share, and actually grep the logs myself, I can just query Loki and call it a day; as long as I've got a rule set up to ingest those logs, I'm good to go.

Cool, so, jumping into Sojourn. Sojourn is a stack I started working on over a weekend when I was very sad, because I build things when I'm sad. How many of you have heard of Nemesis, the new project from SpecterOps? Has anybody been following that? Just Evan, okay. Nemesis is a really cool project. I philosophically love it; I really dislike the actual implementation itself. But SpecterOps has a ton of great blog posts about it, and they're trying to do something very similar to what we're doing here: bring offensive red team operations into the modern age, where we work with structured data, automatic parsing, event queues, basically automating things. We're not looking to build the next db_autopwn or CrackMapExec; we're not trying to automate the red team operator itself. We're trying to automate those boring administrative things that just eat up time and resources, to let operators do the things they do best: go reverse engineer some weird payload and build me an executable, or build me a payload, for it.

Another big reason I started working on this is that we went pretty hard into the Prometheus and Loki stack for a while, and there are some limitations there. The alerting specifically was kind of janky to do. That DNS canary stuff Dan was talking about before, we implemented in Prometheus, and the chain for doing it was just ugly. It was another case of things being too complicated: these tools are built for full enterprises running Google-scale infrastructure, and we're not doing that. The stack is still great for us, and still lightweight enough to be used in these situations, but that particular feature was just overkill.
So instead, I went through and built out what I'm calling Sojourn, a general-purpose event-looping framework. Again, my whole goal is for it to act as a force multiplier: I want to take the red teamers who are really good at red teaming and let them spend more time doing red teaming instead of the administrative stuff. That can come down to wrangling really large data sets. Say you're going through and browsing network shares or local file shares: I want to be able to ingest all of that and put it somewhere searchable later on, so that when, two weeks later, I go "oh hey, I saw a reference to that on some system somewhere and I need to find it again," there's actually a record, instead of no log at all because I was just manually cd'ing around. With it ingested into a back-end data store, someone can quickly search for it, see "oh, it was on this system at this path," and go get it again.

Setting up alerts also becomes very important: things like new beacon activity. Someone executed our implant, and I want an immediate callback that says, "hey, a new beacon came online, go look at it, go do whatever you need to do." Or take the next step and do beacon automation: great, a new beacon just checked in, so immediately kick off the keylogger, grab a screenshot, start a hidden desktop session, something like that. And then there's artifact monitoring and operator monitoring, like the beacon logs themselves. There's a lot here I think we can do from an automation standpoint to make people's lives easier, and as someone who leads red teams, a lot of this administrative stuff falls on my back. Selfishly, I don't want to do it anymore, so I'm writing something to do it for me, like a good computer science nerd.
It comes down to a couple of core components. Are you guys familiar with RabbitMQ? Got one hand, all right. RabbitMQ is great. It's what's typically referred to as a pub/sub (publish/subscribe) system; it's an event-queue system. Basically, I set up a key within RabbitMQ, I publish a message into it, and I can have an arbitrary number of subscribers downstream consuming that message. So say I have a component watching my Cobalt Strike or my Sliver instance: any time a new beacon checks in, it fires off a new-beacon-activity message, and anything on the subscribing side that has subscribed to that message gets triggered immediately. That might fire a Teams webhook to let people know, "hey, you have a new beacon," or kick off some automation to start the keylogger, something like that.
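To make the pattern concrete, here is a hypothetical sketch in Python using pika, RabbitMQ's Python client. The exchange and routing-key names are made up for illustration, not Sojourn's actual identifiers:

    import pika

    conn = pika.BlockingConnection(pika.ConnectionParameters(host="rabbitmq"))
    ch = conn.channel()
    ch.exchange_declare(exchange="events", exchange_type="topic", durable=True)

    # Publisher side: the C2 watcher fires a message when a new beacon checks in.
    ch.basic_publish(exchange="events", routing_key="beacon.new",
                     body=b'{"host": "WS01", "user": "jdoe"}')

    # Subscriber side: any number of consumers can react to the same message.
    q = ch.queue_declare(queue="", exclusive=True).method.queue
    ch.queue_bind(exchange="events", queue=q, routing_key="beacon.new")

    def on_beacon(channel, method, properties, body):
        print("new beacon:", body)   # e.g. fire a Teams webhook, start the keylogger

    ch.basic_consume(queue=q, on_message_callback=on_beacon, auto_ack=True)
    ch.start_consuming()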
The other data-store component is a Postgres database, meant primarily for long-term data storage. For example, I have a component that tracks beacon logs; it's a general-purpose activity tracker meant to consume beacon logs from a variety of C2s. A lot of C2s have export functionality natively built in, but what if you're using multiple C2 frameworks? Now you've got to find some way to shim those together; the data isn't going to line up, and you're going to have to massage it somehow. It's a pain. Instead, I have one general-purpose database table that stores all my activity. And you can take it a step further and start tracking other things too, like phishing attempts: I can have some CLI tool so that whenever I kick off a phishing campaign, it registers "operator A started a phishing campaign, targeted these users, used this payload," etc. Then I can export that with the click of a button, give it to a client, and say, "here's everything I did." No more of that back-and-forth game of "did you do this?" or "what timestamp were you in here?" or "I know you launched a phishing campaign, but when did anybody click on it?" I can just export it and call it a day, and they have all the information they need.

The third piece is what I'm calling a core Docker image. RabbitMQ is great, Postgres is great, but there are a lot of complexities involved in working with them, and again, I don't want people who want to help with this to have to learn all those complexities. I want to abstract them away as much as possible. So on the RabbitMQ front, I expose some convenience functionality: if you're going to write your own module and you want to subscribe to an event, all you do is call event.subscribe, feed it a name, and you get everything you need. You don't have to worry about setting up the queues, setting up the exchanges, making sure the RabbitMQ server is online, or what to do with dead-letter queues. All of that engineering nonsense that I personally love, but don't expect other people to deal with, gets abstracted away for you.
The fourth component is Appsmith, and I'm not necessarily tied to this one. It's a low-code platform; there are a couple out there, and I'm still trying to figure out which one I like best. The core purpose is to provide an easy-to-use UI. I have an application within Appsmith that wires into the Postgres database and displays all my payloads, right there, or displays all my beacon activity logs. I can pull the data programmatically, or I can have a nice, turnkey, searchable UI: "oh hey, I remember running this command, but I don't remember the exact details of what I used; let me go search it really quick. There it is, there are all the flags I used, the exact configuration," etc.

As a quick example use case: I have modules and I have messages. In this case, the FSM module, which is file system monitoring, publishes to the fs-activity (file system activity) message queue. It monitors a directory and any subdirectories, and any time it sees a change to a file within that directory, it fires an event to the fs-activity queue saying "a new file was created" or "a file was modified." On the other side of that, I have a payload-tracker module that subscribes to the fs-activity queue, and its whole job, as the name implies, is to track payloads. Any time it sees a modification to a file within the payload directory, it calculates the SHA-256 hash, dumps it into the database, and now I can view it directly within the Appsmith UI.
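A hypothetical sketch of that module pair's core logic, compressed into one Python script using the watchdog library. The paths are illustrative, and the real version would publish to the queue and write to Postgres rather than print:

    import hashlib
    from pathlib import Path
    from watchdog.observers import Observer
    from watchdog.events import FileSystemEventHandler

    class PayloadTracker(FileSystemEventHandler):
        def on_created(self, event):
            if event.is_directory:
                return
            # hash the new payload; real version inserts (path, digest) into Postgres
            digest = hashlib.sha256(Path(event.src_path).read_bytes()).hexdigest()
            print(event.src_path, digest)

    obs = Observer()
    obs.schedule(PayloadTracker(), "/srv/payloads", recursive=True)  # path is illustrative
    obs.start()
    obs.join()   # block and keep watching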
One of the things you have to do on a red team is provide the hashes of all the malware you just dropped on a network over the last six weeks, eight weeks, ten weeks. That's a long time and a lot to keep track of, and it's very easy for things to fall through the cracks. Now I don't have to worry about it anymore: I throw all my payloads up on my file server, and the hashes are automatically calculated for me, as long as operators are doing their due diligence and not overwriting files, which is where things get wonky. I don't have to think about it anymore; it's just done for me.

There are a million different things you can do with this; your imagination is the limit. You can do things like beacon activity, which I already talked about. You can do an Evilginx callback: you've got an Evilginx phishing campaign out, someone actually fell for it, and now you've got their session. Great: take my ROADrecon module, immediately reach out and start scraping O365 to see what kind of configuration they have, dump that data somewhere, and I can go straight from a phishing campaign to having good insight into their environment. So yeah, there are a million different ideas I can come up with here, and it's kind of a new thing I'm working on, so I'm really trying to flesh out all the use cases for it.
Cool, so, like John said, we've implemented this over the past year and a half, and we presented it last year at another conference where we focused a lot on just Terraform. There were definitely some lessons learned from John's willingness to use only Terraform. One of the big ones: Traefik is definitely best suited for HTTP. We did some work trying to implement UDP rules and a UDP service, and it works, but like he said before, you kind of have to work with wildcards and the like, because there aren't really rules that can look at the underlying packets themselves. The other thing with Traefik is that it runs on an overlay network; it lives within the Swarm itself rather than down at the operating-system level where the rest of the networking happens, so there's its own kind of pain to deal with there, especially around sources and destinations and worrying about where things live within the Swarm. That's just the way overlay networks work.

Docker Swarm is also a bit limited in how you have to use distributed volumes. We have a back-end data store in EFS (Elastic File System) with all of our data in it, but how does each container mount it and keep the data persistent? Those are all things you need to worry about; you can't just easily distribute a volume to all of your containers through the Swarm. The other thing is that there's no easy way, at least we haven't found one yet, to control container creation order. When you're building out your stacks and the containers that need to be deployed for each of them, there's not necessarily a way to control the order they come up in. You either handle it with Ansible, making sure that as you deploy, the containers get stood up in the order they need, or you include checks that wait until a service is up before you continue spinning up the others.
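For the "wait until the service is up" approach, one option is a Compose healthcheck, sketched here with a hypothetical image. Swarm gates a service's convergence on its healthcheck, though stack members still aren't strictly ordered for you:

    services:
      teamserver:
        image: registry.example.com/teamserver:latest   # hypothetical image
        healthcheck:
          test: ["CMD-SHELL", "curl -fsk https://localhost/ || exit 1"]
          interval: 10s
          retries: 5
          start_period: 30s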
The last one is just some thoughts on infrastructure in general. Like I said, I've been doing red teaming for the past five to ten years, and just because someone does infrastructure one way doesn't mean that's the only way you need to do it. There's a sense, even within security, of "if it's not broke, don't fix it," and a lot of things carry over because "I don't want to change this, it works." In our case, when John came on a few years ago, we were doing exactly that, a lot of single-use systems, and John said, "hey, there are these things called containers." If you look at red team tooling and pentest tooling, there's really not a ton of support for Docker and containers. With a lot of the tool sets out there, you either have to work them into a container yourself, and there are very few that actually ship with container support, with a Dockerfile right there alongside the tool. So I think containers in general aren't that widely adopted in the offensive community, and it's definitely something to think about.

But always, when it comes to infrastructure, think of ways to reduce stress and improve your life when standing things up: reduce the things you have to worry about, and the things people repeatedly do wrong. Figure out ways to protect the infrastructure the way it should be protected. Explore new technologies and tools; in this case it's Docker Swarm, and maybe next year it'll be something else that's hot, who knows. And at the end of the day, be a hacker: think differently, try things out, come up with new proofs of concept, and just explore.

I'll leave you here with some resources around Traefik, Docker Swarm, and Loki. We'll have our slides published on our GitHub; we'll get those up either tonight or tomorrow. And if anyone has questions, we'll hang around for a little bit, so feel free to come up. Thank you for attending. [Applause]