CGroups for DFIR

Name: CGroups for DFIR
Uploaded: 2026-06-09
Duration: 24 min 55 s
Description: A detection engineer explores how Linux control groups (cgroups) — the kernel primitive underlying systemd services and container runtimes — can be leveraged during forensic investigations. The talk covers how cgroups label and relate processes without relying on process lineage or timing, walks thr

BSides KC 2026 | 24:55 | 31 views | Published 2026-06

Speakers

Thomas Gardner

Tags

Category: Technical

Topic: Container Security, Detection Engineering, DFIR, Linux Security

Team: Blue

Style: Talk

Mentioned in this talk

Tools used

Falco

Concepts

eBPF

About this talk

A detection engineer explores how Linux control groups (cgroups) — the kernel primitive underlying systemd services and container runtimes — can be leveraged during forensic investigations. The talk covers how cgroups label and relate processes without relying on process lineage or timing, walks through investigating a compromised host across containers and systemd scopes, and discusses collecting cgroup data with Falco, procfs, or eBPF for detection engineering.

Original YouTube description

Endpoint DFIR practices over the last 3 decades have largely focused on Windows, with more recent improvements on macOS. However, despite the platform’s dominance in the cloud and container space, Linux has remained far behind these other operating systems in terms of forensic investigation techniques. This talk provides an overview of a little-known Linux kernel feature called CGroups and how it can be used during forensic investigations on Linux endpoints. CGroups are the basis for all container runtimes, and most popular distros leverage them via another bedrock system utility - systemd. We’ll review common patterns across containers and system services, and how to link related processes using CGroups without relying on process lineage or time.

Transcript [en]

Yeah, well, thanks for showing up despite the ice cream. I really appreciate it. So I'm going to talk about cgroups for security investigations. It's basically a Linux DFIR primer. If you're not familiar with DFIR, it's digital forensics and incident response. Hopefully that'll be obvious shortly. A little background on me: I'm a staff detection engineer at Zscaler now, formerly Red Canary. I've been there six years, all on the detection engineering side. Before that I did five years at Lumen, formerly CenturyLink (shout out, Bill), where I did a few different roles in incident response, threat intel, and threat hunting. And

I've always really enjoyed the less researched areas of DFIR. So today we're going to talk about cgroups, namely: what the hell are cgroups? That seems to be the first question I get whenever I bring this up. We're also going to talk about how they're actually used in production Linux systems; you can't find bad if you don't know good. We're going to go through an example of using them to build up context during a security investigation. And finally, I'm going to touch on detection engineering with them. That is, after all, my job. So I hope I can provide a little more

insight for you all to take away to your own jobs. But first, a sneak peek at why I think it's so important to monitor cgroups going forward. They provide some really interesting benefits when we're investigating a Linux compromise. Namely, they surface relationships between processes exceptionally well. They add really interesting context, especially in the container space, since they're so fundamental to that area. And finally, they're pretty tamper resistant. Attackers don't really have a lot of ability to change them once they've landed on a box, and we can really use that to our advantage as

defenders to prep our hosts and kind of trap them in ways that we are really interested in understanding. But first, what are cgroups? Cgroups, otherwise known as control groups, are really just a labeling system in the Linux kernel for limiting resources. They work on a per-process level, so every process gets assigned to one or many cgroups depending on the resource we're trying to limit. Those resources include things like the number of processes that the entire group can spawn, the maximum memory an individual process has, and what devices they have access to. And fortunately for us, there is a

unified nested hierarchy in the cgroup system that we can use to identify relationships between processes. It looks a bit like this. I just summarized some stuff, but you can think of it kind of like this sort of hierarchy. So we're going to look at an example. We have a system running, say, four processes in three different cgroups. We can have multiple processes in the same group. We have PID 100 in sort of no cgroup, cgroup "/", the root one; two processes in cgroup A; and then the last one in cgroup A/B, that nesting I was talking about. The root cgroup has

kind of unfettered access to system resources. In our case it's got 16 gigs of memory and can spawn a thousand processes just fine. Processes 101 and 102 are a little bit limited. Maybe they still have access to 16 gigs of memory, but they can only spawn 100 processes. The kernel interprets the labels given to these processes as resource restrictions. And then our nested process in cgroup A/B can't have more processes than the parent, but it can have fewer. So it's only got two gigs of memory available and can only spawn 10 PIDs. So now you all understand

cgroups. They're not that complex. Now we'll talk about how they're actually used in practice. We'll go over a couple of big applications for Linux. The first one I want to go over is systemd: maybe the less well-known one, but actually probably the more used one. You just don't realize you're using it a lot of the time. If you haven't heard of systemd, it's an init system, sort of a more modern one, "more modern" being, I don't know, 2008; Linux doesn't move terribly fast all the time. It's responsible for starting services on the system and also managing resources on the system. It's used by

all major distros: your Ubuntu, your Red Hat, Debian, Arch Linux. They all use systemd to actually start your userland processes. If you caught that systemd is responsible for resources and cgroups are also responsible for resources, it's because systemd uses cgroups to manage those resources. It does that via these things called slice and scope units. Those units are really the building blocks of resource management in systemd. There's a whole talk we could do on systemd later that I would love to do, but I only have 25 minutes here. And there are two important concepts in systemd that we can use to our advantage as defenders or as investigators.

There's explicit separation between system-level processes (basically everything after PID 1 and probably before PID 20,000 is probably a system-level process) and user-level processes or user-level services. That gives us a good idea of the permissions available to a process when it's running. There are also, probably most importantly, standard cgroup path conventions that systemd uses. These are used across all these major Linux distros, and this path convention gives us a lot of info, as you'll see shortly. So let's review a couple of common path formats on the user-level side. We've got our root user slice. Basically every user can spawn processes, as every

operating system allows, but all of these processes are limited to just the resources available to this user slice, at a bare minimum. systemd does this so that no user can overwhelm the system resources on their own. Under this slice, every single user gets their own individual slice. This is really useful from an incident response, a forensic standpoint, because we know right off the bat exactly which user is responsible for this thing without having to go find a map in memory of which process belongs to which user, stuff like that. That's just embedded right in the cgroup itself. And then users can also

spawn services on their own, just like system services. They just run under fewer permissions. But the cool part about systemd is that it also tracks terminal sessions just via cgroups. Every single terminal session, every time you log in by SSH to a server, every time you have a local login or something, those are all organized automatically via cgroups. That can tell us a lot about which services or which processes are running next to one another in a timeline, rather than having to rely solely on process lineage, which can be really difficult to fully investigate. On the system-level side, we have a much simpler

sort of hierarchy. We have the root system slice. That tells us right off the bat there are probably more permissions for any process included in that slice. If we're looking at malware or something, then we probably need to be a little more careful about what other access the attacker had on the system at the time. And the system-level services are the area that attackers love to drop in on and deploy to, because that's really what gives them the elevated permissions. It's typically how they want to establish persistence. I've got a couple of examples of service cgroups

that were actually found and used in the wild as part of some Arch Linux malware and a North Korean threat actor that CISA called Kimsuky. I probably mispronounced that. From a visual standpoint, if we look at how a system service runs, let's say we take the cron task scheduling service. Every process spawned by that service, including the root cron daemon, gets tagged with this cgroup. This is great, again, because we don't have to know anything about whether this was the great-grandchild of crond to know it was a cron job. We don't even need to go inventory

scheduled tasks on a system. If we have cgroup data and we see a process with the cgroup, we know cron.service means this was a scheduled task. That gives us instant insight into what we need to investigate further and how we need to clean up this system once we're comfortable with the scope. On the user-level side, those terminal sessions, those scopes, are really useful for differentiating between functionally equivalent processes. Seeing /bin/bash and /bin/sh spawn right next to each other in a process timeline or in, say, our EDR logs can be really confusing. But if we know they spawned

from different session scopes, we instantly know they're probably not related to one another. It's probably just happenstance, because a lot of things use the Linux shell during regular execution. I've mentioned process lineage a couple of times now, so this is a diagram of that Arch Linux malware I just referenced. The way that malware worked was: you install a Linux package through the Arch Linux marketplace. It installs a system-level service. That service spawns a shell, which spawns wget, which spawns another shell to go execute a malicious script that it downloaded from the internet. If we had to

investigate this on the process lineage side, maybe we know it's weird for wget to be running as root; that sucks. And we can tell maybe there's a shell process that ran immediately after, so we can be pretty confident in our association between the two. But when we zoom out and ask, okay, how bad is this compromise, how broad is this? There are these other shell processes being spawned as root, spawned by the init process. Are those related? We have to do a lot of legwork to really scope that incident. We have to do a lot of inference to

be confident in our assessment. Whereas from the cgroup side, if we just look at the cgroup for the service, in this case xy actor.service (I don't know how they came up with that name), maybe we see, oh, this shell process is in that cgroup too, that's definitely bad, and this other one is in rsyslog or some other service that's definitely good and we can ignore it. It's very quick data for us to scope out our incidents and respond faster. The other application I really want to talk about is containers. Containers are not a thing, at least in the Linux kernel. The kernel has no

idea what a Docker container is. It doesn't really care what container runtime you're using. All it knows are these two low-level technologies, one being cgroups, the other being namespaces, that provide resource limitation and isolation between processes. That's really nice for us, because if we're running a big containerized system in, say, Azure Kubernetes Service or in Cisco Splunk, for example, like our friend here, then if an attacker compromises one of our containers, they have to operate within the cgroup paradigm: every process they spawn in that container is tagged with the cgroup that container is in.
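A minimal sketch of the scoping idea described above: take process records that carry a cgroup path and pull out everything that shares a cgroup with a known-bad process, without walking parent/child links. The record fields, the PIDs, and the `evil.service` unit name are hypothetical, invented for illustration; real telemetry would come from whatever sensor you run.

```python
from collections import defaultdict
from typing import NamedTuple


class Proc(NamedTuple):
    """One process record. These fields are assumptions, not a real sensor schema."""
    pid: int
    ppid: int
    exe: str
    cgroup: str


def group_by_cgroup(procs):
    """Map each cgroup path to the list of processes tagged with it."""
    groups = defaultdict(list)
    for p in procs:
        groups[p.cgroup].append(p)
    return dict(groups)


def related_to(procs, bad_pid):
    """Return every process that shares a cgroup with bad_pid, ignoring lineage."""
    by_pid = {p.pid: p for p in procs}
    bad = by_pid[bad_pid]
    return [p for p in procs if p.cgroup == bad.cgroup]


# Hypothetical telemetry: several root shells that look alike in a process tree.
PROCS = [
    Proc(900, 1, "/usr/bin/evil", "/system.slice/evil.service"),
    Proc(901, 900, "/bin/sh", "/system.slice/evil.service"),
    Proc(902, 901, "/usr/bin/wget", "/system.slice/evil.service"),
    # Parent is PID 1, so lineage alone would not tie this shell to the service.
    Proc(950, 1, "/bin/sh", "/system.slice/evil.service"),
    Proc(700, 1, "/bin/sh", "/system.slice/rsyslog.service"),
    Proc(1200, 1100, "/bin/bash", "/user.slice/user-1000.slice/session-3.scope"),
]

if __name__ == "__main__":
    for p in related_to(PROCS, bad_pid=902):
        print(p.pid, p.exe, p.cgroup)
```

Starting from the suspicious wget (PID 902), the lookup returns the shell whose parent is PID 1 along with the rest of the service, and leaves the rsyslog and login-session shells out of scope.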

Unfortunately, on the container side it's not quite as standard as the systemd cgroups are. The Open Container Initiative is quite vague about how to implement cgroups. But there are a few common patterns across Docker, runc, crun, all the big container runtimes, as well as Kubernetes. So we'll look at a couple of examples. The paths are at least straightforward in a lot of cases, even if they're not standard. In Docker, these look a lot like the systemd service-level stuff. We have our Docker root, and then under that is every container ID. Very simple, just two parts. This is the

exact same container ID that you see when you look at your running containers, and when the logs you collect from your container runtimes are available, they pull out this exact container ID. Kubernetes provides a little more info. It includes things like a quality of service class, which is kind of like a resource profile, and then a pod ID. If you're big on Kubernetes, you understand pods are the unit of measurement in Kubernetes. Typically one pod spawns one container, but there's no actual restriction on a pod spawning a bunch of containers. So we

embed that relationship between a pod and its containers in the cgroup path as well. That gives us a lot of latitude without having to touch the Kubernetes system, which can be extremely complex, not just for users but also for developers. If we look at this from a visual perspective, again we have our root Docker cgroup, the container ID, and then all the processes that spawn, just like that cron implementation. Every process in this container, by necessity, because of what the kernel does, is tagged with this same cgroup. So we know, without looking at process lineage, without hooking into a container runtime, without having to go to any other external system, we know

bash and ps and id are all related to one another by the container they're in. So, that was a lot. Let's start looking at a security investigation. How can we actually use this on, say, a compromised host? I didn't want to tempt the demo gods today, so you're going to look at some terminal output, and I hope it's going to be the same whether it's a live demo or not. But before I do that, I was very pleasantly surprised to see many references to Falco and similar tools. I'm going to include my own. Falco is a great tool, if you're unfamiliar, for monitoring

container systems, for monitoring Kubernetes deployments. It was originally built by Sysdig but is maintained by the, I can never remember, CNCF. Help me out. He can also never remember CNCF, and that's fine. Maintained by the Cloud Native Computing Foundation. Thank you. Okay. These people basically handle a lot of the community around different container initiatives that are really popular in open source systems. They also manage Kubernetes. Falco is a great tool, though. It comes with a lot of capabilities by default, including monitoring processes, generating alerts, and helping you export telemetry from your containers and Linux systems. And it's also

really extensible. It's really nice in particular because it already collects the cgroup info. It just doesn't expose it by default, but exposing it is as simple as a little YAML config change, on the right. So we have this system, this Linux server, and we start seeing alerts pop up. The alerts contain some interesting info. They trigger a few rules, some credential theft rules: sensitive file open, grepping for private keys, AWS credential search activity. These are unusual. These definitely alert your SOC and say, "Oh, something is going on on this server." And Falco maybe doesn't know much else other than a cgroup, but that cgroup

contains a Docker path. So we know that there is a container that's been popped, and we know exactly which container it is. A few minutes later, the exact same thing happens. Looking at the cgroup again, we get a different container ID, which is really interesting now, because it starts generating new questions. Do we have a vulnerable base image that's getting exploited? Did we write some exploitable code or something like that? What's in common across these two containers? And then a third instance: we keep getting the same alerts, only this time we don't have a Docker cgroup path. We have a systemd terminal path, which

tells us, oh, they probably broke out of our containers and they're on the system, and they're running as UID 1000, for example. Adding just cgroups into our investigation path gives us a lot of info that we otherwise wouldn't necessarily have by just seeing that the same host is generating the same alerts. Hopefully that doesn't result in your SOC analyst just hiding the alerts, but this does give us a good timeline already of things to be wary of. We have a user to lock down. We have a couple of containers that we know we need to rotate, and probably some

questions about our actual deployment mechanisms that we need to answer and potentially remediate as well. All just with the addition of this one path. Hopefully by now I'm slowly convincing you that there are a few advantages to monitoring cgroups. It's a great alternative to simply tracking process lineage. It's great in addition; I would not necessarily do it as a replacement. But on systems that run Linux, very often we get these long process chains. You can just chain sh -c a bunch of times, and most monitoring tools don't do a great job of keeping track of all those cases.
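The investigation walked through above hinges on reading a cgroup path and deciding what kind of workload it points at. A rough sketch of that classification step follows. The regular expressions cover only a few common conventions (Docker and Kubernetes with the cgroupfs driver, systemd user sessions and services); they are assumptions about typical layouts, which vary by runtime, version, and cgroup driver. The alert records and rule labels are made up for illustration and are not Falco's actual output format.

```python
import re

# Common path conventions only; real layouts vary by runtime and cgroup driver.
PATTERNS = [
    ("docker", re.compile(r"^/docker/(?P<container_id>[0-9a-f]{64})$")),
    ("kubernetes", re.compile(
        r"^/kubepods/(?:(?P<qos>besteffort|burstable)/)?"
        r"pod(?P<pod_id>[0-9a-f-]+)/(?P<container_id>[0-9a-f]{64})$")),
    ("user_session", re.compile(
        r"^/user\.slice/user-(?P<uid>\d+)\.slice/session-(?P<session>\w+)\.scope$")),
    ("user_service", re.compile(
        r"^/user\.slice/user-(?P<uid>\d+)\.slice/user@\d+\.service/(?P<unit>.+)$")),
    ("system_service", re.compile(r"^/system\.slice/(?P<unit>[^/]+\.service)$")),
]


def classify(cgroup_path):
    """Return a dict with a 'kind' label plus any IDs recovered from the path."""
    for kind, pattern in PATTERNS:
        match = pattern.match(cgroup_path)
        if match:
            fields = {k: v for k, v in match.groupdict().items() if v is not None}
            return {"kind": kind, **fields}
    return {"kind": "unknown"}


# Hypothetical alerts mirroring the three incidents in the walkthrough.
ALERTS = [
    {"rule": "sensitive file open", "cgroup": "/docker/" + "a" * 64},
    {"rule": "grep for private keys", "cgroup": "/docker/" + "b" * 64},
    {"rule": "AWS credential search",
     "cgroup": "/user.slice/user-1000.slice/session-3.scope"},
]

if __name__ == "__main__":
    for alert in ALERTS:
        print(alert["rule"], classify(alert["cgroup"]))
```

With these inputs the first two alerts resolve to two different container IDs and the third to a login session for UID 1000, which is the same progression the talk reads off the raw paths by eye.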

Another advantage, for systemd in particular, is that since it's such a common init system, we can take advantage of its standard path format. We can learn on a local Ubuntu system and apply the same thing on our production Red Hat systems. And the attacker has very little control over how systemd applies cgroups to the processes they deploy. On the Kubernetes and container side, we can definitely infer a lot about the containers themselves, the pods, and how different processes are related to one another. But it does come with a couple of big drawbacks. Namely, and this is really

just Kubernetes and containers in general: if you don't have access to the host, or if you don't have a way of deploying a root-level or privileged container, you're going to struggle to monitor cgroups effectively. So running this in something like a serverless environment or a managed Kubernetes service like Elastic Kubernetes Service (EKS) on AWS might be a little more of a challenge. This definitely excels when you're able to manage the host yourself. So finally, the thing that I love the most and do want to talk about is actually detecting with cgroups.

Before you start detecting, though, you need to collect data. I mentioned Falco. I really like it: it's free, it's open source, and it's well-supported open source, which isn't always the case, and it's really customizable. I have a GitHub gist up that gives the simple config change. I will find a way to distribute slides after this too, if anyone's interested. But the thing you need to know about Falco and the way it collects this data is that it already collects exec-type syscalls, which is kind of all you need to monitor to trigger that cgroup data

collection. So it's a very straightforward way to collect this stuff. If for some reason you can't deploy Falco (maybe you don't trust the writing or something like that, or you can't deploy third-party code to your systems, which is common in production environments), you can monitor cgroups yourself by collecting just from the file system. You can watch procfs, which is the pseudo file system that Linux sets up for everybody to check how much memory a process is using, stuff like that. There's a simple file under every PID directory that is just called cgroup; it contains the cgroup the process is in. Very easy to access. There's also a /sys/fs/cgroup

directory that you can do the same in. It's a little bit of the reverse, where you monitor the cgroup that gets generated and check the PIDs that are in it. But it's also a very easy thing to do, something you can do with common Linux tools like cat and ls. And finally, the most advanced option, especially if you're a student (I'm glad to see a lot of students here) and you're looking for a project: you can collect cgroups via this tool called eBPF. There was a talk earlier today that was all about stopping attacks with eBPF, but it's great for collecting data as well. There's a simple helper

there that it can use called bpf_get_current_cgroup_id once you're in a process context. It's as simple as calling this helper and it'll spit out the cgroup. And then there's also a map type that just stores that stuff while running. So we've collected our data. Now let's go and use it in our detection system or in our home lab or something. There are a few ways we can do that, but really they come down to layering your cgroups on top of your existing detections. We can use these as atomic indicators. XE actor.service is probably never going to be a legitimate service. So if you just look

for processes with that cgroup path, you're probably going to be golden. If you find it, it's probably bad. TeamTNT is a little bit older now. A few years ago they were really big, but they've had sad service. TeamPCP, which is sort of the new hotness in Linux infrastructure hacking, or supply chain hacking, I should say, not infrastructure hacking, has used sysmon.service lately. But these things are so easy to change that you have to be careful with just using them as atomic indicators. On the detection side, you can also use them for anomaly detection. If you have a cluster of nginx servers, web

servers, and all of a sudden an Apache service pops up, maybe that's legitimate Apache, in which case you should figure out why it popped up. Maybe it's not, though, because you don't have to name your cgroup after the process that spawned it. Attackers can lie. Spoiler alert. So they can lie about spawning Apache on your system. And then finally, really the big one is just layering cgroups on top of your existing tools. Just like we did in the Falco example, we can add cgroup info on top of credential theft, or cgroup info on top of weird Python being spawned. And if we use that to also include, oh,

this Kubernetes environment or this particular pod ID, then we've turned maybe a slightly noisy detection into a really high-fidelity one. That's all I got. Really appreciate it. Any questions? I'll also be here after as well. And I will find a way to share these slides, or you can handwrite all of these URLs.
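For the procfs collection route mentioned in the talk, a small sketch of reading the per-PID cgroup file might look like this. Each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:cgroup-path`, and on cgroup v2 the single entry is `0::/path`. The function names here are my own, not part of any tool, and the snapshot is a one-shot poll, so short-lived processes can be missed; that gap is one reason exec-triggered collection is attractive.

```python
from pathlib import Path


def parse_cgroup_file(text):
    """Parse /proc/<pid>/cgroup content into (hierarchy_id, controllers, path) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two colons only; the path itself may contain colons.
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append((int(hierarchy_id), controllers, path))
    return entries


def unified_path(entries):
    """Return the cgroup v2 path (hierarchy 0 with an empty controller list), if any."""
    for hierarchy_id, controllers, path in entries:
        if hierarchy_id == 0 and controllers == "":
            return path
    return None


def snapshot(proc_root="/proc"):
    """Map each live PID to its parsed cgroup entries; PIDs that exit mid-scan are skipped."""
    result = {}
    root = Path(proc_root)
    if not root.is_dir():
        return result
    for entry in root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "cgroup").read_text()
        except OSError:
            continue  # process exited or the file is not readable
        result[int(entry.name)] = parse_cgroup_file(text)
    return result


if __name__ == "__main__":
    for pid, entries in sorted(snapshot().items()):
        print(pid, unified_path(entries))
```

On a cgroup v1 or hybrid host `unified_path` may return `None`, in which case the per-controller entries from `parse_cgroup_file` still carry the path.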
