
Thank you. Thank you. All right. Well, good morning, everybody. My talk today is going to be on stopping attacks in Kubernetes using a product called Tetragon. Before we get too far, I'm sure you're wondering who I am and why you should care about what I think. I am Richard Wilheit. I am a security engineer at Splunk, which is in San Francisco. I've been doing various technology things for the last 20-plus years, everything from desktop support technician to now security engineer. I'm a Kansas native. I grew up in Basehor, up north, and I've lived in Lawrence for the last 10 years.
Now, if you want to reach out to me after the conference with any questions or follow-up, you can find me on LinkedIn up there, which you can probably find more easily on the video afterwards. Or I'm LFK Rick on SecKC's Discord. And then finally, we've got this little bee down in the corner here. That is eBee, and she is the Tetragon mascot. There are plenty of fun pictures throughout the presentation, but there's actually an eBeeDex out on Isovalent's GitHub page. You can go and read all sorts of fun stories about eBee.
All right. What we're going to be talking about today: we've just got a few sections to go through. First up is the detection gap, which is just the delay between detection and enforcement, kind of piggybacking on what Tim talked about in the talk before. Then we're going to talk about some of the other tools we've got in the space today. Then we'll move on to how Tetragon works and talk about the three pillars of enforcement. Tetragon does a lot of things, but really, day to day, we focus on three main areas. Then we'll try to do a live demo. We'll see how that goes. And finally we'll follow up with some best practices and resources.
So, what am I talking about with the detection gap? Again, aligning with what Tim talked about in the keynote, it's the delay between detection and response. Some of the popular tools for security in Kubernetes are Falco and Tracee. They do a great job of alerting, and from those alerts you can triage, enrich, investigate, respond. But all of that takes time. And like Tim was talking about, the time that we have to respond keeps shrinking. We used to have a CISO at Splunk, Jason Lee. He was a CISO at Zoom before Splunk. He challenged us to have a response time of less than 7 minutes. And that was tough.
We really struggled with that. And in today's landscape with AI, I think that 7 minutes is going to continue to compress. So being able to respond quickly is really key to protecting environments from threat actors. If you don't respond in time, you've got people in the environment who have already pivoted laterally, escalated privileges, exfiltrated data, or harvested credentials. So, if you have to detect and then respond, you're betting your environment on the size of the gap and the speed of your software. Modern runtime security has to collapse that gap with policy-driven inline enforcement, which is better than just alerts.
So, again, on the last slide we talked about Falco and Tracee. These are not bad tools. I don't want you to think that I'm saying they're bad. They're actually really good at what they do. Falco is a great open source library. A lot of times when a zero-day threat comes out, the open source community has detections out in Falco within the same day. You can deploy Falco very quickly. A lot of times, especially for smaller Kubernetes infrastructure or security teams, Falco is the fastest way to get baseline security visibility in your clusters. Tracee provides really rich telemetry. It's great for forensics and gives you process, file, and network stream events. It's great for finding out what happened after the fact. It also gives you a mountain of data. And as someone who works at Splunk, I do know a few things about mountains of data. So while the Tracee data is great, a lot of times I think you end up being data rich and knowledge poor. After the fact you can go find out what happened, but you're still in the news. So, not exactly what you're looking for. Both of these solutions are detect-only. They can tell you a suspicious process just spawned, a container is touching a sensitive host file, or there's an unexpected outbound connection. But neither of these can block.
So, Tetragon by contrast is a rich observability solution, but it also has enforcement capabilities. It can log suspicious processes, but also block containers from reading and writing critical host files or stop network connections out to command and control infrastructure. Falco and Tracee do a great job of helping you understand what went wrong after it happened, whereas Tetragon helps to close the detection gap instead of just shining a light into it.
So, how does Tetragon work? At a high level, Tetragon is a per-node agent, a cluster-level operator, and custom resource definitions. The agent is a DaemonSet, so it runs on each node. It attaches eBPF programs to kernel hooks and provides Kubernetes context for those events. And the Kubernetes context is really one of the places where it shines. That context helps identify where those connections came from. It'll give you the pod, the namespace, the cluster, the node, and it helps you narrow down where your threat actor is activating malware, privilege escalation, whatever. Since it's eBPF, it's got visibility into both kernel and user space, which gives you great visibility across your clusters. Then the operator manages policies. The operator pushes the cluster-level policies to the agents and offloads some of the compute from the agents, so that the agents can be as performant as possible on the nodes.
Now, the policies are just normal Kubernetes primitives. Tracing policies and alert rules are a couple of the types of custom resource definitions that Tetragon uses. The tracing policies tell Tetragon what to monitor and how to enforce, and alert rules provide a higher-level security context. Those would be what you could send out to the SOC to respond to. The key point is that those policies are Kubernetes-native objects, so you can manage them by applying them to your cluster the same way you would any other Kubernetes resource.
So while Tetragon does a lot of things, day to day, when we're running Tetragon, what we end up doing is focusing on three areas: unauthorized file system access, privilege escalation, and suspicious network connections.
So when we talk about unauthorized file system access, these are things like reading /etc/shadow or /etc/passwd on the Kubernetes nodes, or dropping binaries or scripts into /tmp. I don't know if anybody had to work on the Trivy incident here lately, but I'm kind of familiar with it myself. Or modifying system configuration files to establish persistence. Now, if we look on the right-hand side here, that is really hard to read on the screen, but I tried to share an example of a tracing policy. Hopefully it's easier to see on the video, so I'll just talk to it and you can follow up later if you'd like to see it better. The main thing we've got here is that we're using a kernel probe, in this case called a kprobe. We're looking for the syscall for opening a file, sys_openat. Then we're matching our arguments: we're going to look for the /etc/shadow file. And finally we're looking to perform a couple of actions here. We're going to Post, which means we're going to log the activity. We're also going to Sigkill, which blocks the process that's attempting to access the file.
So the second pillar I talked about was privilege escalation. These are things like invocation of tools like nsenter or chroot, attempts to join host namespaces, or sudden capability changes like gaining CAP_SYS_ADMIN.
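
Since the file-access slide is hard to read in the recording, here is a minimal sketch of what a policy like the one described for /etc/shadow might look like. It is not the speaker's actual slide. It builds the manifest as a Python dict and prints JSON, which `kubectl apply -f` accepts alongside YAML. The hook name `sys_openat`, the argument layout, and the field names are assumptions based on the TracingPolicy examples in the Tetragon documentation, and the policy name is hypothetical. Check all of them against the CRD reference for your Tetragon version.

```python
import json


def shadow_file_policy(actions=("Post", "Sigkill")):
    """Build a TracingPolicy manifest (as a dict) that watches opens of /etc/shadow.

    Field names are assumptions taken from upstream Tetragon examples.
    """
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": "block-shadow-read"},  # hypothetical name
        "spec": {
            "kprobes": [
                {
                    "call": "sys_openat",  # assumed hook for "opening a file"
                    "syscall": True,
                    "args": [
                        {"index": 0, "type": "int"},     # directory file descriptor
                        {"index": 1, "type": "string"},  # path being opened
                        {"index": 2, "type": "int"},     # open flags
                    ],
                    "selectors": [
                        {
                            # Only fire when the path argument is /etc/shadow.
                            "matchArgs": [
                                {"index": 1, "operator": "Equal", "values": ["/etc/shadow"]}
                            ],
                            # One entry per action: log (Post) and kill (Sigkill).
                            "matchActions": [{"action": name} for name in actions],
                        }
                    ],
                }
            ]
        },
    }


if __name__ == "__main__":
    # Pipe this output to `kubectl apply -f -` to try it on a test cluster.
    print(json.dumps(shadow_file_policy(), indent=2))
```

Passing `actions=("Post",)` gives a log-only variant of the same policy, which is the safer starting point before enforcing.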
Again, I've got a tracing policy that I'll do my best to talk through here. Again, we're using a kernel probe, and while I've got kprobes up here as the examples, you can also use uprobes if you want to inspect user space activity. Here we're looking for the setns, or set namespace, system call. We're looking for just a couple of binaries, nsenter located in either /usr/bin or /bin, and then again we're going to log the activity and kill it.
So finally, we've got suspicious network connections. These are outbound connections from pods to known command and control infrastructure, processes that should be local-only that suddenly initiate outbound TCP connections, or lateral movement inside the cluster, pods that shouldn't be talking to other pods. Now, this tracing policy has a couple more things included. Again, we've got a kernel probe to look for TCP connections. We're limiting it to just look for curl in this example, but honestly, most of the time when we're doing network connection monitoring or alerting, we don't add the binary in there. You could leave that section out entirely and monitor everything that goes to those addresses. And then finally, we've got a couple of IP addresses that we're going to monitor for, and we've got our action set to Sigkill.
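
The two policies just described, the setns one for nsenter and the TCP one for curl, share the same shape, so a small helper can build both. This is a sketch under assumptions, not the slides themselves. The hook names `sys_setns` and `tcp_connect`, the `DAddr` operator, and the `matchBinaries` layout follow examples in the Tetragon documentation as I recall them. The policy names are hypothetical, and the IP addresses are documentation placeholders from RFC 5737, not the addresses used in the talk.

```python
import json


def kprobe_policy(name, call, syscall, args, selector):
    """Assemble a TracingPolicy manifest with a single kprobe and a single selector."""
    return {
        "apiVersion": "cilium.io/v1alpha1",
        "kind": "TracingPolicy",
        "metadata": {"name": name},
        "spec": {
            "kprobes": [
                {"call": call, "syscall": syscall, "args": args, "selectors": [selector]}
            ]
        },
    }


# Privilege escalation: log and kill nsenter when it calls setns(fd, nstype).
nsenter_policy = kprobe_policy(
    name="block-nsenter-setns",  # hypothetical name
    call="sys_setns",            # assumed hook name
    syscall=True,
    args=[{"index": 0, "type": "int"}, {"index": 1, "type": "int"}],
    selector={
        "matchBinaries": [
            {"operator": "In", "values": ["/usr/bin/nsenter", "/bin/nsenter"]}
        ],
        "matchActions": [{"action": "Post"}, {"action": "Sigkill"}],
    },
)

# Network: kill curl when it connects to one of the listed destinations.
# Deleting the "matchBinaries" key would match every binary instead.
egress_policy = kprobe_policy(
    name="block-c2-egress",      # hypothetical name
    call="tcp_connect",          # assumed kernel function hook
    syscall=False,
    args=[{"index": 0, "type": "sock"}],
    selector={
        "matchArgs": [
            {
                "index": 0,
                "operator": "DAddr",  # match on destination address
                "values": ["192.0.2.10/32", "198.51.100.20/32"],  # placeholders
            }
        ],
        "matchBinaries": [{"operator": "In", "values": ["/usr/bin/curl"]}],
        "matchActions": [{"action": "Sigkill"}],
    },
)

if __name__ == "__main__":
    for policy in (nsenter_policy, egress_policy):
        print(json.dumps(policy, indent=2))
```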
So, in this example, we've got two specific destination IP addresses. There are a lot of options for what you can do there. You can use source addresses, destination addresses, CIDRs. This is just one example. Now, one other thing I want to point out is that since this is a normal Kubernetes primitive, you can use the same pod selectors or namespace selectors that you would in any other Kubernetes deployment. You see here I've limited this to just a single pod, the xwing. I think that's one of the great things about Tetragon policies: your ability to scope them to specific areas in your environment.
All right. So, now we're going to try to do a live demo. We'll see how this goes. Essentially, what we're going to do is install Tetragon in Minikube with Helm. I know a lot of you are probably thinking, "Well, Minikube installations are a lot easier than an EKS cluster or an AKS cluster." And you're not wrong, but I've actually had pretty good luck installing Tetragon on EKS, AKS, and GKE clusters. So I can tell you it's not much more complicated than what we're going to show today. Then we're going to execute curl out to a malicious IP address and then try to block it.
So, and then I got to try to type in order not to so fun. >> [clears throat] >> Oh. He's going to help me out. >> I can't type. >> [laughter] >> All right.
>> It's for them, not you. >> All right. So, hopefully you guys can see this okay. So, first off, let's check that we don't already have Tetragon running. Make sure I cleaned up my Minikube instance, okay?
Yeah. So, we do not have Tetragon there at the moment. Let's install it.
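
The commands typed on stage are not captured in the recording. As a rough sketch of what an install like this usually involves, the steps below follow the Helm instructions in the upstream Tetragon quick start as I recall them. The chart name, repository URL, and namespace are assumptions to verify against the current guide. The helper only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess

# Assumed install sequence, modeled on the upstream quick start. Verify before use.
INSTALL_STEPS = [
    ["helm", "repo", "add", "cilium", "https://helm.cilium.io"],
    ["helm", "repo", "update"],
    ["helm", "install", "tetragon", "cilium/tetragon", "--namespace", "kube-system"],
    # Wait for the per-node agent DaemonSet to finish rolling out.
    ["kubectl", "rollout", "status", "--namespace", "kube-system", "daemonset/tetragon"],
    # List pods to confirm the agent and operator are running.
    ["kubectl", "get", "pods", "--namespace", "kube-system"],
]


def run_steps(steps, dry_run=True):
    """Print each command, run it only when dry_run is False, and return the rendered lines."""
    rendered = [shlex.join(step) for step in steps]
    for step, line in zip(steps, rendered):
        print("+", line)
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(step, check=True)
    return rendered


if __name__ == "__main__":
    run_steps(INSTALL_STEPS)
```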
>> Batman doesn't have the shell on his screen, by the way. He is straight up staring at here for y'all's benefit. So, give him some love real quick.
>> Sure, can I do you a favor and get you a chair? >> No, cuz then I'm blocking. It's all good, appreciate it. I ain't that old yet. >> [laughter]
>> Sure. Yeah, that's probably easier, right? There we go. >> All right.
>> He's just mumbling, he's fine. >> There we go. All right. Now we should have Tetragon installed. Let's check that out. Yeah. So, pods are coming up there. We've already got a Tetragon agent right here. And then we've also got a Tetragon operator that's going to come online here shortly.
>> All right, they're up and they're running. So, let's see if we can find some logs. That's not the one.
All right, we're going to have to use this other partner.
All right, let's wait. >> [laughter]
>> I found >> Oh.
Indeed, I am. There we go. I am so glad that you appreciate it. All right, cool. Now, we're getting some logs. All right. All right, so let's go ahead and try to do a curl command out to our malicious IP address.
That's the old one.
>> There we go. >> Hey. >> Yeah, so there we got our IP connection out. It's a little hard to see, but over on the side there we did see a curl command. That's drawing something else here.
Since I sized up the screen a bit here, things go by fast. All right, that's a little bit better. So, we can see that we had a bash process fire and the curl command underneath that. And we can also see that they exited with a successful zero exit code. All right, so now say the SOC comes along and says, "Hey, we see this IP address connecting out. We want to understand what's happening with it." We could deploy a tracing policy. We can look at that here. This is just the same tracing policy I was showing as the example on the slide. You can see that we're limiting it to the xwing pod for this demo, TCP connections for curl out to some IP addresses. In this example, we've got Sigkill as the action, but you are probably not going to want to start there unless you really hate your developers. And I mean, then, you know, have fun. So where we'll start is we'll go ahead and comment this out, and then we'll add a Post action in here. That way we can just log the activity to start. Now again, since this is just a Kubernetes primitive, we can just apply it like normal. So, kubectl apply. Yep. So, now we've got a tracing policy in place. So, if we do our curl command again here, except I'm going to grep for curl now so we don't see my example. The last time
we'll see if we Oh, curl's not there. Oh, my god.
Give me that.
All right. Now, it's going to go by really fast, but we can see now that we've got a TCP connection out to the IP address, with the source and destination IPs and the source and destination ports. So, we've got a little bit more data to work with. All right. So, the SOC says, "Hey, great. We can see those connections. They're really bad. Please block them immediately." So, we can go back to our tracing policy now, and we can say, "All right. We've got a bad thing. Let's kill it." And the great thing is we can just... well, where was my bug? We can just apply it over the top. It's going to update the tracing policy. You don't have to delete the old one. You just reapply and it's going to overwrite. Now, when we go to Sigkill, I've been seeing it take about 30 seconds for Minikube to pick up the policy and push it out there. So, we'll see how fast this works.
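
The observe-first, enforce-second flow from the demo can be sketched as two versions of one manifest that share a name, so reapplying the second overwrites the first. This is an illustration under assumptions, not the file shown on screen. The `podSelector` label for the xwing pod, the policy name, and the destination address are placeholders, and the field names follow Tetragon documentation examples as I recall them.

```python
import copy
import json


def with_action(policy, action):
    """Return a copy of the policy whose selectors all use the given single action."""
    updated = copy.deepcopy(policy)  # leave the original manifest untouched
    for probe in updated["spec"]["kprobes"]:
        for selector in probe["selectors"]:
            selector["matchActions"] = [{"action": action}]
    return updated


# Step 1: log-only policy (Post), scoped to one pod by label.
observe_policy = {
    "apiVersion": "cilium.io/v1alpha1",
    "kind": "TracingPolicy",
    "metadata": {"name": "xwing-egress"},  # hypothetical name
    "spec": {
        # Assumed label. Check with `kubectl get pods --show-labels`.
        "podSelector": {"matchLabels": {"app.kubernetes.io/name": "xwing"}},
        "kprobes": [
            {
                "call": "tcp_connect",
                "syscall": False,
                "args": [{"index": 0, "type": "sock"}],
                "selectors": [
                    {
                        "matchArgs": [
                            {"index": 0, "operator": "DAddr", "values": ["192.0.2.10/32"]}
                        ],
                        "matchBinaries": [{"operator": "In", "values": ["/usr/bin/curl"]}],
                        "matchActions": [{"action": "Post"}],
                    }
                ],
            }
        ],
    },
}

# Step 2: same name, enforcement action. Applying it replaces the log-only version.
enforce_policy = with_action(observe_policy, "Sigkill")

if __name__ == "__main__":
    print(json.dumps(enforce_policy, indent=2))
```

Because only `matchActions` differs between the two manifests, the change is easy to review in a pull request before it reaches a cluster.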
Still out there. There we go. It got killed. So, now we can see over on the right-hand side >> Yeah. >> the counter. We can see over on the right-hand side there, and it's going to go by fast, that we Sigkilled the process, and it should have an error code instead of a successful command return. Cool. >> Woo. >> Yeah, you can take that back. All right. Thank you to my microphone holder. >> Celebrate.
>> It's getting slow now.
>> Cool. All right. We survived. So, some best practices. Obviously, as we showed in the demo, you usually want to test before you go into enforcement mode. Like I said, unless you've got a really bad relationship with your developers, or you really think you're good and you'd like a resume-generating event, it's definitely best to test visibility before you go into enforcement. The nice thing is it's really easy to flip those switches and deploy the enforcement policy, so it's not a huge lift to move from one to the other.
My second recommendation is definitely do CI/CD integration. This is probably something a lot of you are already doing if you're working with Kubernetes, so I doubt it's anything new to you. What we do at Splunk is we have a pipeline that deploys the Helm chart for Tetragon, and then we have a separate Helm chart that deploys our policies. That Helm chart is pretty simple. All it's doing is deploying the policies, but it does allow us to use Helm logic to deploy policies to different clusters or target different namespaces on different clusters. The options are kind of endless there. So we use a single pipeline that deploys Tetragon with our policies, and it just makes life easier. When we add new clusters, they get the same deployment and the same policies, all in a single pipeline.
And then finally, we've got the Tetragon quick start guide. Everything I talked about today is stuff that you can test out if you follow the quick start guide. There are instructions for installing it on Minikube or EKS, AKS, GKE, whatever you want to test with. I'll also point out, obviously, I work at Splunk, and we're part of Cisco. Isovalent is also part of Cisco. But everything I talked about today is part of the Tetragon open source project, so you can go home and try all this stuff out today. Now, we do have a paid Tetragon enterprise product. Where that comes into play is when you go out, you test it, and you're like, "Hey, this is great, but I'd like some support, or better SIEM integration, or some default policies out of the box." Because open source does not have any default policies. Or you might want a systemd agent for all those virtual machines that you haven't migrated over to Kubernetes yet. That's where the Tetragon enterprise license comes into play. But I do want to be clear that everything we talked about today is from the open source project. And with that, that is all I've got. So, thank you very much for coming out. Please be safe. You'll see me out there. If you have any questions,