From Interview Questions to Cluster Damage: Adventures in Kubernetes Clusters

Uploaded: 2025-11-08
Duration: 49 min 58 s

BSides NYC · 2025 · 49:58 · 76 views · Published 2025-11

Speakers

Amit Serper, Travis Low

Tags

Category: Technical

Topic: Cloud IAM, Container Security

Style: Talk

Mentioned in this talk

Tools used

CoreDNS, kubectl, nginx

Platforms

Docker, Kubernetes

About this talk

Amit Serper and Travis Low explore Kubernetes security pitfalls discovered during cloud team hiring at CrowdStrike. The talk demonstrates how to abuse Kubernetes native features—admission webhooks, DNS, and secrets management—to exfiltrate data and sabotage clusters, then discusses detection and defensive strategies for securing containerized environments.

Transcript [en]

one of those politicians with, like, tons of microphones at me. [clears throat] Okay, so one last mic check. Can everyone hear me? >> Okay, great. As you can hear by the sound of my voice, I'm recovering from a nasty cold. So, I'm sorry. I promise you guys, I've hit puberty. It's really confusing because of my boyish charm and looks. Okay. So, welcome everyone. I'm going to talk to you today about some interview questions that we've asked, and they turned into all sorts of interesting and cool and funny Kubernetes shenanigans. This talk was prepared by myself and by my coworker and partner in crime, Travis Low. Unfortunately, Travis wasn't

able to make it here today. So, unfortunately for you, you're stuck here with me. So wait, another thing is that I'm using two laptops to do this presentation because for some reason I can't see my speaker notes here. So this is quite awkward. Here's my second laptop. It's going to be interesting. So what am I going to talk to you about today? So how many people here are familiar with Kubernetes? Show of hands. Awesome. That is so great. Okay, great. Because the last time I gave this talk, I had like three hands. Wow. Okay, great. So now you can also, like, call me on my BS. So what is Kubernetes, for those of you who are

not familiar with it? So, there was this saying years ago that the internet is a series of tubes. And I think that today the internet is more like a series of cubes, right? Everything that we do, or every service that we're accessing today, is running on Kubernetes. So even if you're asking, I don't know, ChatGPT to generate an image for you, there is some Kubernetes magic that happens behind the scenes and some workloads are spinning up in the back. And all you're doing is hitting enter on your ChatGPT window, and things start spinning up. And because everything is focused on and is built with Kubernetes

there's also a lot of really interesting pitfalls that we can come across. So, Kubernetes is also a series of YAML-induced headaches. So, that's the way I like to put it. So, Travis, who's not here, he sort of got me into Kubernetes. I wasn't that much into it. And the amount of YAML that I had to deal with was kind of horrible. I'm not a big fan of YAML, but this is what we have right now. The origin story. Completely unrelated to the previous talk or to the whole theme of the conference. This is just a slide that we

put and I'm happy that it's working well with the conference. So [clears throat and cough] the origin story of this talk is that Travis and myself, we work at CrowdStrike and we're interviewing a lot of people for our cloud security team. We're focused on building our cloud line of products. And a lot of the people that we're interviewing, we're asking a lot of Kubernetes questions. And unlike the great show of hands in this room with people who are familiar with Kubernetes, a lot of people, and that is very surprising, they are coming to an interview and they have no idea what they're getting interviewed

about. So we're interviewing a lot of people. Some of them know stuff, some of them know nothing, but the questions that we ask and the answers that we're getting are really, really interesting, and this is what we're talking about today. So if you came to this talk looking for exploits, you're going to be disappointed. But I will show you some cool tricks to steal and exfiltrate data out of Kubernetes clusters only by using Kubernetes native features. So we're just abusing the system, which is my favorite thing to do. So the obligatory who am I? So I'm the guy

on the left with the weird name. I'm Amit Serper and I've been doing security research for a while now. And as I said, I know, my boyish charm and good looks. It's kind of hard to tell, but I've been doing this thing for 20 years now. And every time I say I've been doing this for 20 years, I feel very, very old. And Travis, who is my coworker and he's not here today, he kind of made me respect Kubernetes and hate YAML even more. I'm leading the security research on Linux and cloud stuff at CrowdStrike. And that's me. This is Travis. Works with me on my team. He's like my Kubernetes god. I

always come to him with weird questions and he always gives me the answers, and somehow he has the patience for me, and not many people do. So when we were making this talk, I saw this meme. [laughter] Wait. And it's so true, because this is my house. On the left is my Kubernetes cluster that's made out of mini PCs. On the right is the reason why my power bill is very, very high. Well, I used to run the server up until like 2 weeks ago. It's a 15-year-old Dell that costs as much to run as an electric car. So that's my home, and this is Travis's. So this is

Travis's Kubernetes setup at home, and we're doing great. [laughter] Okay, so why are we here? Right, so it's a bit complicated. We're here to talk about Kubernetes. Now, Kubernetes, the first time that I saw it and when I started getting into it, because my background is, like, low-level research, vulnerability research, before that was malware analysis, low-level, IDA, disassemblers, shellcodes, that kind of stuff. And when I saw Kubernetes I said two things. One, this is very high level, too high level for me, it's written in Go. And the second thing that I said [clears throat] is holy [ __ ], it's so overengineered. Like, there's so many components. There's so many moving

parts. You have the API server, and etcd, which is the database that keeps everything in it. And you have the scheduler, which is what plans which workloads go up and down and where they go. And you have the controller manager, and you have all this stuff, right? And each one of those components has to be healthy and working in order for everything to work properly. So a failure of one of those objects, one of those resources, one of those components in that big machine that's called Kubernetes, if it fails, sometimes those failures can cascade and cause a whole lot of problems. Sometimes it can even crash your entire cluster. It can

render your entire cluster useless. And some clusters are huge, with, like, thousands and tens of thousands of nodes. They could all be rendered useless by simply one little misconfiguration. There's lots of dependencies between all of those resources, right? And I'll go over it for those of you that don't understand what I'm talking about, but we have pods, right? Which are the smallest unit of compute in Kubernetes. Pods depend on nodes, and nodes depend on the kubelet, and services depend on pods, and ingresses depend on services. So you have a lot of those dependency relationships between things. We also have tons of ways of introducing security, or security

issues into our clusters. We have our RBAC configurations, for role-based access control. We have service accounts. We have network policies, which are sort of like a firewall for Kubernetes. We have the pod security context. We have secrets that we need to manage inside the cluster: passwords and tokens and API keys and all of those things. And the dynamic nature of Kubernetes, which allows you to scale very, very quickly. And all of those things are so codependent, and when something fails it can be pretty critical. So, as I said, let's talk about some basic terms, just to make sure we're all on the same page. When Travis showed me this slide, I thought it's like the

Death Star or something, but it's actually a can of paint, >> right? So, let's talk about a quick overview of Kubernetes. So, what is Kubernetes, right? Kubernetes is just an orchestration platform. Kubernetes is made solely for the purpose of starting, stopping, managing, building, and running containers across a lot and a lot and a lot of computers in all sorts of ways. It's the industry standard for container orchestration. As I said, pretty much with every big service you're using today on the internet, there's a Kubernetes thing behind it. When you're using ChatGPT, you're hitting enter, something spins up in the background.

The whole point of Kubernetes is to manage things at scale. So you could have a small two-node cluster made out of two mini PCs at your house like I do, or you could have 10,000 nodes on AWS running all sorts of containers. Kubernetes can do it all. Let's talk about our core components for a second. So in Kubernetes we have two planes. We have the control plane, which is the backend. This is what runs everything. This is the API server that you talk to when you're configuring things. etcd, which is the database that saves everything in it. The scheduler that makes sure, okay, so this

workload should be spun up on this node because it has a GPU that's required by this workload, or this workload should spin up on this machine which has a more beefy connection to the internet because it needs more bandwidth, and this workload should spin up on that node because it has more RAM. Right? So you're kind of understanding where I'm going there. And we have the worker nodes. So worker nodes are simply computers that are running your workloads. So a worker node could be a physical computer. It could be a virtual machine. It could be an EC2 instance on AWS. Anything that can run Linux and

run containers can be a node, assuming that you're installing all of the Kubernetes stuff on it. Every worker node has something that's called a kubelet. A kubelet is basically the agent for Kubernetes. So the kubelet is running on a node, and this is what makes the computer a node. This is what makes it a part of the Kubernetes setup. We have our basic unit, which is a pod. So as I said, it's the most basic, smallest unit of compute. And we have deployments, which is when we want to define a workload, we're calling it a deployment. Okay, now we're deploying this app into our cluster. And we have

services. Services is how we manage what we expose from our deployment. So if we have a web server that's running a WordPress blog, because this is always the example, we have a service that is tied to that deployment. The benefits of Kubernetes: it's easy to manage. And again, how many of you that are actually using Kubernetes think it's easy to manage? Okay. So we had, I don't know, 30 hands, and now we have one guy who says it's easy. >> This is my entire point though. Thank [clears throat and laughter] you. It has self-healing capabilities. So

in many cases when something breaks, Kubernetes can sort of figure it out itself. For example, let's say you have a workload that's running on a node and something happened to this node and it crashed. The scheduler will know to say, hm, this node is not working anymore. This workload, this deployment is now down. Let's spin it up on a different node. We can do horizontal scaling. Oh my god, my website, my WordPress blog has so many hits on it right now. I got to spin up another container with another web server or another database or something so that I could handle this traffic better. In the olden days that was very

complicated. In the age of Kubernetes it's very easy. It just happens if you're configuring it that way. We have load balancing. So what if we have a lot of pods that are running a server and we have lots of traffic, and we want to distribute this traffic sort of evenly or with a certain logic so that one node won't be overloaded? Kubernetes gives us load balancing for that. So the common use cases for Kubernetes: obviously microservices. If you're building an application with microservices, each team can work on their own microservice and deploy it separately. Cloud native. So if

you're building a cloud-native application, everything again is run by Kubernetes somehow. CI/CD pipelines are really cool because you can push a new version to git. Your CI/CD pipeline picks it up, turns it into a container, pushes it into your repository. Kubernetes does a rollout and it updates the deployment, the workload that you just pushed. But there is more to that. So all of those things are part of the objects. It's the rest of the objects that I'm not going to talk about, but all of these exist in Kubernetes. So as I said, Kubernetes is very complicated, very overengineered, and in my opinion, not the easiest

thing to learn, as someone who started to mess around with it not that long ago. But let's talk about what we care about today. This is the part where I'm looking at my notes. So we're going to talk about ways to extract information out of a cluster by abusing the features of Kubernetes. And I'm also going to show you one of the interview questions that we've asked. And I'm going to show you how you can cause a lot of chaos in a cluster fairly easily, assuming you have the permissions to do that. And this is one of the interview

questions that we're asking. And this is where we're getting a lot of interesting answers, but we never got the answer that I'm going to show you. But let's go back to talk about some more Kubernetes basic stuff. This is so difficult with the notes on a separate computer. Okay. Can you see the mouse pointer? Yeah. Okay. Let's start to look at this from the right side. Okay. So, we have an internet client accessing a service. Again, let's use the WordPress blog as an example, right? So we have a person that's coming through the internet, and that person hits the load balancer first. That load balancer routes you to a service, and that service

routes you to the actual deployment. Now what we have here is a deployment. The deployment has a pod, which as I said is the smallest unit of compute. Inside the pod we have a container. All of that lives in a namespace. A namespace, it's not like a Linux namespace. It's just a thing where you can say, okay, I am putting all of my deployments in this namespace, or I am giving someone permission to deploy things in this namespace. But having something in a namespace on its own doesn't mean that things from this namespace can't talk to other namespaces. So it's not a Linux namespace. A Kubernetes namespace is more like a drawer, an

organizational thing you can put stuff in. Also, a pod can have multiple containers. Usually people think of a pod as a container. This is a mistake. A pod can have multiple containers. A good example for why you want to have multiple containers in a single pod is because all of these containers inside that pod, they're going to share the same Linux namespaces. So this is a bit confusing. Within the context of multiple containers inside a pod, we're not talking about a Kubernetes namespace but a Linux namespace. I can give you an example for it. Let's say that you're at home

and you want to use BitTorrent to download your favorite Linux distribution. Because this is what you download with BitTorrent, of course. But you want to be careful with where you download your Linux distributions. You don't want to do it from your home IP because you don't want people to know that you're using Arch, by the way. So I use Arch, by the way. So the best way to do that would be to have a pod that has the BitTorrent client and another container that connects to a remote VPN, could be any VPN you want or any VPN that's being shilled on YouTube, and

just by virtue of having those containers in the same pod, they will share the namespace, and your container with your BitTorrent client will then use the VPN to download your Arch ISO. So this is a small example for why you want to use multiple containers. Now let's go back to how things work, right? So as I said, we have the control plane all the way on the left. The control plane is what controls everything. The [clears throat] control plane talks to the kubelet, which is a piece of software written in Go that sits as sort of an agent on your node. Again, a node is a computer, a server. The kubelet is what is actually

orchestrating all of the containers and the container runtime on that host, and that's what manages things. We can have deployments in the same namespace but across different nodes. So you can have the same deployment that has the same pods just running across different nodes, which are different computers, in the same namespace. And why would you want to do that? So how many of you know what a replica is? Great. So for those of you that don't know what a replica is, you can say I want to have a service, a WordPress blog, and I want to have

three replicas of it, meaning I'll have three pods running. In case one falls, or in case one gets too busy, with the load balancer or whatever, you'll have another two of them. Now, let's say in my home cluster I have two nodes, so it'll be divided something like that. You'll have one node that will have one pod and you have the other node that'll have these two pods, but these three pods are all part of the same deployment and they're in the same namespace. So a namespace can cross nodes. Another thing that we have in Kubernetes is our RBAC. So we have a service account, which is the who. We have the

role, which is what can you do? And we have the role binding, which says okay, this guy can do those things. [snorts] Let's talk about secrets. And now I'm going to show you some things. So with secrets it's kind of interesting. By default Kubernetes encrypts secrets with Base64. So, let's say that you want to keep some secret. Come on, it's encryption. It's not funny. Let's say that your deployment has some secrets that it needs to work with, that it needs to pull from Kubernetes. You can define those secrets. You can add them into Kubernetes. Kubernetes then by default saves them in Base64 format, and that's how it accesses it.
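The joke about Base64 "encryption" can be made concrete with a minimal sketch. It assumes you already have the `data` map of a Secret in hand (for example from a manifest you are allowed to read); the key names and values below are made up for illustration and do not come from the talk.

```python
import base64

# Hypothetical `data` section of a Secret, shaped the way a manifest shows it.
# Key names and values are invented for this example.
secret_data = {
    "username": base64.b64encode(b"wp_admin").decode("ascii"),
    "password": base64.b64encode(b"hunter2").decode("ascii"),
}


def decode_secret_data(data: dict[str, str]) -> dict[str, str]:
    """Decode every Base64 value in a Secret's data map back to plain text."""
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


decoded = decode_secret_data(secret_data)
print(secret_data["password"])  # the Base64 form stored in the manifest
print(decoded["password"])      # the original value, recovered with no key at all
```

No key or password is involved at any point, which is why Base64 is an encoding and not encryption: anyone who can read the Secret object can read the value.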

Obviously, Base64 is not encryption. It's clear text by any definition. And unless you have a really good reason to keep it in plain text/Base64 encrypted, you shouldn't do that. There are ways around it. I'm not going to talk about it now, but this is just to show you how easy it is to decrypt Base64 secrets. Again, if you're a Kubernetes user, you probably know that. If you're not, you know that now. Next slide. Another thing that we have in Kubernetes are config maps. Now, config maps are basically just configuration files. Let's just equate them to configuration files for the sake of

discussion. And you can put these configuration files also inside Kubernetes. And you can say, okay, I want this deployment to use this config map. Because if you have a deployment that's spread across many, many nodes and you want to change something in the configuration, you only have one place to change it, and that's the config map that's tied to all of those deployments. A config map, obviously, like a config file, can contain secrets or other important things. So a config map is also important. And this is all going to tie back together. Now here's the thing about Kubernetes security and visibility. It's

really, really complicated. As someone that works for a security company that makes a security product for Kubernetes: because it's so overengineered, because you have a billion ways of achieving the same goal, getting visibility is kind of hard, because you have the Kubernetes audit log that shows certain things but doesn't show other things. And you have all sorts of ways to get information out of a cluster, but it's not a single way. So you have, and I'll talk about it, three, four, five different ways of getting information about what's currently running, what's happening behind the scenes, what's happening when you're starting a workload, what's happening when you're stopping a workload. And

those things don't get piped into the same place. There's no one central log where everything gets poured out to. There's so many objects that we care about, right? So there was this gigantic list of all of the things we want to monitor, and as I said, it doesn't really pour out to the same log. So it makes visibility very, very difficult. Now the runtime side, which is what's running the workloads, and the control plane are two different things. Your runtime could be CRI-O, it could be Docker, it could be whatever. You can spin up virtual machines.

Your control plane is a different thing. Your control plane is just what talks to the kubelet and tells it what to do, or what's the database that keeps all this stuff. But the control plane is not what's running the workload. As I said before with my secret example, the clusters can hold many secrets in them. So that's another thing to worry about and that you want to have visibility to. Clusters can also pave the way, if you have a service or an ingress that's insecure, to more critical parts of your network or your cloud environment. So for example, if you have some publicly

accessible WordPress blog in your VPC on AWS, and there is a vulnerability in that WordPress, and we all know that WordPress is, like, the most secure platform in the world and there's no vulnerabilities to that ever. A vulnerability from WordPress. And it's kind of funny because when we look at Kubernetes hacks, it often starts from a WordPress deployment that was insecure, and then there's a crypto miner at the end of it. It's always a crypto miner. It's kind of boring. But what if this Kubernetes node is part of a bigger VPC

in your cloud network and it's insecure and an attacker is in there? That attacker can now access portions of your network that they weren't supposed to access. Right? So this is also something that you want to gain visibility to. Another thing about Kubernetes, and this is something that comes across a lot at interviews and we're going to touch this thing too, is that all the old attacks are relevant again. So if you are now an attacker, you land a shell on that WordPress pod. You want to say, okay, let's see what's around me. What are you going to do? You're going to scan the net, right? You're

going to scan the network. You're going to scan a /16. Super loud. Don't do that. But again, mass port scans are now relevant again as if it's like 1999. Security products don't always provide the best visibility. So, as I said, there's a few Kubernetes security products now. Every one of them lacks something. None of them gives you the holistic picture of what's actually happening on the cluster, just because things are so complicated. Now, we can actually begin to talk about this. Okay, so let's say that you all are evil hackers and you have cluster access and sufficient privileges, which is often the case. You would think, you

would think that if someone runs a WordPress blog, this whole thing runs under a very restricted service account. No, in most cases it runs with admin privileges. [cough] Excuse [clears throat] me. So you have cluster access, you have sufficient privileges. How would you keep the cluster from running and recovering broken or new pods? Because we are now at a point as evil hackers where we just want to destroy everything. We've siphoned some data out and we want to burn the whole thing down to the ground because we don't want the cluster admins to figure out what happened. [clears throat and cough] Sorry. To figure out what happened and

to figure out how to deploy all sorts of defensive tools, or how to deploy things that will allow the administrator to know what happened. So how can we do that? So I would like to welcome you all to a game of reverse DevOps Jenga. [laughter] And just as it is in real life, it is always DNS. So let's talk about DNS and Kubernetes. What I like to call a complicated hate story. In Kubernetes, we have something that's called CoreDNS. CoreDNS is basically a DNS server that lives and serves inside the cluster. And the cool thing about it is that it allows developers, when they're running cloud/Kubernetes native apps, it allows you to

not use IP addresses. So if you use microservices, or let's say that you're the front-end development team and you want to access the database. So you don't need to know the address of the database, because Kubernetes is kind enough to give you a very easy DNS host name. And usually the pattern is this. So it's the service name, dot, the namespace name, dot svc.cluster.local, and that will point you to the address inside the cluster. So very easy. So that means for you as a developer, super nice. You don't need to know IP addresses. There's always this one host name you need to

know. But, you know, for an attacker, you're like, hm, [clears throat] that makes DNS the weak point, because if I turn DNS off everything goes to [ __ ]. So let's talk about that for a second, right? So let's talk about the DNS resolution flow. So a pod makes a DNS query to my service. That query goes to a pod that runs in the kube-system namespace, and that is called CoreDNS. CoreDNS then checks the internal records, right? It doesn't go out recursively. It looks inside the records of Kubernetes and it returns the corresponding cluster IP address. So that's how things work. Now let's take down DNS. So what happens if we

take down DNS, right? As attackers we can say, okay, I want to kill CoreDNS. I don't want the CoreDNS pod to work. So if you do that, new DNS queries from pods will fail because your DNS server is missing, it's down, right? Existing connections remain unaffected because they're already cached, the connection is already up, you're already communicating with an IP address, you don't need that. So that still works. DNS resolution timeouts will occur for new requests. So if something new needs to get resolved, obviously that's not going to work, that's going to fail. And applications start to behave really unexpectedly. They crash, they hang, they give you all sorts of

weird errors. Things are starting to fail. Considering that for a DNS query in Kubernetes the default timeout is 5 seconds, it's also going to introduce slowness. Pods will try DNS queries again and again and again and again, which causes things to hang more. Applications will start showing errors like cannot resolve host. Okay, you get where we're going. It's annoying, but it's still not a complete disaster because things are still up. Things are still working. Some requests can still make it. New requests can't. Now, I have a question for you. Last time I gave this talk with Travis, we gave out small, shitty plastic

trophies to the audience. I don't have them here. So you'll get the claps. So this is what I can offer you today. I can offer you a handshake after if you want. So, audience question. What if the admin wants to start a new pod with a new container? So we just killed DNS. There's no DNS. What I told you the last time. But now an admin wants to spin up this pod that has some tools on it to see what happens. Will this pod start or will it fail? >> Huh? >> I'm I'm >> You're saying fail. >> You're saying fail. Anyone else wants to >> fail? >> Fail. Okay. Anyone else? You said start.

>> Huh? >> I'll say start. >> Okay. >> Is it possible that the pod will start but you won't be able to, like, resolve, or sort of assign a cluster IP to the service? >> Okay. So those of you who say that the pod is going to fail, you're not getting claps. You're not getting handshakes. I'm sorry. My apologies. Those of you that say that it's going to start, you're right. And that happens because [clears throat] let me go over this slide and you'll figure it out. You'll understand. Okay. So, the kubelet's DNS role, right? It acts as a local DNS configurator for pods on the node. When we're starting a

pod, kubelet actually tells the node to pull the container image from whatever repository you're accessing. So, Docker Hub or your internal repository or whatever, right? This request gets resolved by the host and not by the kubelet. So that's why it's going to succeed. So the pod is going to spin up, but once it spins up, nothing is going to work inside, right? So good on you, sir. You get a handshake. >> I wish I could. >> Oh, yeah. I can't. You can give me your email address and I'll send you something that's totally not malware. I swear. >> I get enough of that already.

>> Oh, great. Okay. Share some with me. Okay, so let's talk about the role, right? So the kubelet, as I said, it acts as a local DNS configurator for the pods on the nodes, and it manages /etc/resolv.conf for each pod, right? So each container that's a part of a node has its own /etc/resolv.conf, which in Linux is the file that has all of the DNS configuration, and that gets injected by kubelet into these pods, which is why things fail only inside the pod and not outside of it. I also just realized that there's children in the room. >> That's right. >> So, I should tone it down. That never happened [clears throat] to

me. [laughter] Yeah, my talks usually have more profanity in them. So, you guys are in luck. Okay. I covered that and I covered everything. Okay. Yeah. So, node-level DNS cache, right? So, node-level DNS cache is optional. We can have kubelet manage it, but it's optional. By default, it's not how it works, which is why those pods are going to spin up. All of that DNS magic runs as a DaemonSet, which means every node that joins the cluster gets it. So every node that joins a cluster, kubelet pushes the DNS configuration into it. Why do we do that? It reduces DNS

latency inside the cluster and it provides DNS caching on the node level. And as I said in the beginning, a Kubernetes node is not resolving DNS queries through CoreDNS. It will be the kubelet process that will try to resolve DNS using the node's system DNS server and not the cluster's DNS. Okay, so we've done that, right? So as I said, things are still running. They're kind of wonky. They're still running. How can we still destroy things? Again, we want nothing to work. So what we can do, and again, no one gave this answer in an interview. What we can do is we can patch that DaemonSet. Right? I

said there's a DaemonSet that runs on every node that gives you DNS. So we can patch that DaemonSet to say, okay, I want you to only provide DNS service to nodes that have a certain label on them. So we're doing this thing, like kubectl patch daemonset coredns -n, which means namespace, kube-system. And what we're changing is we're pushing this spec and we're putting in a node selector that says you're only going to provide service to nodes that have a label that's called non-existent, and it's set to true. That non-existent could mean anything, it's just a string that I

selected. So now none of our nodes have this label that's called non-existent equals true, right? So this will cause DNS to get evicted from these nodes. So there won't be a DNS pod on these nodes. Thus you won't have CoreDNS across the entire cluster. Now I thought that this was going to work, but no, because we still have old replicas, right? I asked you all what a replica is. We still have old replicas that are running. So what we need to do is to also manually patch and edit the replica. You can see it here with the arrow. We just changed that to zero and

things just stop working, and it's glorious because it's a cascading failure. So if you have like a cluster with tons of workloads, when things are starting to fail with this, it sort of reminds me of the last scene in Fight Club where like everything explodes and the buildings collapse, because that's sort of what it feels like. You're starting to see like tons of failures in the logs, and it's like failure after failure after failure, then nothing works, then there's no logging anymore. It's amazing. But as you can see, this is me spinning up a pod. I call it test pod. The image that I

pulled was BusyBox, just to have like basic Linux stuff, and I started trying to resolve things across the cluster, and you can see I can't have DNS. I'm getting connection refused to my CoreDNS service inside the cluster. DNS has failed. Nothing is working. How can we prevent that? So RBAC hygiene, super important, right? Which user can do what. Your deployment with your WordPress blog, why does it have all the permissions in the world to do anything? I swear most of the Kubernetes setups are like that. My home Kubernetes setup is like that. By default there's a super

powerful user. It can do anything. I understand if you're doing it like, you know, for a project to learn how Kubernetes works. But if you're in a production environment, this is super super duper duper duper bad. And this is how you get crypto miners or worse inside your network. Another thing you can do, and I'm going to talk about that in the rest of the talk, is to put in a mutating webhook. And does anybody know what a webhook, or a mutating or validating webhook, is in Kubernetes? Oh, great. You're going to learn some cool things now. Okay, great. So, I'm going to talk about that in a second. So that webhook is going to

monitor those things and will give you a way to handle it, and I'll show it in a second. So good monitoring is the bottom line. You got to know what's going on in your cluster just because of how overengineered it is. Now let's talk about how we can, let me find the notes here in my other laptop. God damn it. There we go. Now we're going to talk about how we can exfiltrate data from a cluster. So ruining things is fun. Exfiltrating data is even funner. So let's talk about admission webhooks. So in Kubernetes we have two kinds of admission control webhooks.

One is a mutating admission webhook and one is a validating admission webhook. A validating admission webhook is something that acts like a judge and says this is allowed or this is not allowed. That's it. A mutating admission webhook can take something and change it. So you can say, okay, I am going to add a secret to my Kubernetes cluster and I'm going to save it in base64. That mutating admission webhook can intercept this request and say, no no no, we're not saving this as base64. We're saving this in a vault and we're putting it in a way where it's not in clear text.
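To make the judge-versus-editor distinction concrete, here is a minimal Python sketch of the two kinds of answers a webhook can send back to the API server. It assumes the `admission.k8s.io/v1` AdmissionReview format; the sample review, the UID, and the patch are hypothetical and were not shown in the talk, and the patch only adds a label rather than moving anything into a vault.

```python
import base64
import json

API_VERSION = "admission.k8s.io/v1"


def validating_response(review, allowed, reason=""):
    """Validating style: answer only 'allowed' or 'denied'."""
    response = {"uid": review["request"]["uid"], "allowed": allowed}
    if not allowed:
        # A denial carries an HTTP-style status back to the user.
        response["status"] = {"code": 403, "message": reason}
    return {"apiVersion": API_VERSION, "kind": "AdmissionReview", "response": response}


def mutating_response(review, json_patch):
    """Mutating style: allow the request, but ask the API server to apply a patch."""
    # The JSONPatch document is sent base64-encoded inside the response.
    encoded = base64.b64encode(json.dumps(json_patch).encode("utf-8")).decode("ascii")
    response = {
        "uid": review["request"]["uid"],
        "allowed": True,
        "patchType": "JSONPatch",
        "patch": encoded,
    }
    return {"apiVersion": API_VERSION, "kind": "AdmissionReview", "response": response}


# Hypothetical incoming review, trimmed to the fields used here.
sample_review = {
    "apiVersion": API_VERSION,
    "kind": "AdmissionReview",
    "request": {"uid": "demo-uid-123", "operation": "CREATE", "kind": {"kind": "Secret"}},
}

# Hypothetical patch for illustration only: set a label on the incoming object.
sample_patch = [{"op": "add", "path": "/metadata/labels", "value": {"reviewed": "true"}}]

denied = validating_response(sample_review, allowed=False, reason="secrets are not allowed here")
mutated = mutating_response(sample_review, sample_patch)
print(json.dumps(denied, indent=2))
print(json.dumps(mutated, indent=2))
```

The only structural difference is the `patchType` and `patch` fields: a validating webhook can say yes or no, while a mutating one can also rewrite the object before it is stored.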

Right? So these are webhooks. Now let's talk about what we can do here. Sorry, it's so so so so complicated to do this with two laptops. Okay, I said that. Now, as I said, webhooks intercept requests. So this is the flow, right? So someone sends a request to the API server to say, okay, I want to add a secret. I want to add a ConfigMap. I want to add something. You do your authn and authz, so authentication and authorization, and then you get to the webhook. The webhook picks up the, sorry, the mutating admission picks up that webhook with

that request that has, oh, I want to add that secret. Now if there is a webhook and that webhook can see all of the requests, that means that an attacker with sufficient privileges can also add their own webhooks. Are you getting to where I'm going with this? So let's talk about the flow of admission webhooks and I'll show you a demo. Right? So this is the flow. I am dreading this because I have my notes here and I need to like show you this. So okay, let's do this. In number one, a developer makes a request to create new secrets. Then the webhook receives the request and validates the

secrets. Then number three, nginx. We have an nginx server that's hosting the webhook inside the cluster. nginx receives that request, runs our webhook, runs our admission and says, "Okay, this is valid or this is invalid." Right? And now we're at number four. The request gets either allowed or denied. If it is allowed, meaning return true, it gets persisted into the Kubernetes database. If it's disallowed, if it's illegal, right, the user gets a 403 error and it's denied. Did it? So, let me show you an example of a validating webhook. So, what we care about is actually the big green text. So, this is how we define a

webhook. In the URL you can see admission.jellyparks.com. This is the address of our webhook, where it is. And we are also telling the webhook, listen, this is what we care about. We care about those resources. We care about pods and ConfigMaps and secrets, meaning fire up every time something is being done to any of those resources. A pod, a ConfigMap or a secret. How am I doing on time, by the way? Okay. So now let me show you a demo. Oh, the demo is here. Not this one. Not this one. This is the demo. Okay, let's do it. Okay, so

this is our validating webhook. This is the code for it. Super simple, written in Python. We have our app route, which is /validate. So, jellyparks blah blah blah /validate. And what we're doing is, from the request that's being sent to us by Kubernetes, we're taking the UID and the kind that's in the YAML, right? We are then looking for the string that has the secrets in it. And then we're going to allow it. That's pretty much it. Now I want to show you what happens. So we're starting the admission webhook server. And this is the webhook. This is the address, as I said before. And we're monitoring the operation

CREATE on these resources. Pods, secrets and ConfigMaps. So every time we're creating a pod, a secret or a ConfigMap, our webhook is going to grab that. Now let's run it. So here we're just going to delete the secret that we just added and we're going to add it again. So that's what the, oh, it says there on the bottom. It's kind of hard to see. Why is it, go away, bar. Go away. There we go. So we just created the secret. Let's go to the other window with our webhook. And we can see that our webhook, you see, grabbed our secrets. So here is the password. Here is the username. If this is attacker

controlled, and this can be totally outside of your cluster or inside of your cluster, but if an attacker puts a webhook inside your cluster, this sort of stuff can happen. We're exfiltrating stuff out of the cluster and we're using only Kubernetes native features. So this is what we've just extracted with a webhook. Okay. This is so so so difficult. Okay. Let's do present. Okay. How can we prevent it? Again, healthy RBAC. So, make sure that only the permissions that are required are available to the workload that you're running. A WordPress deployment shouldn't be able to, I don't know, manipulate secrets or add webhooks.
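The reason webhook registration deserves tight RBAC is easier to see in code. The following is a rough Python sketch, not the demo code from the talk, of what any webhook registered for Secret creation gets to read. It assumes the `admission.k8s.io/v1` AdmissionReview format; the function name, the sample review, and the credential values are all hypothetical.

```python
import base64


def inspect_review(review):
    """Return (admission response, decoded secret values seen in the request)."""
    request = review["request"]
    seen = {}
    if request["kind"]["kind"] == "Secret":
        # Secret values arrive base64-encoded, which is encoding, not encryption.
        data = request.get("object", {}).get("data", {})
        seen = {
            key: base64.b64decode(value).decode("utf-8", errors="replace")
            for key, value in data.items()
        }
    # The webhook allows the request, so the user creating the secret notices nothing.
    response = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {"uid": request["uid"], "allowed": True},
    }
    return response, seen


def b64(text):
    """Helper for building the sample: base64-encode a string."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


# Hypothetical review for a Secret being created, trimmed to the fields used here.
sample_review = {
    "apiVersion": "admission.k8s.io/v1",
    "kind": "AdmissionReview",
    "request": {
        "uid": "demo-uid-456",
        "operation": "CREATE",
        "kind": {"kind": "Secret"},
        "object": {
            "metadata": {"name": "demo-secret"},
            "data": {"username": b64("admin"), "password": b64("hunter2")},
        },
    },
}

response, seen = inspect_review(sample_review)
print(response["response"]["allowed"], sorted(seen))
```

Because the full object is handed to the webhook before it is persisted, whoever controls the webhook endpoint sees every matching secret in decoded form, which is why creating webhook configurations should be limited to a very small set of identities.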

Also, if you're a cluster admin, how often do you change your webhooks? Do you do that on a regular basis? Do you add and remove a bunch of webhooks? Most people don't. So, monitor the webhooks on your cluster. Super important. If you have a Kubernetes security product of some sort, see if you can have a detection for new webhooks. So, that again depends on the Kubernetes security product that you're using. Another big question, do cyclists have to stop at stop signs? The answer is yes. The answer is yes. If you're a cyclist, you got to stop at a stop sign. I almost got run

over one day. Not fun. Now, another thing, as I said before in the example that I gave, all the old hacks are new again. So, if you're an attacker and you just dropped into a shell on a pod, you want to know what's around you, you're just going to scan a /16 like it's 1999. That is really bad, right? Because it's super noisy, very easy to detect, and, you know, suddenly your admins, your security people start seeing lots of scanning activity and packets from your WordPress blog. Everybody's going to know that your WordPress blog has been popped. And also, you're scanning a /16 and you're hoping to get information for your entire cluster. You're

sending so many packets just to get so few responses, right? There has to be a better way. So, let's talk about that. A lot of people in Kubernetes are using all sorts of services for visibility and management. So, how many of you Kubernetes users are using Istio or Kubecost or Argo? Argo is a really big one. How many of you are using Argo? Yeah, I use Argo too. Argo is awesome. How many of you are using those services in conjunction with something like Grafana to have like nice graphs? I expected more of you. Okay. So, what I'm going to show you now is how you can use this. Let me

pull up my notes here. How you can use exposed API endpoints on your cluster to do all of this, to get a complete picture of your cluster, including everything, IP addresses and everything, with one single request. This is super cool and often it's not being taken seriously enough. Right? So let's go back to the demo video, and I'm almost done. So, what I'm going to show you now is we have this pod that we're going to pull, right? And the only thing that this pod is doing is running this app.py file, which I'm going to show you in a second. So, let's apply

this file to our cluster. We created this pod. We can see that it's running, initializing. Okay. Now, how many of you are using Kubecost? You, sir, and anyone else? Okay, cool. So, Kubecost is this tool that you can run on your cluster to help you understand how much money it costs to run your cluster, because if you're using EKS or any one of those services, it can be very expensive and you always want to optimize your costs, right? But Kubecost exposes a metrics endpoint so that Grafana will be able to show you a nice graph of what you have. Again, if those endpoints are not

properly segmented and configured, because as I said, just the fact that something is in a separate namespace doesn't mean that you can't access it, because Kubernetes namespaces are just squares on a diagram. They mean nothing. So if you are not careful enough about which endpoints are being exposed and what's open out there, you can do something like this. So, we're only going to send one request to this URL right here, which is what it's called on our cluster. So, kubecost-cost-analyzer.kubecost.svc.cluster.local. Remember I gave that example before with the internal DNS inside the cluster. So, we're going to send one request to this endpoint that knows everything about our cluster

and we're then just going to format it nicely in a table. That's all this does, right? We're going to put it in a nice table. Let me fast forward a little bit. So, that's what that app that we loaded is doing. So, now let's look at the logs. And boom, you can see your entire cluster with one request. No need to scan. I'm almost done. No need to scan /16s. No need to make a lot of noise in the network. You just abuse that endpoint that should not be exposed to anything else. And look, here is our cluster. The namespaces on the left,

the names of the pods in the middle, right, the IPs on the right side, and that's it. So, I just showed you how you can abuse Kubernetes with one single packet. So how can we avoid that? Network policies, super important. Think of network policies as a firewall for Kubernetes. So if you have monitoring infrastructure on your cluster, it's super duper duper important that it's firewalled the right way with a network policy, so that only what's supposed to touch this endpoint can touch it, and your WordPress blog pod shouldn't, because why? And also, a very critical question that you need to ask yourself in your organization: who manages the network

policies in your organization? Is it the app owners themselves? Is it the security team? Is it the Kubernetes team? Sometimes in organizations, and we've seen it, those teams are like completely separate and they don't know anything. So, you have your Kubernetes team and they say, "Oh, yeah. We're just going to give you a namespace. Do whatever you want inside this namespace." But they don't take care of other namespaces. They don't take care of network policies. So, everything is just open. So again, segmentation and proper hygiene in a cluster, super duper duper important in those cases. Any questions? And please remember to rate us five stars. I'm done. [applause]
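As a companion to the network policy advice above, here is a minimal Python sketch that builds a NetworkPolicy manifest restricting ingress to a metrics endpoint. The policy name, the namespaces, the `app: cost-analyzer` pod label, and port 9090 are assumptions for illustration and should be checked against the actual deployment; the `kubernetes.io/metadata.name` label is the one Kubernetes sets automatically on namespaces.

```python
import json


def metrics_ingress_policy(namespace, pod_labels, allowed_namespace, port):
    """Build a NetworkPolicy that only admits traffic from one namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "restrict-metrics-ingress", "namespace": namespace},
        "spec": {
            # Pods this policy protects (hypothetical label).
            "podSelector": {"matchLabels": pod_labels},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    # Only pods in the allowed namespace may connect.
                    "from": [
                        {
                            "namespaceSelector": {
                                "matchLabels": {
                                    "kubernetes.io/metadata.name": allowed_namespace
                                }
                            }
                        }
                    ],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


# Hypothetical values: protect the cost analyzer, admit only the monitoring namespace.
policy = metrics_ingress_policy(
    namespace="kubecost",
    pod_labels={"app": "cost-analyzer"},
    allowed_namespace="monitoring",
    port=9090,
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`. Once a pod is selected by an ingress policy, traffic not matched by a rule is dropped, so a blog pod in another namespace could no longer reach the endpoint. This only takes effect when the cluster's network plugin enforces NetworkPolicy.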

Thank you. Any questions? Great. Thank you guys. Oh yeah, you, sir. >> I was going to ask, can it be externally hosted at all? >> I think you want to, sir
