
Real quick, show of hands: who here has used Kubernetes? All right, if you're not familiar with it, we're going to cover it in 30 seconds. A lot of slides, but it's really easy, so there should be no problem. Okay, go.

Purpose: Kubernetes is an orchestration platform. Basically, that means it spins up containers for anything you want, and it makes sure the count is right. You want 10, it gives you 10; you want 20, it gives you 20. It can do other things too, but running containers is the primary purpose people need it for. We're going to ignore most of the internals: containers live in pods, multiple containers can live in a pod, and pods are created by ReplicaSets, which are created by Deployments, and there are Jobs and so on.
All right. From a security perspective, we care about logs. Probably one of the more important things in all of security, and in Kubernetes as well. There are lots and lots of different logs you can choose from, and you'll probably choose wrong, so you're really going to want them all. Probably the most critical here are the API audit logs: they tell you exactly who did what, and where, in the cluster. You need application logs, just like you need application logs throughout the rest of your environment. So your nginx logs, your SQL logs, you're going to want all of those as well, and they will exist here too. A lot of times in Kubernetes you'll run something like Fluentd, which will just scrape all of those logs into your log aggregation system.

Same thing with network logs. In Kubernetes the networking is virtualized, and if you're up in the cloud you've got virtualization on virtualization on virtualization, it just keeps going. There's what's called a container network interface (CNI), and all of the pods communicate on this virtual network inside the cluster; that's how traffic flows from one node to the next node to the next node. When I say node, if you're not familiar, just think VM. And then DNS logs. Who doesn't love DNS logs? I think that's about enough said there.

Pain for you, the application owners, and everyone else is usually pretty mild for all this logging. The nice thing is that we have a shared fate with the team that manages the cluster, because they also greatly care about the logs; it's impactful to their operational uptime. In a lot of ways your journey and their journey are the same here, so it's usually pretty easy to get this turned on.

Resource limits. We're going to go a little bit slower here. This is a pod definition file; remember, a pod is the smallest unit of compute in Kubernetes, and containers live within it.
Around line nine of the manifest you start declaring containers, and on line 10 we set requests. I'll just start going down: we want to request memory, and we want 64 megs of it. Same thing with CPU: we request a certain amount, and then we set our limits. This is important because without it, pods can request whatever they want and your cluster will just make it happen as best it can. If the request is too big the cluster won't be able to facilitate it, but in a lot of cases there are some very big compute members in these clusters, and they'll just happily make it happen.
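As a rough sketch of what that section of a manifest looks like (the pod name, image, and exact values here are illustrative, not taken from the slide):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app              # hypothetical pod name
spec:
  containers:
  - name: demo-container
    image: nginx:1.25         # illustrative image tag
    resources:
      requests:               # what the scheduler reserves for the pod
        memory: "64Mi"
        cpu: "250m"
      limits:                 # hard ceiling enforced at runtime
        memory: "128Mi"
        cpu: "500m"
```

Requests are what the scheduler reserves; limits are the runtime ceiling. A container that exceeds its memory limit gets OOM-killed, while one that exceeds its CPU limit just gets throttled.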
We'll talk about how all of this can be enforced, because right now this pod you're seeing is what an application owner creates. This is their manifest file; they push it to the API server in Kubernetes, and Kubernetes makes a best effort to create that resource for them. Again, this one is virtually painless, because the Kubernetes administrators don't want their clusters falling over, so they probably already have something like this in place.

Read-only root filesystem. Again, look at the top half of that manifest: on line 11 we have the securityContext, and below it on line 12 we have the read-only root filesystem setting. What this means is that when the container gets created, nothing can be written to that container's filesystem; the entire volume of that container can't be touched. So if someone breaks in to, let's say, an Apache web server, and they get a foothold based off some command injection, they can't drop a web shell in the root of the web directory. It's not going to happen.

This can cause problems if applications need to write things to disk, like logs, so that they can be scooped up by Fluentd into your log aggregation, and that's where the volume mount comes in. It's called emptyDir; that's just what it's called. On line 18 you declare the emptyDir and name that volume nginx-logs, and then up on lines 13, 14, and 15 we actually map it into the container. What this does is allocate scratch space on the node, the VM this container is going to run on, and say: you can write logs to this one location. So when someone lands on the box, that is the only location they can write to. It's not bulletproof, but it makes things a lot harder when someone gets a foothold in the environment. This one's usually pretty painless, but you do have to work with the application owners again, because it's their manifest.
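Putting the read-only root filesystem and the emptyDir mount together, a minimal sketch might look like this (names are illustrative; a real nginx would likely need a couple of extra writable mounts, such as its cache and pid locations):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web                          # hypothetical
spec:
  containers:
  - name: nginx
    image: nginx:1.25                # illustrative
    securityContext:
      readOnlyRootFilesystem: true   # the container can write nowhere...
    volumeMounts:
    - name: nginx-logs
      mountPath: /var/log/nginx      # ...except this one mounted location
  volumes:
  - name: nginx-logs
    emptyDir: {}                     # node-local scratch, created and destroyed with the pod
                                     # (add `medium: Memory` for a tmpfs-backed mount)
```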
They know their applications, so they're going to know what needs to be written where.

Service accounts. In Kubernetes, every pod has a service account. There's a default service account, and it gives the pod access to the API whether it actually needs it or not. On line six we set the token mount to false, so that token isn't there. It's a least-privilege point of view: if there's no reason for the pod to have a service account token, regardless of what it can or can't do with it, then there's no reason for it to exist.

That one is usually pretty painless, but it can be a bit touchy when you get into networking. There's this concept called a service mesh, which we're not going to go deep on, but think of that virtualized network I talked about a second ago; there's another layer on top of it where the service mesh sits. What ends up happening is there's a proxy container injected into the pod, so you'll have a proxy next to your nginx container, and that proxy container requires a service account token. If you go through your environment and force everyone to disable them, your cluster is going to fall apart. So you have to approach that one a little more carefully.
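For the token piece specifically, a minimal sketch, with a made-up pod name and image:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-token-demo                   # hypothetical
spec:
  automountServiceAccountToken: false   # don't mount the API token into this pod
  containers:
  - name: app
    image: busybox:1.36                 # illustrative
    command: ["sleep", "3600"]
```

The same field can also be set on the ServiceAccount object itself, which flips the default for every pod that uses that account.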
All right, a lot left to cover. In Linux there are things called capabilities; every process has a set of them. When you get into a container, the container runtime (Docker, containerd, whatever) assigns the set of capabilities the container has rights to on the system. Basically, the things it can do: whether that's sys_time for setting the clock, doing packet captures, so on and so forth. The securityContext here, from lines 11 down to 14, drops all of those capabilities. A lot of times you can get away with this; sometimes there are only one or two that a container actually needs, but in this case we're saying just drop them all.

If everything's self-contained, if the app is just making a database call or something like that, you can probably get away with a manifest just like this. But if you need capabilities, you can use the additive line there on 13 and add in the very specific capabilities you need. In this case I have it adding sys_time, so they could change the time in the container if they wanted to, for whatever reason. Again, pretty painless. The question of enforcement, like I said, we're going to get to, because right now everything you're seeing is still on the application owners to implement, and if they don't do it, nothing is going to stop this stuff from shipping anyway.
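A sketch of the drop-all-then-add-back pattern (the pod name and image are made up; SYS_TIME is the example capability from the talk):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: caps-demo              # hypothetical
spec:
  containers:
  - name: app
    image: busybox:1.36        # illustrative
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop: ["ALL"]          # start from zero Linux capabilities...
        add: ["SYS_TIME"]      # ...then add back only what the workload genuinely needs
```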
Non-root containers. It's basically just what it sounds like: a lot of containers run as the root user, and there are a couple of different places where you can combat that. One is at the image build process. It's very hard to do it there, especially when you have developers pulling images straight off the internet or Docker Hub. But if you change it here, and require a securityContext in the manifest that says runAsUser and runAsNonRoot, you're effectively changing the user that container runs as when it actually lands in the cluster.

The downside: be careful. There's another setting that I didn't include here which changes the file system permissions, because a lot of times you'll run into a developer who builds an application, builds the container, runs it as root, and then you flip a switch like this and it fails, because user 1234, group 1234 doesn't have permissions to run the binary, or the Python web server, or whatever. So there's another line you can put in that basically changes the file system permissions so you don't have that problem. This one's a bit more painful because of all the little gotchas that can exist.
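A sketch of the non-root settings, with made-up names and UIDs. The extra file-permission setting the talk alludes to is, I believe, fsGroup, which adjusts group ownership on mounted volumes so the non-root UID can actually write to them:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nonroot-demo                      # hypothetical
spec:
  securityContext:
    runAsNonRoot: true                    # kubelet refuses to start the container as UID 0
    runAsUser: 1234                       # arbitrary non-root UID
    runAsGroup: 1234
    fsGroup: 1234                         # re-chowns mounted volumes for this group
  containers:
  - name: app
    image: registry.example.com/team/app:1.0   # hypothetical internal image
```

Note that fsGroup only affects volumes, not the image's own filesystem; files baked into the image still need to be readable or executable by the chosen UID.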
Privilege escalation. On line 12, we're still looking at the same pod here, we've got allowPrivilegeEscalation. What this little flag means: we'll just pick on nginx. When nginx spins up, let's say it's running as the www user. Someone gets access to it and wants to escalate to root inside the container; this will prevent that from happening, based solely on the fact that the flag prevents any child process in the container from having more permissions than its parent process. So that kind of shuts that down.

One thing I need to mention on this slide, where you're changing the user: when you think of container escapes, if you're the root user in a container and you do an escape, you are the root user on the underlying host. This goes a long way to prevent that from happening, because if you land as user 1234 in the container, you have to either escalate to root inside the container and then escape, or escape first and then escalate on the actual underlying host. It's all about adding additional steps. Pain-wise, privilege escalation is usually pretty painless: you just turn it on and no one really notices.
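The flag itself is one line in the container's securityContext (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: noescalate-demo        # hypothetical
spec:
  containers:
  - name: app
    image: nginx:1.25          # illustrative
    securityContext:
      allowPrivilegeEscalation: false   # no child process can gain more privileges
                                        # than its parent (blocks setuid binaries, for example)
```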
Network policies. Networking in Kubernetes is wild, and they have this thing called network policies. There are things tied to the service mesh and so on, but basically, think of network policies as firewall rules. By default, everything in a cluster is allowed to talk to everything else; it's fully open. Think of a flat, soft-switched internal network that doesn't have host-based firewalling turned on. It's the same thing, only this is all virtualized in the Kubernetes plane. By enforcing network policies you can say only certain pods can talk to certain pods; you can say, based off labels, that only pods with a certain label or from a certain team are allowed to communicate; and you can control ingress and egress. Really powerful.

Also really painful, because you have to make a choice as an organization about who writes those rules. Do your application owners write firewall policies today? Is it a big stretch for them? I think a lot of times this becomes a giant partnership between security and the application owners, so that you can ensure you've got the right policies in place. And because of that, it's pretty painful: some people are going to nail it, and other people are just going to write a quad-zero policy. Everyone can talk to me, it works. And you know, they're not wrong. It does work.
Images from trusted repositories. So on line 8 here we're pulling alpine, and this is coming from public Docker Hub. You have no idea what's in that image; you have no idea who pushed that image up. You can trust some of the bigger ones, because they're usually pretty safe, but we all know how that goes with open source software. The better thing is to have your own image repository and require images to come from it. You can put a rule in place that says images only come from these four repositories, or some set like that. And that gives you a lot of opportunity to do things like vulnerability management and image scanning as you onboard them. It's just one of those things where you've got to work with all of your developers, tell them hey, this is what's happening, and you have to have that trust to begin with.

And now for admission control. This is where all of this comes into place; I've hinted a couple of times about enforcement, and this is how you do it. In Kubernetes there are a bunch of different ways you can enforce: there's Gatekeeper, there are various security vendors, there are native options. But basically what we've got here are policies you can create that mandate and require all of those controls we just talked about, and that's when you get into a policy like this one. At the top we've got a ClusterPolicy, and as we go through it: on lines 13 and 14, this particular policy is matching all of the pods, then you get into line 15, and over in the next section, lines 22 and 23, this is where we're setting the validation. You've got to have resource limits. We don't say what they have to be; they just have to be in place.

You can build these policies all kinds of different ways. You can have them mutate what the developers push; I don't have that written into this one, but basically you can add the setting for them if they're missing it. You can set a baseline of minimum expectations. Or you can have it flat-out reject the request: when they go to create a pod and push it into the cluster, it gets validated against this rule and all the other rules you have, and if it fails any of the criteria, it kicks back an error message that says sorry, you did it wrong, try again. I prefer that approach, because I don't like carrying people's mistakes forward forever. It's just easier if they fix them to begin with, and then we just don't have that issue. But it's an organizational choice, so your mileage may vary.

On the pain scale this is anywhere from a 2 to a 10, because it's yours: you get to pick the policies you want, how you want to implement them, how complicated you want them, whether you want them to mutate or not, whether you run them in audit mode. But this is how you enforce your security baselines for the workloads that run in the cluster.
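A sketch of that kind of policy, modeled on Kyverno's documented require-limits pattern (the policy name and message are illustrative):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits               # hypothetical
spec:
  validationFailureAction: Enforce   # reject non-compliant pods ("Audit" would only log)
  rules:
  - name: check-resource-limits
    match:
      any:
      - resources:
          kinds: ["Pod"]             # evaluate every pod pushed to the API server
    validate:
      message: "CPU and memory limits are required."   # what the submitter sees on rejection
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"         # any value is fine; it just has to exist
                cpu: "?*"
```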
Runtime protection. Everyone has an EDR; the question becomes how you do EDR in Kubernetes. There are a couple of different ways. You can stick it on the nodes themselves: if you're doing something like EKS there are ways to get it on there, same with the other managed offerings, and if you're running bare metal you just stick it on the host. A lot of the major vendors are container-aware now. Not that their detection policies are stricter because it's in a container, but they at least surface the information: what container executed this, who am I, that kind of thing.

Another way you can achieve this is via sidecar. Basically, you use one of those admission policies we just talked about: you create one that says any time a new pod is created, stick my security container into that same pod. That can work as well. The downside: if there's ever a problem with your security container spinning up, because you didn't pay your licensing or there's some weird mismatch on the host, the entire pod will fail to start, and that means your application owners won't be able to launch their workloads. It kind of puts security in a weird position. In my opinion it's a lot easier to just get on the overall host.
Then you can protect the host and the workloads, and because of that, if you go to the host level, it's really pretty straightforward.

Vulnerability management. You need it; it is what it is, right? It's vulnerability management. You need it at the Kubernetes layer, so you can understand what vulnerabilities exist in the version of Kubernetes you're currently running. You also need to scan your images, and the best way to do that is in the pipeline. Your developers build an image: they use a Dockerfile, stick their application into a container, build it, stick it in your repository, stick it in the cluster. If you scan in your pipeline, it allows them to fix things before those images ever deploy. And you also have to monitor the cluster so you know what images are running, mainly because things can drift. If you have an nginx web server, I keep picking on nginx, and it's been running for six months on the same image, that image is six months old, and when a new CVE comes out for it, you need to be able to know where it's running in your environment, just like with any other program. So you need visibility in the pipeline to help developers fast-track remediation.
But you also need it at the tail end, so that you know what's running there. Pain-wise, this can be very painful or very easy; it's a culture thing, depending on how well you can integrate with the developer pipelines. Otherwise it can be painful. I wish I had better news for you. If you want all of those YAMLs that I showed, you can find them there. And that's, for the most part, it. All right, are there any questions? I've got a couple of minutes.

Q: Not really a question, more a comment. I like that you have Kyverno up there instead of a default like Gatekeeper.
A: Yeah, Kyverno. Gatekeeper is a lot more complicated in the way you structure the rules: you basically build a template, and then you build a file that consumes the template, and that's how you get your full policy. Versus Kyverno, where you just build it all in the YAML and it's done; it's a lot easier. I also like that it can actually apply objects itself. Any other questions?

Q: (inaudible) ...would Falco work for this?
A: Yeah, absolutely, Falco is fun. If you don't know what Falco is, it's an open source runtime security tool designed to work in Kubernetes and container environments. They have a bunch of example policies that you can leverage. You do have to be careful in larger clusters, because it will fall over; not the cluster, Falco itself will. So you have to make sure you implement just the rules that you care about.

Q: You brought up Falco. Have you looked into KubeArmor at all? It's a newer project that came out around June or July of last year, and it's gained a lot of traction.
A: I've looked at it, but I haven't implemented it or played with it at all. It is something I usually bring up when I work with, like, AKS customers.
Q: One thing you didn't really touch on: secrets management, and the painfulness of bootstrapping that.
A: Horribly painful. Vault is pretty much the de facto answer, with some kind of container integration, so it can be done, but there's not really a good way to do it, unless you just store plain Secrets. And don't be fooled: Secrets in Kubernetes are just base64 encoded, they're not really encrypted. Yeah, it's painful. Anything else?
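As an aside on that last point, a minimal sketch showing why: the data field of a Secret is only base64, so anyone who can read the object can decode it (the name and value here are made up):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-creds              # hypothetical
type: Opaque
data:
  password: aHVudGVyMg==      # just base64 for "hunter2"; read access means plaintext access
```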
Q: I have a question about the admission piece specifically. I saw that it can validate, and you get a message back whenever it fails. Do you use something else to structure that? When it fails, do you keep that message or store it somewhere?
A: Oh yeah. So when the admission process fails, that message usually gets returned to whatever tried to push the workload. If you tried to push something through kubectl, you'd get that message right back in the console; if you had a pipeline job, you'd get it back in the pipeline, and the pipeline would fail.

Thank you. All right, thank you all very much.