
Amiran Alavidze: Securing Kubernetes Clusters in the Cloud

BSides Calgary · 50:02 · Published 2020-12 · Watch on YouTube ↗
Category: Technical
Style: Talk
Transcript [en]

Good afternoon. This talk is Securing Kubernetes Clusters in the Cloud; the material is current as of February 2020. My name is Amiran Alavidze, and I'm Director of Security at Tasktop; we're a Vancouver-based startup helping companies succeed in their digital transformation journeys. You can connect with me on Twitter, I'm @airman604.

As security people we're often thrown into something we have no idea about: "this project goes live in a week, go figure out how to make it secure." To me, personally, this is not a bug, it's a feature; it's a core characteristic of our roles as security professionals. And this is exactly what happened to me with Kubernetes. We were launching a new cloud-based application, and Kubernetes was going to be at the core of its architecture. This talk is essentially the amalgamation of many months of learning, threat modeling, and incrementally improving the security of our production Kubernetes clusters, and by sharing it I hope you'll be able to benefit from our experience.

To begin with, let's talk a little bit about Kubernetes itself. Kubernetes is an open source container orchestration platform that originated at Google. It was designed from the ground up as a loosely coupled collection of components centered around deployment, maintenance, and scaling of workloads (essentially, applications/containers). It has very good support for declarative deployments and immutable components. Let's go through some of the terminology.

Since this is container orchestration technology, it all starts with images and containers. An image is essentially a template from which containers are built, and containers are the workloads, the applications that you run in production. But in Kubernetes the smallest deployable unit of computing is not actually a container; it's something called a pod. In many cases a pod is a single container, but it doesn't have to be: a pod can be multiple containers, and those containers can share resources. They essentially have the same namespace for file systems, so they can see and access the same files, and they also share networking, so they can talk to each other through the localhost interface.

After pods, there are also objects used to inject information into those pods: ConfigMaps and Secrets. They're essentially a way to tell the pod, "by the way, your database is here, your username to connect to it is this, and your password is this." Any sort of configuration-type information, an nginx config or an Apache config and things like that, as well as credentials, can be injected into containers through ConfigMaps and Secrets.
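As a rough sketch of how that injection looks in practice (the names and values here are hypothetical, not from the talk), a pod might pull plain configuration from a ConfigMap and a credential from a Secret like this:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      containers:
      - name: web
        image: nginx:1.19
        env:
        - name: DB_HOST                # plain config from a ConfigMap
          valueFrom:
            configMapKeyRef:
              name: app-config
              key: db_host
        - name: DB_PASSWORD            # sensitive value from a Secret
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: db_password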

Then we have the actual scalability part of it, and that comes in the form of deployments. A Deployment is an object in Kubernetes that allows you to deploy multiple instances of the same thing: it maintains a consistent number of replicas of the pods you need to run. It also manages things like rolling updates, so if you need to deploy a new version of your containers you can do it through the Deployment, and you can set parameters around how that's done: whether you want to kill all of your old containers at once and then deploy all the new ones at once, or gradually replace the old ones one by one.
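A minimal Deployment sketch (hypothetical names; the strategy block shows the gradual, one-by-one style of update just described):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3                # keep 3 pods of this template running
      selector:
        matchLabels:
          app: web
      strategy:
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 1      # replace old pods gradually, one at a time
          maxSurge: 1
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
          - name: web
            image: nginx:1.19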

Under the hood, Deployments use an object called a ReplicaSet. The ReplicaSet is purely the scaling part of the Deployment: it only makes sure that if you said you're going to have three pods using this image, you always have three pods using that image, and if one pod dies, the ReplicaSet makes sure another pod gets started. On top of that there's a concept called a Service. A Service is essentially a way for the Kubernetes cluster to expose the workloads that you run, either to the outside world (your users connecting in and using your application) or internally: you can define services that are only reachable within the cluster. For example, if you have a web tier and a database tier, you can use a Service to expose the database to the web tier; it's essentially the glue, the networking and DNS glue, that ties those two components together.

The last thing I want to mention is namespaces. Namespaces are essentially a way to logically group the workloads and resources within your cluster. By default, for example, they're used to segregate the control plane workloads that are part of the cluster itself from your application workloads. But they can be used for more: some companies use namespaces to host multi-tenant Kubernetes environments, with multiple customers sharing the same clusters while staying segregated by namespace.
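Going back to the web tier and database tier example above, the Service glue might look like this (a sketch with hypothetical names and a hypothetical PostgreSQL port):

    apiVersion: v1
    kind: Service
    metadata:
      name: database            # pods can reach it by this DNS name
    spec:
      selector:
        app: database           # routes traffic to pods with this label
      ports:
      - port: 5432
        targetPort: 5432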

Having covered the terminology, I want to briefly describe the architecture of Kubernetes; this picture is from the Kubernetes documentation. There are essentially two parts to any Kubernetes cluster. One is what they call the Kubernetes control plane: one or more master nodes that run the system components that make up the cluster. The other part is the Kubernetes nodes: worker nodes that talk to the control plane, but whose core responsibility is to run your workloads, your applications.

On the control plane, multiple things are happening. At the core of the cluster is something called etcd, essentially a configuration database that stores the current state, the desired state, all the data for the whole cluster. Then there's the kube-apiserver: this is how everything in the cluster interacts with the control plane. If you need to deploy something, or if one of the worker nodes needs to check or report its health status, it's all done through the API server. Then there are processes within the control plane that do all the management pieces of the cluster. One part of that is the kube-controller-manager, which manages those pods and those deployments. Optionally there may be a cloud controller manager; that component is responsible for interfacing with the cloud service provider. If you're running your cluster in an infrastructure-as-a-service environment, in AWS or GCP or Azure or any other provider, and the cluster needs resources that the infrastructure provider is responsible for, things like load balancers, that's the component that talks to the IaaS APIs and interacts with the cloud-based infrastructure. The last main part of the control plane is the kube-scheduler. That's the glue, the magic, that's responsible for assigning specific workloads to specific nodes, based on predefined algorithms and parameters that you can set for the cluster. It takes into account things like free memory, the current utilization of the nodes, and any affinity rules you set for your workloads, and uses all of that to decide that this workload goes to this node and that workload goes to that node.

On the other side we've got the Kubernetes nodes. As I said, those are usually VMs that actually run the workloads, the applications. The core component that has to be installed on every node is called the kubelet; that's the piece that talks to the API server and manages all the local resources based on commands from the API server. The other core component is kube-proxy; that's the glue that supports the notion of a Service. It's responsible for the network-based interactions, exposing services either within the cluster itself or to the outside world.

Now, having said that, I want to give you an idea of how Kubernetes works overall, and how all of these different pieces work together to provide that consistent experience.

This slide, or series of slides, is what I call the life story of a deployment. Let's say we create a new deployment. By the way, this can be done in an imperative way, launching kubectl, the command-line tool for managing a Kubernetes cluster, and saying "hey, create me a deployment"; or, as is usually the case, in a declarative way: you create a YAML file that describes your workloads, your whole environment, and through kubectl you say "I want to apply this desired end state to the cluster," and the cluster does all the magic and provisions the workloads. Either way, everything on the back end happens the same way, whether you use the imperative or the declarative way of creating deployments. All kubectl does is connect to the kube-apiserver and tell it, "by the way, I want a new deployment." What happens next is that the API server talks to the etcd database and updates the state, updates the data in etcd. Once that's done, the call is complete and kubectl returns; you get your command prompt back.
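For concreteness, the two styles might look like this (hypothetical names; both end up as the same kind of API call to the kube-apiserver):

    # Imperative: tell the cluster what to do, directly
    kubectl create deployment web --image=nginx

    # Declarative: describe the desired state in a file, then apply it
    kubectl apply -f web-deployment.yaml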

So now, within etcd, all of a sudden you've got this Deployment object. It essentially describes the desired state; it's telling the Kubernetes cluster, "I want this deployment." As part of the Deployment there are a number of parameters: there's usually a name, you can say how many replicas you want, and there's a template for the pods, things like the image name to run, something like nginx or ubuntu, or, in most cases, some custom image that you've created from a standard image plus your own code.

Now we have this Deployment object in the etcd database, and here comes the controller manager that's part of the Kubernetes control plane. The controller manager sees the Deployment object in the database, but there's no corresponding ReplicaSet, so it says, "oh, that's an inconsistency, let me create a ReplicaSet for you." That ReplicaSet inherits the parameters from the Deployment: things like the number of replicas and what image to run in the pods it manages. Then the same controller manager looks at the data in etcd again and says, "OK, I've got the Deployment, I've got the ReplicaSet, but there are no pods yet," so it creates objects for the pods, and again those pods inherit parts of the configuration from the Deployment, things like how many containers are in each pod and what image each container uses. Now comes the scheduler, which decides where those pods are going to run; it runs through its algorithm and essentially assigns each of these pod objects to a specific node. It's important to note that up until now, no workloads are actually running. This is all happening within the control plane, and it's all essentially manipulating data in the etcd database; nothing is running yet. And this is where the next part of the magic comes into play.

Now the kubelet, the component that runs on each of the Kubernetes nodes, checks in with the API server and says, "hey, is there any new work for me? Tell me my desired state as defined in the database, and I'll make sure my actual state matches it." The API server checks with etcd and tells the kubelet, "yes, there's new work." And it's the kubelet that actually provisions the containers: now that it knows about the new pod assigned to its node, it works with Docker as the underlying containerization technology (which is what Kubernetes uses by default, though it's not the only option nowadays) to provision the containers in the pods, along with the networking configuration and everything else defined in that deployment. And this is where the actual work starts.

One more thing I want to cover before we dive into the security implications: why am I talking specifically about cloud-based deployments of Kubernetes? This all sounds almost magical, easy to manage and easy to use. The fact of the matter is that although it does sound magical, it's actually very hard to manage.

There are a number of really difficult problems to solve from an engineering perspective if you're going to roll your own Kubernetes cluster: things like high availability, replication of etcd, and all the networking plumbing are not trivial problems. And because Kubernetes is sort of like Lego (all of those different components I talked about are actually separate projects, with separate repositories on GitHub, that roll up into one coherent system when applied together), there are multiple ways to provision a Kubernetes cluster and multiple ways to link all of those components. Which means there are multiple ways of making mistakes, and multiple ways of screwing things up, especially in the long run as you need to scale the cluster to support hundreds of thousands of workloads.

So if there's only one thing you take away from this talk: I highly, highly advise you not to roll your own Kubernetes cluster, and if it's at all possible, to look at hosted Kubernetes options. All of the major cloud service providers offer managed Kubernetes, and that's definitely something that will make your life much easier in the long run. Now that all of that is out of the way, let's talk a little bit about the Kubernetes threat model.

Basically: what are all the things we need to be concerned about? For that I'll come back to the architecture, because to me it's visually useful to map these concerns onto the architecture. There are three main pieces with clear security implications. Two are in the control plane: access to the kube-apiserver, and access to the etcd database. The etcd database is the source of truth for the Kubernetes cluster; if I have full access to etcd I can do anything I want with the cluster, including extracting all the information out of it (Secrets and ConfigMaps included) and creating and deploying new workloads, because that's essentially how Kubernetes works off the etcd database. The kube-apiserver, again: anything that happens within Kubernetes is done through that server, so if you can find credentials and connect to the API server, you can potentially do a lot of damage. The third piece is the worker nodes themselves. You're running workloads on those nodes, so if one of those workloads is compromised, that's essentially a door into potentially the whole cluster.

Let's talk a little bit about the threat model itself. There are four main things that come to mind for Kubernetes. First, attacks on the control plane: as I said, if you can connect to the control plane, to the API server, and you can find credentials, that's one way of compromising potentially the whole cluster. The second scenario is a compromised workload: if there's an application running in the cluster and somebody hacks into it, that could be a doorway into the cluster as well. Third, user accounts and credentials; this works together with attacks on the control plane, since credentials leaked through source code, inconsistent access controls, and things like that could potentially be used to attack the cluster. The last piece is compromised images: if you can't trust the images for the workloads you run in the cluster, that could be a way in as well.

In 2018 the Kubernetes project hired two companies and did a full audit of its architecture and its source code. If you're interested, I highly recommend checking out the reports that resulted from that audit; they've been published on the project's GitHub, and they make interesting bedtime reading. The reports are pretty big, but they discuss some of the core decisions the Kubernetes project has made and the implications of those decisions for the security of Kubernetes clusters.

Out of those four things, the main one we definitely need to care about is compromised applications. That's always going to be an issue, because the reason to run a Kubernetes cluster is the workloads in it, and one of those workloads is bound to be compromised at some point. You have to be ready, and you have to analyze what that means for the security of the whole cluster and of everything else running in it. So this is what I'll focus on throughout the rest of the presentation.

When we think about a compromised application, there are essentially three main avenues for what can happen next. The first is network access and lateral movement: if I can access the container that runs the application, I can potentially investigate what else is on that network, what else is running in the cluster, and try to move laterally through it. Fortunately or unfortunately, those applications are bound to have some access within the cluster; they might have access to a database, or to other services they need in order to provide the end-user functionality, especially in any sort of microservices architecture. So lateral movement through the cluster is definitely a big risk that needs to be thought through and addressed. The second avenue is Kubernetes cluster credentials: in a lot of cases there are Kubernetes credentials inside those workloads, and we need to understand what they are, what we can do to secure them further, and how they could be used to gain additional access from the compromised application. The third avenue is container escapes. This is essentially saying: there's a node running a bunch of workloads, I've compromised this one workload, this one application, and if I can escape out of that container and gain full access to the node itself, I can potentially get much more information out of it and maybe even take over the whole cluster.

So let's talk about those three things in a bit more detail, starting with networking. Networking is quite an interesting component of Kubernetes, in the sense that it's not a core part of the Kubernetes project. Networking in Kubernetes clusters is provided through plugins, via a system called CNI, the Container Network Interface. The networking plumbing is essentially not part of the cluster itself; it's deployed separately, there are multiple networking plugins available, and in a sense it's an external dependency of the cluster. At its core, Kubernetes assumes an open network: if you look at the Kubernetes documentation, it essentially says it's a fundamental underlying assumption that any node in the cluster can talk to any other node without any sort of network address translation. That doesn't mean any workload running in the cluster will have unlimited network access within it, but it does say that restricting access is an additional step you need to take, and you need to make sure it happens, because by default it's all going to be open to everybody.

That layer of network-level security, the ability to segregate network-based access, is done through something Kubernetes calls network policies. It's a natively supported construct, meaning you can define network policies for a cluster, but the important piece is that those network policies have to be supported by the networking plugin, and not all plugins support them.

So why is it important to limit network access for workloads? There are three main reasons. The first is instance metadata access. We all remember the Capital One data breach and how it happened: this is essentially cloud service provider credentials and other sensitive information exposed through the metadata endpoint, the well-known 169.254.169.254. If a workload has access to that metadata endpoint, you can potentially be exposing temporary cloud service provider credentials, and you can be exposing credentials used within the Kubernetes cluster, because there are often credentials used during the node provisioning process that the node uses to join the cluster. And in some cases exploiting this doesn't require full compromise of the application; it can be done through vulnerabilities such as SSRF, so you don't necessarily need full access to the container to use this. The second reason is control plane access: by default, everything that runs in your cluster has network access to the Kubernetes API endpoint. Think about that: any workload you have in the cluster can talk to your cluster's control plane. It's exposed by default (encryption is usually enabled by default), but that access is something that potentially needs to be protected. The third reason is the lateral movement we talked about: access to databases and to other services or microservices within the cluster. Sometimes those microservices are assumed to be running in a secure segment, a secure environment, and don't always require authentication; but if you've got a workload that's exposed to the internet and doesn't need access to those internal services, that access should be restricted.
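A common starting point (a sketch, not from the talk, with a hypothetical namespace name) is a default-deny policy in a namespace, after which you allow only the specific flows you need:

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: default-deny-all
      namespace: production    # hypothetical namespace
    spec:
      podSelector: {}          # selects every pod in the namespace
      policyTypes:
      - Ingress
      - Egress                 # no rules listed, so all traffic is denied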

Helm. Helm is essentially a package manager for Kubernetes: it allows you to combine different containers, pods, components, and services into one description that can be deployed as a single application consisting of multiple components working together. It's used very often in Kubernetes clusters because it's fairly popular and an easy way to deploy dependent services that aren't part of your own code base but that your application needs in order to work. The interesting thing here is that there are two major versions of Helm, version 2 and version 3, and the distinction is very important because they work very differently. Helm version 2 uses the component depicted in this picture, called Tiller, and it's Tiller that actually does all the deployments you run through Helm. That means it's Tiller that holds privileged access within the cluster: it can read Secrets, deploy new workloads, check the status of the cluster, and potentially do other things as well. Which also means that if somebody has network-level access to Tiller, that may be an avenue for them to escalate their access within the cluster to Tiller's privileges, which are usually fairly broad. Another important thing about version 2, and this is a quote from the documentation: they plainly say that the default installation applies no security configurations to the Helm deployment itself. That means Tiller doesn't even do any sort of authentication, so unless you've put network policies in place, just having Helm version 2 deployed in your cluster could be a security concern.

All right, the second piece we talked about is cluster credentials. Kubernetes supports a number of options for authentication: it uses certificates internally, it can do token-based authentication, it can do basic authentication with just a username and password, and it also supports integration with external identity providers through OpenID Connect. The interesting thing is that Kubernetes itself, the core project, does not have a notion of a user.

It essentially assumes that users are managed outside of the cluster, and Kubernetes doesn't want to be concerned with them. What this means is that users are managed through those external integrations, through OpenID Connect; you don't manage users within the cluster itself, it's a completely external thing you need to configure. What Kubernetes does manage is service accounts. Service accounts are used within the cluster to provide access for things that run inside the cluster and need access to the Kubernetes API. By default, every pod is assigned a service account: even if you don't do it manually, the pod gets the default service account tied to the namespace it runs in. And, again by default, the credentials for that service account are mounted into the pod. Which means that if you don't do anything and a workload is compromised, whoever compromised it now has access to those credentials; and while they're limited and mostly read-only, it's still some access to the cluster that can potentially be escalated further.

In terms of permissions and managing access, Kubernetes does what they call RBAC, role-based access control, which is managed through two sets of objects. One is Role and RoleBinding: a Role is essentially a definition of access parameters within the cluster, and a RoleBinding links that Role to a specific user ID or service account. There are also the very similar ClusterRole and ClusterRoleBinding; the only difference between the two is that a Role is limited to a specific namespace, while ClusterRole and ClusterRoleBinding apply to the whole cluster.
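A sketch of the pair (hypothetical names): a namespaced Role that can only read pods, bound to a service account:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: pod-reader
      namespace: production
    rules:
    - apiGroups: [""]          # "" is the core API group
      resources: ["pods"]
      verbs: ["get", "list", "watch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: read-pods
      namespace: production
    subjects:
    - kind: ServiceAccount
      name: app-sa             # hypothetical service account
      namespace: production
    roleRef:
      kind: Role
      name: pod-reader
      apiGroup: rbac.authorization.k8s.io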

And the third avenue in our threat model was container escapes. Container escapes can potentially happen in multiple ways. There can be vulnerabilities in the container runtime itself, something like runc or Docker. It can be done through an outdated kernel, essentially kernel-level privilege escalation vulnerabilities. It can be done through security misconfigurations: things like mapping the Docker socket into containers, so the container can now interface with Docker, launch containers, and ask for things; or mounting sensitive paths into the container, especially things like /etc, which hosts password hashes and the like. There's also the notion of privileged containers. Sometimes things you run in your cluster do need privileged access to the host itself; in a lot of cases that's things like log management solutions that collect logs from the host and from other containers, and in some cases it's security tooling that needs to interface with the host at a low level and have privileged access to what's happening there. But ultimately, if you have ordinary workloads running as privileged containers, that is a really bad idea, because it essentially means that, by default and by definition, anybody who compromises that workload has full access to the host itself. The last important question here is the user ID for the containers. It's considered best practice to run processes within containers as a non-root user ID. In practice, though, the majority of the public images you'll come across, probably more than eighty percent of them, run the container processes as root by default. That weakens containment and allows for easier container escapes; it usually needs to be coupled with other things on this list, and in itself it's not necessarily sufficient to escape the container, so running as non-root is more of a defense-in-depth type of control.

So, now that we're sufficiently scared about Kubernetes and the security of our clusters, what do we do? The important thing here, back to the idea of not rolling your own Kubernetes clusters, is that a lot of the work to secure Kubernetes, and specifically its control plane, will be done by your cloud service provider if you opt for a managed Kubernetes service.

So let's talk about specific things you can do to secure your clusters, and the first, probably the most important part, is network policies; this is my firm belief. As we said, network policies are the native Kubernetes way to restrict network access within the cluster. Again, it's not core functionality of Kubernetes itself; it's provided by the networking plugins, and not all of the plugins support network policies. Moreover, if you provision a Kubernetes cluster in Azure, AWS, or even Google, the default networking plugin they use does not support network policies. What this means is that even if you apply network policies, firstly, they won't work, they won't be applied; and secondly, you won't even get a warning. If your networking plugin doesn't support network policies and you apply them anyway, there's no warning at all until you test and verify, or until you switch to a networking plugin that does support them.

By default, all traffic within the cluster is allowed. This is true even for network plugins that do support network policies: the default state is all open. This matters even more because, even if you do apply policies, by default there's no network-level segregation between namespaces. As I said, from the get-go namespaces are more of a logical way of grouping workloads, and if you're relying on namespaces as a way to segment your workloads, say, to run your more sensitive workloads in a more isolated environment, you need to take additional steps to make sure that namespace is actually segregated appropriately. Network policies support both inbound and outbound rules, so this is essentially like security groups in AWS: a way to natively support network filtering and network access controls within the cluster. Here's what a typical network policy might look like: you basically give it a name and you give it the rules; this specific one opens port 6379 from a workload labeled front-end to a workload labeled database.
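The slide isn't reproduced here, but a policy along those lines would look roughly like this (a reconstruction with hypothetical label names; 6379 is the Redis port):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: allow-frontend-to-db
    spec:
      podSelector:
        matchLabels:
          role: database        # applies to the database pods
      ingress:
      - from:
        - podSelector:
            matchLabels:
              role: front-end   # only front-end pods may connect
        ports:
        - protocol: TCP
          port: 6379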

Credentials. We talked about access to the instance metadata endpoint; that access definitely needs to be protected, and you can essentially take two routes to deal with it. You can apply network policies and simply restrict access to that magic IP address where the instance metadata lives. Or you can use what all of the major cloud service providers now support: assigning IAM permissions, permissions within your cloud infrastructure, to specific workloads. That's more granular than assigning them to the whole host: even on a single node, if this container makes a call to the metadata endpoint, it will get different credentials than if that container makes a call to the same endpoint.
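For the network policy route, here's a sketch of an egress rule that blocks the metadata endpoint while allowing other outbound traffic (hypothetical, and it requires a plugin that supports ipBlock rules):

    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: block-instance-metadata
    spec:
      podSelector: {}          # all pods in the namespace
      policyTypes:
      - Egress
      egress:
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
            - 169.254.169.254/32   # the instance metadata endpoint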

For the workload identity route: AWS calls this IAM Roles for Service Accounts, Google calls it GKE Workload Identity, and Azure calls it Azure Active Directory Pod Identity. This is actually the preferred way to secure instance metadata access. The second part that relates to credentials is Kubernetes credentials. As I mentioned, a service account token is mounted by default into every workload that runs as part of the cluster. That's controlled through a pod setting called automountServiceAccountToken, and I highly recommend disabling it. The reason is that in the majority of cases, probably 99 percent of them, your applications, your workloads, do not need direct access to the Kubernetes cluster; they do not need to talk to your cluster at all.
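Disabling it is a one-line change on the pod spec (it can also be set on the service account itself); a sketch with hypothetical names:

    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      automountServiceAccountToken: false   # don't mount API credentials
      containers:
      - name: web
        image: nginx:1.19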

So in the majority of cases it's just best to disable it. Again, this is somewhat of a defense-in-depth measure, because by default the token doesn't grant much access to the cluster, but it's an additional layer you can put in place to secure your cluster's credentials. And then you would usually use your cloud service provider to manage access to the cluster itself: you'll have IAM users that are assigned credentials through Azure, Amazon, or GCP, and people use the same accounts they use to access the cloud infrastructure to log in and manage the Kubernetes cluster.

Now, access to the control plane. There are two sides to this. One we already talked about: access to the control plane from workloads within the cluster. It's enabled by default, and as with service accounts, in the majority of cases that access is not required by workloads, so it's a good idea to limit it using network policies. The other side is that, by default, when you provision a Kubernetes cluster with any of the three major cloud providers, the cluster gets a public API endpoint, meaning that if you have credentials, you can interact with and manage the cluster over the internet by hitting that public endpoint directly. Again, it's a good idea, from a defense-in-depth perspective, to limit that access to your cloud environment, or perhaps to the IP addresses assigned to your corporate network, so that the endpoint isn't just publicly exposed.

The last piece that relates to the control plane is keeping Kubernetes itself up to date. This is where your cloud provider will partially help, meaning they have that process automated to some extent; in most cases, though, you still need to initiate the update.

In AKS on Azure and in AWS, that initiation is a manual step: you basically have to tell the cloud service provider, "I want to upgrade this node, then this node, then this node, and then the control plane." In GKE it can be fully automated: you can tell GKE, "keep my cluster always up to date, keep it on this version branch of Kubernetes," and it does all the magic in the background.

Next, securing the workloads themselves; this is along the lines of protecting against container escapes. It's done through a parameter called the security context, which can be defined for each pod. Security contexts allow you to specify that the workloads in a container must run as non-root; you can disallow privileged pods (remember that privileged flag I talked about, which gives full access to the host); you can disable privilege escalation, which essentially makes setuid binaries not work; and you can disable things like host file system mounts. So the security context is a way to set those parameters on individual workloads, individual pods or deployments. But you can also enforce those parameters globally, across all the workloads in the cluster, using something called Pod Security Policy, or the newer mechanism Kubernetes is gradually migrating to, OPA, the Open Policy Agent.
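Here's a sketch of those per-pod security context settings just described (hypothetical names and UID; the global PSP/OPA enforcement then rejects pods that don't meet them):

    apiVersion: v1
    kind: Pod
    metadata:
      name: web
    spec:
      securityContext:
        runAsNonRoot: true     # refuse to start if the image would run as root
        runAsUser: 10001       # hypothetical non-root UID
      containers:
      - name: web
        image: nginx:1.19
        securityContext:
          privileged: false                 # no privileged access to the host
          allowPrivilegeEscalation: false   # setuid binaries gain no privileges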

What that enforcement allows you to say is, "nothing in my cluster can run as the root user." If somebody tries to deploy something and they forgot to set that security context, forgot to run the workload as a non-root user, the Pod Security Policy will prevent that workload from being applied. And the last piece of securing the workloads is keeping the operating system and the kernel up to date, which essentially protects you against kernel privilege escalation vulnerabilities. In Azure and AWS that's a manual upgrade process you need to initiate; in GKE even this step can be automated. So in terms of keeping your operating system, your nodes, and the Kubernetes control plane itself up to date, GKE lets you automate it end to end if that's what you want to do.

Some more things with regard to securing the workloads. There are options for potentially better container isolation. Only Google supports one of these at the moment: they're rolling out something they call GKE Sandbox, which uses a technology called gVisor. That allows better segregation, because the kernel the containers see is not the actual kernel of the host; it's a userland, emulated kernel. It basically adds another layer between the container and the host, so that even if there is a privilege escalation vulnerability in the kernel, chances are containers won't be able to use it, because that's not the kernel they interact with. The other option here is something called Kata Containers. I don't think any of the cloud service providers support this yet, but the idea is that instead of running workloads as true containers, it instantiates a micro-VM for each container, providing much better isolation for the workloads.

Securing your images is definitely another big thing here. That means making sure you use trusted images to run in containers, or as the base images for your own images; scanning for known CVEs and missing patches; and migrating to minimal images, making sure that things like compilers and shells, things that usually aren't needed inside containers, aren't even present. This is another one of those defense-in-depth controls, in the sense that it makes it harder for somebody to do anything with the container even if the workload is compromised.

A couple of last things worth mentioning. If you're using Helm, upgrade to version 3. Version 3 does not have Tiller, which was a big headache from a security standpoint.

Instead, Helm 3 basically uses the same credentials that developers or engineers would use with kubectl to connect to the cluster and provision those workloads. Logging is a big thing too. There's some logging available in Kubernetes, for the control plane, deployments, workloads, and so on, plus some application-level logging. Keep in mind that essentially anything that runs within the cluster is ephemeral, which means it can die at any point in time; so if you need the logs, you need to extract them on an ongoing basis from anything within the cluster that you care about.

I want to spend the last five minutes of the talk on what I call cloud-native security solutions. I'm not going to mention any vendor names; feel free to ask me after the talk, through Twitter or through the conference platform. What these cloud-native security solutions allow you to do is multiple things. The first part is that they can check the configuration of your cluster, usually against the CIS (Center for Internet Security) benchmark, and in an automated way tell you the things that might be worth addressing. They also do image and container vulnerability scanning, finding known CVEs in the components present in your images and containers. The image side is essentially done against a registry: if you have your own private image registry, they can scan the whole registry and report issues with the images in it. The container side is that they look at everything that's actually running in your cluster at this moment and do the same kind of vulnerability scans for those containers.

The third piece they do, and this is the most interesting one to me personally, is behavior analysis of what's happening inside the containers. You can think of this as EDR for cloud-native clusters, for Kubernetes clusters. They look at the processes running in those containers, at things like syscalls and file operations, and there are two kinds of approaches to what they do with that information. One is rule-based: things like "if bash is started as a child process of Tomcat or Apache, that's probably not a usual situation, so alert on it." Those rules have to be predefined, and they're usually provided by the vendor. The other is anomaly-based, which basically says, "I'm going to watch this container for a while, say 24 hours, and note all the things it does: what processes it starts, what syscalls it makes, what files it opens or changes, even down to the network connection level." Then anything that happens outside of that learned profile triggers an alert or gets blocked going forward.
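As a sketch of the rule-based approach, here's roughly what that "bash under a web server" rule could look like in Falco, the open-source tool mentioned at the end of the talk (this exact rule is illustrative, not quoted from Falco's rule set):

    - rule: Shell spawned by web server
      desc: Detect bash started as a child of a web server process
      condition: >
        spawned_process and proc.name = bash
        and proc.pname in (tomcat, apache2, httpd)
      output: >
        Shell spawned by web server
        (parent=%proc.pname command=%proc.cmdline container=%container.name)
      priority: WARNING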

Most of these solutions can be deployed in either blocking or alerting mode; with blocking, you can actually disallow the actions that don't meet the policy you've defined. Some of the solutions also give you cluster-level network traffic control. That's very similar to network policies, and my personal view is that you should rather use network policies, because they're the native Kubernetes way of applying this mechanism. They also usually have forensics capabilities, meaning they allow you to extract forensic data out of compromised containers. Keep in mind that all of this is ephemeral: containers die and get restarted and stuff happens to them all the time, and if you want to preserve information such as logs or memory dumps for incident response purposes, there are no native tools available for that. This is the gap that the cloud-native security solutions address.

A couple of slides on open source tools that help you secure clusters. There's kube-bench and kube-hunter, which run checks on your cluster: the first one does the CIS benchmark checks, and the second does broader cluster configuration checks. There's a tool that can scan for CVEs in images: that's Trivy.

And Falco is a rule-based behavior monitoring tool. All of these are open source, and you can use them in your clusters anytime, for free. Some resources if you want to learn more: I highly recommend the first one listed, called Introduction to Kubernetes. It's the free official CNCF training on Kubernetes, and it's actually pretty good; it's hosted on edX, so go check it out. It's not too long, and it covers quite a lot.

With that, that's all I had for today. As I said, you can connect with me on Twitter, I'm @airman604. I'm also a co-organizer of the local Vancouver-based DEF CON group, DC604. We have monthly workshops and technical talks, so if you're interested, check us out on Meetup. And with that, thanks very much.