
Secure distroless OCI images via YAML

BSides Sofia · 2023 · 1:02:51 · 181 views · Published 2023-03
Speaker: Victor Bonev
Category: Technical
Style: Talk

Good evening, everyone. I'll be doing this presentation in English. The topic for today is, I believe, one of the most interesting of the new trends: secure distroless OCI images. First we're going to define what an OCI image is, what distroless is, and what makes this combination secure. We're also going to explore our approach to a new way of building what we used to call Docker images via Dockerfiles, and we're going to do it via YAML files. We'll build on what the first presenter, Bojo, said about containers and how nothing is secure in the container world, and we're going to look through some exploits and a lot of interesting stuff.

First of all, let me introduce myself. I'm Victor Bonev and I'm part of the VMware Carbon Black team. Carbon Black is the team that handles cyber security inside VMware, and we develop a lot of the solutions behind the XDR, MDR and EDR terms you heard in the previous presentations. I'm a senior developer, and I'm very happy to be here to discuss, gather opinions from each one of you, and hopefully help you learn something useful that companies look for a lot.

Let's go through the table of contents. First, we're going to focus on what OCI is: we'll define it and have a little brainstorm with all of you. Then we're going to speak about OCI isolation with a 101 crash course, like the ones you used to have at university: what cgroups are, what the file system is in the container context, change root (chroot), and namespaces. In the third chapter we're going to show some exploits and take a brief overview of the key moments of the past year. Of course, many things
have happened, and I'm not going to show all of the exploits; I'm going to focus on the more interesting ones. In the previous presentation we had a bit of a chat about Alpine Linux, so we're going to talk about Alpine Linux: loved by many, hated by even more, and we're going to understand why that is the case. Then we're going to talk about distroless and what makes an image distroless, which is a very interesting topic. In the last chapter we're going to talk about project Micron or michaelian. It is a project that we are developing internally inside VMware Carbon Black, and it is our approach to solving some of the issues we will outline together during this presentation.

So let's talk about OCI. What is OCI? It is the Open Container Initiative. What you have all referred to as containers or Docker images is basically OCI, and that's the right way to refer to things in the container world: it is an OCI image, not a Docker image and not a podman image. The Open Container Initiative is the standard that everything we're going to talk about is based upon.

A little introduction. I believe you're most familiar with Docker; you have heard about it today and you know what it is about. Docker is very popular, and it has been here for maybe a little less than 10 years, I guess.

The next platform for OCI images (and I'm not going to refer to them as containers) is Kubernetes, of course. I want to draw your attention to the fact that those are two totally separate things: even though they handle containers in the same way, they're totally isolated, and we're going to look at why that is the case. Behind the word "container" there are a lot of services, software and sub-components, so we
have to break down what this is all about: what is behind that structure, what makes isolated environments unique, and why it is so popular.

Behind Docker there is containerd, which is the container daemon. You can refer to it as an engine, but it's actually a daemon that does the work of spawning and handling containers behind the curtains, in preparation for the next step. Kubernetes, on the other hand, does not run on containerd by default. It used to have dockershim, which was removed, so that is no longer the case, and Kubernetes uses a different runtime daemon called CRI-O. You will also see the term CRI, which is the Container Runtime Interface. CRI-O was meant to be a very lightweight, very performance-oriented implementation of the container runtime interface. There are more runtimes than these, but we're not going to look at the rest; we're going to focus on those two.

What comes after those daemons? That is the Open Container Initiative. The Open Container Initiative sets a standard for what a container is. You can think of it as a protocol, even though it is not one, in the same way that TCP is defined in an RFC. It's very different, but you can think of it as an analogy. What makes a container? You have to have a checksum, you have to have a file system, and a lot of other things that need to be there so that we can call it a container understandable by containerd and by other container runtimes. It's a set of standards that we have all agreed upon, and it is continuously developed and updated. So keep in mind that the containers of today might be totally different from the containers of tomorrow, since all of those runtimes and container runtime interfaces keep developing and new features come in. That's why we need to recertify pretty much every three years. This holds for Kubernetes as well: in order to hold a certificate such as Certified Kubernetes
Administrator, I have to recertify every three years, because a lot of those things change. Maybe not the core ones, but a lot of them.

So what comes after OCI? That is runc. If I'm not mistaken, runc was first introduced in 2015 as version 0.1. It is the executable, or maybe "executable" is not the right word: it's the runtime that actually runs your containers in the Linux world. runc always takes a JSON configuration. If you have worked with containers, you have probably seen that you have to provide a Dockerfile in order to build things, and you have instructions such as FROM and RUN. Those commands are integrated, and they bundle different JSON files: it could be a JSON file for the networks, it could be a JSON file for your volume mounts. So all those configurations, and pretty much everything in the Docker world, are in JSON (there's BSON as well). Everything is fed to runc, and runc executes all of those things together as one big chain, so that you get what you have called a container so far. That's how it works behind the scenes, in a short summary without the details, but this will do just fine for the demonstrations and showcases I have prepared for you.

What I want you to take from this, if you have to remember one thing, is: always question whatever you come across. That doesn't apply only to containers, Kubernetes, Docker or whatever you use. Always question how it works. By no means are those magical services or unique software. Something is happening, and if you don't know the answer, there is a very high chance that you'll be exploited. Why? Because you don't know what's behind it, as simple as that. This was stated by some presenter, I don't remember who. So, how does it work? Understanding those pieces of software that leverage the Linux kernel
and its components, along with its attributes, is the key. As they famously say in The Mandalorian: this is the way. And this holds for everything, not only for OCI containers or Kubernetes. You have to understand the concepts.

So we're going to talk about OCI isolation. What I'm going to do with all of you is a 101 crash course. If you ask why a company uses containers, you have probably heard: "Well, they provide an isolated environment." And my question will be: how so? You might have gone to the next level and heard that containers are a bunch of cgroups and namespaces, but what I have witnessed is that the questions about cgroups and namespaces also go unanswered for the most part. So I wanted to show you (I know it might not be very visible; I did my best).

Cgroups, or control groups, are a native kernel feature, and they provide resource isolation. For example, imagine you want to run a process and allocate only 256 megabytes of memory to it. You can do this via a control group. All of them reside under /sys/fs/cgroup. What is notable is that you can create your own cgroups. On most Linux distributions there are already predefined cgroups, mostly for memory management, CPU, something related to the network, and something related to volume mounts. You have probably already mapped this in your head: the features that Docker or Kubernetes provide are not so unique. What is exposed through a lot of their flags actually goes to the native kernel feature, which is cgroups. Of course, containerd and CRI-O, the container runtimes, leverage this via flags. I'm just going to say out loud what is written below: I can do a podman run or equivalent. podman is a different
flavor, a different piece of software similar to Docker. I can do podman run in detached mode with -d, and pass the -m flag, which will limit my memory usage to 256 megabytes, and I'm going to spawn the latest nginx container. What this does is create a separate cgroup under my parent process, which is podman, because podman calls the container runtime, and it already knows the namespace isolation, which we're going to talk about on the next slide. The isolated group is identified by your container checksum: if you have spawned containers, you have observed that every container has a unique checksum, a unique SHA. So what you do through flags is control cgroups, which means you're controlling kernel features. There's no magic behind it. In that case, if I simply print the value, I will see that I have allocated 268,435,456 bytes, the equivalent of 256 megabytes in bytes, and it is present in my file system.

So let's take it to the next level and talk about namespaces. As you recall, containers are basically namespaces and cgroups. So what are namespaces? Those are even more important. They are a kernel feature for partitioning resources among sets of processes. If you compare them to cgroups: cgroups limit how much of a resource a process can use, while namespaces limit which resources a process can see. What do I mean by that? Imagine you open a shell on some server. Again I will run a container: I'll do podman run in interactive mode, with the same nginx latest image, and I'm going to execute a shell. This is, to use a term you shouldn't use, like SSH-ing into the container: it basically invokes a Bourne shell session. Then I can list namespaces. The command to do that is lsns, and I can see that there are different namespaces. Each comes with a unique ID, which you can see in the left column, and there is a type for each namespace. So now we
understand that there are namespaces and that there are different types of them. What is listed here: we have time, user, net, mount, UTS (the Unix time-sharing system, which covers the hostname), IPC, which is inter-process communication, PID, which you heard about in the previous presentation, and cgroup. All of those namespaces have an associated PID, an associated user, and the command that has access to that namespace.

Now let's go a bit more in depth on namespaces. It's very important to understand them so that we can provide security for our containers; if we don't, we can't say that our container is secure. So we have mount, process ID, network, inter-process communication (IPC), Unix time-sharing system (UTS), user ID, and control group. What does all that mean? It follows that those are simply attack vectors. Those are the primary targets for exploiting a container, and we're going to look at some exploits that leverage some of those namespaces, not all, but at least some. You can see that there are a lot of attack vectors. Again, if you refer to the first presentation of today, maybe nothing is secure. Let's see.

Containers use namespaces to partition different resources. Imagine I want to do a podman run with a BusyBox image. Just to give you an idea of what BusyBox is: it is one of the most lightweight images. It's not a distribution by any means. It's a type of Linux image where all the essential commands that the previous presenter showed (ls, cat, echo, netstat, ip, ss, whatever you can think of) are bundled into one executable. They are not built one by one; they're bundled into one. There is BusyBox, and there is another flavor you might have heard of called Toybox. Same
principle: all of the Linux commands that you have used in your terminal sessions are bundled into one executable. When you invoke one of them, something close to soft links is at work: the soft links refer to different parts of the binary so that your command can run safely. That's what BusyBox is.

So what we do here is execute a shell. This creates a separate namespace that will allocate my hostname. You know that if you spawn a container you see some checksum, or maybe you don't, depending on how it is configured and executed through runc and containerd. But usually, behind the scenes, a hostname is created, along with a network, different types of cgroups, and a parent process ID. Now I'm inside the container interactively, so I can interact from within. I can type ps and see what my processes are, and currently there is not much going on: I have sh and I have ps, both owned by root.

But what happens if I execute another shell? This will basically give me a shell within a shell, and if I do ps I will see that I have two shells. What is strange about this is that I'm in my child shell process but I'm able to see the parent one. Observe here that my first process ID is number 1. If I invoke the shell again I have a different process ID, and this is the active one, which is 3, but I am also able to see the parent process, which in this case is 1. That's the parent I originated from. And even if I run pstree, which is the same as tree for the file system but for processes, I'm able to see that the shell with process ID 1 is my parent, I'm currently running in the child process with PID number 3, and I've executed pstree, which has been allocated process ID 11.
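
The parent/child relationship described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool used in the talk: `render_tree` is a hypothetical helper, and the (pid, ppid, name) rows are hard-coded to the values from the demo instead of being read from a live system.

```python
from collections import defaultdict


def render_tree(rows, root_pid=1):
    """Render (pid, ppid, name) rows as an indented tree, similar in spirit to pstree.

    Hypothetical helper for illustration; it assumes every row has a unique pid
    and that root_pid appears in rows.
    """
    names = {pid: name for pid, _ppid, name in rows}
    children = defaultdict(list)
    for pid, ppid, _name in rows:
        children[ppid].append(pid)

    lines = []

    def walk(pid, depth):
        # One line per process, indented two spaces per generation.
        lines.append(f"{'  ' * depth}{names[pid]}({pid})")
        for child in sorted(children[pid]):
            walk(child, depth + 1)

    walk(root_pid, 0)
    return "\n".join(lines)


# Values from the demo: sh is PID 1, the nested sh is PID 3, pstree is PID 11.
demo_rows = [(1, 0, "sh"), (3, 1, "sh"), (11, 3, "pstree")]
tree = render_tree(demo_rows)
print(tree)
```

On a real Linux system the rows could be collected from the `/proc/<pid>/stat` files; a process inside a container's PID namespace would typically only find the processes of that namespace there, which is why the shell in the demo sees itself as PID 1.
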
So it is strange, and we're going to see why. Before we go into the details, I want to do a brainstorm with all of you. Here is a simple process inheritance picture. On the left we have a cluster. It could be a Docker machine or a Kubernetes cluster; it's some sort of cluster where many nodes or containers can run, either through containerd or through CRI-O for Kubernetes. I have spawned a very traditional showcase, and it doesn't matter whether it runs on containerd or on CRI-O. I have four containers: my backend, my nginx, my frontend, and some metrics for logging. That's how I love to do things: there is a separate container that's responsible for logging, so it doesn't interfere when I update.

Imagine that I have exposed port 443 for my nginx. With Kubernetes this is quite easy to do: you have either ClusterIP or NodePort for your exposure. There is another type of exposure, but we're mostly focusing on these. In this case it is a NodePort, so your port is exposed and you can interact with it.

So my question to all of you is: what is the worst scenario that can happen, based on that picture? You don't have to use the mic; if you have ideas, please raise your hand. Yes? So, communicating through inter-process communication. That is bad, but it's not the worst, though it's a great observation. Can we think of something else? Any ideas? Right, precisely, exactly. This is the root of the problem and the worst thing that can happen, and there were cases like that at Amazon, at Google, at the major cloud vendors. What happens is that you can escape from your container out to the cluster. How can I do that? Imagine I have a vulnerability that I can exploit in nginx. It's rare, it doesn't happen, I know, but let's say it's not nginx, it's software that I have written. So I can
exploit that. It could be a buffer overflow, it could be a use-after-free; those are two of the most commonly exploited vulnerabilities. I load my shell and then I have access, but only to that container. So I'm isolated: I can't see what the backend is doing or what the frontend is doing. I have access to that particular service. And, miraculously, there are issues and exploits that allow me not only to control that container but to take control of the whole cluster. This is done exactly through exploits that target cgroups and namespaces. Mostly you can target containerd, you can target CRI-O, and you can gain leverage over the cluster. Now you do not control only your own data. The cluster can have thousands of nodes and I don't know how many containers, maybe about 10,000 if it's a big one owned by an enterprise corporation, and I can control pretty much everything. I have full control.

So I'm going to show you two of those exploits, which leverage Linux vulnerabilities. Both of them were discovered last year. The first one is CVE-2022-0847, the second one is CVE-2022-1085. And what is a CVE, just so we are on common ground? It stands for Common Vulnerabilities and Exposures. Notice the word "common": those are vulnerabilities that we know exist, and there are many that exist that we don't know about.

So what are those exploits? For the first one, the main cause is an uninitialized pipe buffer flags variable. This is a feature in the kernel, a bug in the kernel, that can be exploited. Basically, the page cache is always writable by the kernel. Imagine I have to write to a specific file, but I do it in such a way (I can write a C program for it) that I write to the page cache in a specific way, and of course I'll show you what this looks like, so that the write goes through without the kernel checking my permissions. This actually came from a ticket. The
exploit is called Dirty Pipe. You have probably seen a similar one in one of the previous presentations; it was called Dirty COW. So ba