BSidesSF 2024 - Effective Detection in Kubernetes Clusters (Shay Berkovich, Oren Ofer)

Name: BSidesSF 2024 - Effective Detection in Kubernetes Clusters (Shay Berkovich, Oren Ofer)
Uploaded: 2024-07-09
Duration: 30 min 6 s


Speakers

Shay Berkovich, Oren Ofer

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

AWS CloudTrail, AWS GuardDuty, AWS VPC Flow Logs, Falco, kubectl, Microsoft Defender for Endpoint

Platforms

AWS EKS, Docker, Kubernetes

Frameworks

MITRE ATT&CK Framework

Malware

XMRig

About this talk

Effective Detection in Kubernetes Clusters. Shay Berkovich, Oren Ofer. Kubernetes attacks are on the rise and defense needs to up its game in response. In this talk we explore cluster event sources, assess cluster-cloud interfaces, and suggest useful rules to lay out an efficient and high-coverage detection solution for production K8s clusters. https://bsidessf2024.sched.com/event/de2e236fe6000a973e578cde3b1a15b8

Transcript [en]

What I'm excited to introduce is Shay Berkovich and Oren Ofer, and they're going to give their presentation right now. So let's buckle in and have a great session now and, of course, a great day. Thank you, folks. Thanks. All right, thank you very much for coming, for joining. The next half an hour we'll spend talking about Kubernetes, attacks in Kubernetes, and what we think constitutes effective detection in Kubernetes and in cloud native environments. Let's start. My name is Shay, Shay Berkovich. I come with a software development background. Fun fact: I hold a Master of Science in runtime verification, of all things, from the University of Waterloo. But 12 years

ago I started coding for a WAF and I got into [unclear], and 12 years later I'm here working for the Wiz threat research team. Since then I stopped being a software developer and started being a security researcher. And with me today is Oren Ofer. Hi everyone, excited to be here. My name is Oren Ofer. I've been practicing detection engineering on the runtime sensor for the past eight years, working on Windows endpoints, cloud, Linux, and naturally Kubernetes clusters. Prior to that, I'm coming from a background of penetration testing and red teaming, and now transferring that hacker mindset to the detection engineering field. So happy to be here. Let's go, let's do this. So what can you expect

from our presentation? Well, we'll start with the intro and motivation: attacks in Kubernetes, the evolution of attacks. Then we'll go through the sources, starting with the Kubernetes audit logs and finishing with the runtime sensor. Then we'll present the actual demo, based on real attacks that we observed, and we'll analyze that demo and think together how we can detect the full attack. And then we'll conclude with the final pitch about combining sources: what constitutes effective detection, in our opinion, in Kubernetes. All right. We promise there are not going to be LLM models in this session,

so that's a relief. We'll talk about the Kubernetes attack chain first. Kubernetes is a complicated beast. It has its own ecosystem, it has its own security domain. We talked about it extensively in the Kubernetes security report, but for the purpose of this talk we can simplify this, because it's still a system. Attackers still attack it starting with initial access, then there are multiple iterations of lateral movement, privilege escalation, perhaps a cloud pivot, and then finishing with the impact, either local or global. So let's keep that attack chain in mind and let's talk modern attacks. What can you expect from them? We've seen several worrying trends in the

field, starting with the cloud component. Attackers know cloud at this point. In all the recent incidents, including the recent incident that Gem Security described, we saw that attackers are proficient with cloud APIs and they use red team tools like Peirates and Pacu. So that's not new for them at this point. There are also new initial access vectors. If we take, for example, TeamTNT, they started with looking at exposed Docker API sockets but then moved quickly into RCEs for WordPress plugins, PostgreSQL misconfigurations, and abusing leaked cloud credentials. And finally, there is this trend of adoption of old techniques. Our sensor research team has recently observed the

what they dubbed PyLoose, which is in-memory execution of crypto miners with Python. By itself it's not new, but the application is new. So we see those three worrying trends: adoption of old techniques, the cloud component, and the multitude of initial access vectors. By the way, if you want to know more about this, I recently gave a talk at KubeCon in Paris about the various initial access vectors in Kubernetes. There are a lot of them. So these trends kind of make it hard on the SOC, because now the SOC needs to be an expert in everything. So we, as the defense team, need to ask ourselves: how can we

help the SOC analyst to actually be successful in this modern cloud landscape scenario? We believe that the solution for effective detection is multi-domain. First is the temporal domain, tracing the attack chain, and the vertical axis is the abstraction level, combining multiple sources for detection. So let's talk about those sources. What kind of sources do we have? Starting with the sensor, which has visibility on the kernel level, host, and containers. Then on the Kubernetes level we have the admission webhook and audit logs. And for the cloud we have, of course, cloud detection feeds, logs, VPC, and more. So let's talk about those. Now, as a metaphor, we'll use

a cluster as a business which we want to secure. So first we start with the Kubernetes audit log, which we compare to a CCTV system that can detect some malicious activity inside the business. In terms of architecture, the Kubernetes audit log is part of the master node and it's controlled by the Kubernetes API server. The format of the events is as follows. This is not the full event, but this is how it looks, and this is a vanilla Kubernetes audit log event. It has the three pillars: the username, the principal, in this case system:anonymous; the verb, list; and the resource, pods. So we know that system:anonymous has listed pods in the default namespace. So this is

something that you probably want to detect. This is Azure, this is an AKS managed cluster. The original event is now in the properties.log field, and then there's metadata, but it's still retrievable, it's still okay. And they add another timestamp, I don't know why, but I guess it's okay, we can work with that. And this is GKE. As you can see, the format is completely different. So yes, they do change the format into their proprietary format, which brings us to the gotchas of audit logging in managed clusters. First of all, they're not all on by default. Only in GKE is the audit log on by default. In terms of the medium where you can

retrieve those logs, it's different everywhere of course: CloudWatch, Event Hub in AKS, etc. So you need to know that. And finally, audit options are unknown. Audit options is the file that tells the Kubernetes API server what to log: which events to log and which events not to log. You can't control it in managed clusters; it's controlled by the cloud vendors, which might be a problem if you want full control over your clusters. In that case you need to use self-managed clusters. And of course there's the proprietary format in GKE, which means you either maintain various rule sets or you need to normalize the events.
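Editor's sketch: the normalization option mentioned above could look roughly like the following, reducing each vendor's event to the three pillars (principal, verb, resource). The vanilla field names (`user.username`, `verb`, `objectRef.resource`) follow the Kubernetes audit event format. Treating `properties.log` as a JSON-encoded string, and the GKE layout (`protoPayload.methodName` ending in `<resource>.<verb>`, `authenticationInfo.principalEmail`), are assumptions for illustration and should be checked against real events.

```python
import json
from typing import Any, Dict


def normalize(event: Dict[str, Any]) -> Dict[str, str]:
    """Reduce an audit event to principal, verb, and resource."""
    # AKS-style wrapper (assumption: the original event sits in
    # properties.log, possibly as a JSON-encoded string).
    props = event.get("properties")
    if isinstance(props, dict) and "log" in props:
        inner = props["log"]
        event = json.loads(inner) if isinstance(inner, str) else inner

    # GKE-style event (assumption: Cloud Audit Logs layout where the
    # method name ends with "<resource>.<verb>").
    payload = event.get("protoPayload")
    if isinstance(payload, dict):
        parts = payload.get("methodName", "").split(".")
        auth = payload.get("authenticationInfo", {})
        return {
            "principal": auth.get("principalEmail", ""),
            "verb": parts[-1],
            "resource": parts[-2] if len(parts) >= 2 else "",
        }

    # Vanilla Kubernetes audit event.
    return {
        "principal": event.get("user", {}).get("username", ""),
        "verb": event.get("verb", ""),
        "resource": event.get("objectRef", {}).get("resource", ""),
    }
```

With events normalized this way, a single rule set can be written against `principal`, `verb`, and `resource` instead of one rule set per vendor format.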

Basically, it's nontrivial. Now, the volume of the logs can get very big very fast, and noise reduction is critical in this sense. You might want to monitor only interesting actions. For example get, list, watch: unless it's secrets, you are probably not interested in those. We recommend weeding out control plane principals, so in this case we remove all the principals starting with system: except system:anonymous or system:serviceaccount, which kind of makes sense. Or you can just exclude non-interesting principals, like a Prometheus server, that are known to be noisy. So those are some of the methods to reduce the noise. Now let's do a short

rule trivia. Who knows what this rule detects? Anybody? For the background, this is Rego on GCP event JSON, and the key here, some of the hints: we have create, the event ends with create, granted is true, and we see anonymous and unauthenticated in the principal email, which means that this is the rule "resource created by anonymous principal". So this is something that you probably want to detect. It's high severity in my view, and it potentially hints that your cluster has unauthenticated access enabled. Now, the audit log can't do everything, because there are blind spots. As we said, it's controlled by the API server, so it can only

see what the API server can see, which means all the local operations on the worker node are not visible. Things like static pods, the direct socket API, RCEs in the data plane: the Kubernetes audit log won't see those. Okay, now let's talk about the admission webhook. Traditionally the admission controller is thought of as an enforcer, right? But in our case we don't need to use it as an enforcer; we can use it in audit mode. You can define the mutating or validating webhook in audit mode and just collect those events. The thing is, those events are state-changing in the cluster, so in terms of coverage there's less

coverage than the Kubernetes audit log. And then you might ask yourself: okay, so why are we talking about it if there's less coverage? Why won't we just use the Kubernetes audit log? Well, first of all, there are cost savings associated with fewer events, of course. You have full control over what you monitor, because you define the configuration of that webhook and it collects the events that you're interested in. And it's cloud agnostic, so you don't need to maintain multiple rule sets. So this is an option. This is how the event typically looks. It's called AdmissionReview, because that's an admission webhook, but it's a

bit different from the Kubernetes audit log. It has the same components, though: it has the principal, the operation, create in this case, and the resource type, cluster role in this case. So you still have the basic information to create detections around. We would advise starting with stars, which means just monitor everything, give me everything that you can, and we'll go from there. To reduce noise you can focus on certain resource types. For example pods: monitor only pods, monitor only create pods events. We can start with that as well. Another noise reduction is just weeding out control plane principals or actions, so in this case we're

ignoring the namespaces kube-system and kube-node-lease. And for the rules trivia we have here an example of, which event, by the way, anybody knows? The key here is CONNECT; it means kubectl exec. This is probably not high severity, but it could be high severity in combination with other events. We still probably want to know that somebody executed into a pod in the cluster. In terms of blind spots: the same blind spots as the Kubernetes audit log. In addition, there's no impersonation information if somebody impersonated, and the get, list, watch verbs are not there. As I said, there

are only state-changing events there. And the denied requests are invisible, because the webhook happens after authorization and authentication, so all those failed attempts are not visible in this case. Okay, on to the cloud logs, which I think of as a neighborhood CCTV that helps to detect the getaway car after the burglary, to put the events together: what happened and where the burglars disappeared. There are quite a few cloud logs. We're not going to get into too many of those, like DNS queries, VPC logs, etc. The two main ones are the cloud audit log, like CloudTrail, and detection feeds. So there are actual

vendor detection feeds like GuardDuty and Defender for Cloud that you can use. There are Kubernetes rule sets in them, like privileged container detected. You can implement those with your own detection or you can use these ones. Of course there are costs associated with them. So another option: you can just use them as enrichment and Kubernetes perimeter monitoring for your bigger detection solution. I'll explain what I mean by enrichment. Enrichment is when we want to know additional information, for example about the acting principal: history, other actions they did in the cloud, associated privileges, perhaps leaked credentials. Same about

the origin IP: history, reputation, geolocation. Subject resource information: if something happened in the pod, who started this pod, who's the owner, the sensitivity associated with this pod, resource privileges, and perhaps even security findings. All this information gives us an additional picture. We're not going to go into the rules trivia with these ones. I just put these snapshots to show the complexity of the cloud rules. The use case here is the monitoring of the Kubernetes perimeter. So in this case: a series of IAM enumeration attempts originated from an EKS container. And this one is an IMDS connection in an EKS worker node with high privileges. So that means that we are interested in the events that

happened from Kubernetes reaching to the cloud API logs. Those are interesting events that monitor the perimeter, that might symbolize the pivot point. Okay, and I'm passing the baton to Oren, who's going to talk about Agent Smith. Okay, so our runtime sensor is our agent with the boots on the ground, or on the cluster if you will. No need to squint your eyes there. This is not the Star Wars intro subtitles, even though we just passed May the 4th. This is the extensive MITRE ATT&CK framework, which details the attack life cycle and breaks it down into attack tactics, techniques, and procedures, into single steps. And the runtime sensor plays a

crucial role in highlighting and monitoring those events. It does so by employing behavioral monitoring of applications and users, from the containers themselves up to the node and host itself. It monitors for any deviations from standard patterns to identify anomalies and suspicious activities, and it does so by monitoring specific events; we'll go to some examples soon. The runtime sensor also has a crucial role in collecting and extracting crucial data for detection that can be integrated with third parties like threat intel, collecting IOCs, or streaming these events to our blue team SIEM and correlating them there with additional detection tools. Another important functionality of the runtime sensor is providing

prevention and response capabilities. Once we identify something suspicious that looks like an attack, it can automatically kill the process, kill the container, block network connections, and more. And when that happens, the runtime sensor can also help us by collecting a forensics package that will help the blue team investigate the source of the breach, or whether there was lateral movement, and when we need to keep our investigation going. So how do we get that data? There are several instrumentation techniques; I am listing here the prominent ones. Some are used from user mode, by placing hooks on user mode functions or debugging the process as it runs and inspecting its arguments on each

execution. There are technologies that utilize a kernel module. A module can inspect and place hooks on system-wide events like system calls, and monitor internal structures in the kernel. But in Kubernetes environments we do expect to see the usage of eBPF technology, which kind of benefits from both worlds of user mode and kernel mode. It can place hooks on both, and at the same time enjoy the inherent protection of that technology, which safeguards us from impacting the proper functioning of the worker node. So what kind of events would we like to instrument? Starting with process and thread: as soon as the process starts, the runtime sensor can

collect a lot of information, such as information about the binary, which will be used to calculate its hash and help us identify known malware, and it can collect command lines to provide us with simple and yet effective detections of LOLBins abuse, and a lot more data. Next, we would like it to collect file events, which will help us detect credential access, privilege escalation, and persistence activities. Next, we would like it to collect network events. We would like to get information about the local and remote endpoints, about DNS queries or usage of sockets, and that will help us detect connections to mining pools,

connections to C2s, etc. Next, we would like the runtime sensor to monitor memory events. We know that hackers like to use evasion techniques and hide their payloads in memory; they use memfd and other techniques, so that will help us identify files and in-memory payloads that were decoded there. Next, we would like the runtime sensor to monitor the discovery, load, and unload of user or kernel modules. That will help us identify rootkits, process execution hijack flows, and additional shenanigans. The runtime sensor could also benefit from collecting information from the Linux audit logs. The Linux audit log stores information about activities of services and

daemons and applications that are running on the system. For example, we could gather information from there and detect sshd printouts that would indicate brute force attempts and password guessing attempts. Also very important: the runtime sensor can collect information from the container runtime itself, and by that it can match the information of running processes to the processes that come from the image of the container and help us identify any deviation. We know that hackers like to install their own tooling, so that would help us identify the container drift problem. So how do we deploy our sensor? We would aspire to have our sensor on

every worker node. To that end we can utilize the Kubernetes DaemonSet. That deployment type ensures that we will have a runtime sensor pod on each worker node, with the exception, in managed environments, of the master node. And that takes us to some considerations. As said, one of the cons is that we cannot deploy it on managed nodes; it's a given state. Another con would be a proprietary application context. For example, in some organization's environment the application accesses some sensitive files for its benign and normal operation. I argue that it's not entirely a con, because this can generate a productive discussion to identify why the application needs to access that

sensitive file, and whether we can do it otherwise. And if that is the way we should operate, we can also allow-list that operation. Another con: the sensor theoretically can access Kubernetes and cloud context, but that would not align with best practices, because that would require granting high permissions to the sensor, and the sensor would have to use a lot of network to extract all that traffic. That is not necessary, as we can get that information elsewhere, from the sources that Shay mentioned earlier. Okay, so on with the demo. We prepared a demo that is based on real events (funny to say that in a movie theater) that we have seen at our

customer base. The initial access vector is a Jupyter notebook that is offered as a free service by some vendors. So let's start. As soon as our attacker gains access to the Jupyter notebook, he installs some tools, including kubectl, and starts by interrogating the

cluster. Using the auth can-i command, the attacker quickly realizes that he has very high permissions on that cluster. He continues to find additional useful insights about the cluster, like identifying the namespaces that are available, and also looks for any Kubernetes secrets, but that does not yield anything interesting. So at this point the attacker decides to deploy a crypto miner in the environment. Let's see that.
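Editor's sketch: the discovery steps narrated above all go through the API server, so they should surface in the Kubernetes audit log. `kubectl auth can-i` creates a SelfSubjectAccessReview (and `kubectl auth can-i --list` a SelfSubjectRulesReview), which gives a rule something concrete to match. The minimal example below flags a service account that performs several distinct discovery steps. The step labels, the `min_steps` threshold, and the restriction to service-account principals (on the assumption that the attacker acts with the notebook pod's identity) are illustrative choices, not something shown in the talk.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Optional, Set

# (verb, resource) pairs treated as discovery activity; labels are hypothetical.
DISCOVERY_STEPS = {
    ("create", "selfsubjectaccessreviews"): "permission-check",  # kubectl auth can-i
    ("create", "selfsubjectrulesreviews"): "permission-check",   # kubectl auth can-i --list
    ("list", "namespaces"): "namespace-listing",
    ("list", "secrets"): "secret-listing",
}


def discovery_step(event: Dict[str, Any]) -> Optional[str]:
    """Return the discovery label for a vanilla audit event, if any."""
    verb = event.get("verb", "")
    resource = event.get("objectRef", {}).get("resource", "")
    return DISCOVERY_STEPS.get((verb, resource))


def suspicious_principals(
    events: Iterable[Dict[str, Any]], min_steps: int = 2
) -> Dict[str, List[str]]:
    """Map each service account to its discovery steps once it reaches min_steps."""
    seen: Dict[str, Set[str]] = defaultdict(set)
    for event in events:
        user = event.get("user", {}).get("username", "")
        # Assumption: the attacker uses the identity of the compromised pod.
        if not user.startswith("system:serviceaccount:"):
            continue
        step = discovery_step(event)
        if step is not None:
            seen[user].add(step)
    return {u: sorted(s) for u, s in seen.items() if len(s) >= min_steps}
```

A single permission check is common and usually benign, which is why the sketch only reports a principal after more than one kind of discovery step; a production rule would also want a time window.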

Okay, deployment xmrig created. So our attacker at this stage would like to verify that his XMRig is up and running, so he issues the get pods command. We can also see here the Jupyter notebook that he operates from and the new XMRig pod. Now, our attackers are greedy. They would like to escalate further and pivot into the cloud, so they search for pods that are associated with EKS Pod Identity, a rather new feature introduced by Amazon, and they identify the bucket reader pod. So they issue a command into that pod, the kubectl command, to cat the file that contains the token of the EKS Pod

Identity. Now, using the EKS Pod Identity, they generate an AWS API

token. Now, I would like to state that from this stage, having a valid AWS API token, they could continue and carry out the attack from any machine in the world; they don't have to do that from inside the cluster. But for the sake of the demo, we will continue to show it from here. Using the AWS token, our attackers now create an S3 session, and they validate that the session is live and valid by getting the

ARN. Next, the attackers list the first

bucket. Now they are able to download files from the buckets, extracting some sensitive data and successfully completing the attack. Back to you. Thanks. Okay, so let's talk about what we've seen here. We'll split it into stages. We started with initial access via the Jupyter notebook. If we think about the detection sources, how can we detect that? Well, not really. Only the sensor really sees the attacks executed on the pod, but those attacks probably might seem benign, so that's a tricky one. Moving on to the local kubectl install: this one is visible in the VPC flow logs, and probably the DNS queries as well, and the sensor will see that. Okay, so

that's better. Moving on: permission enumeration, that kubectl auth can-i command. That's invisible on cloud and on sensor, because that's an API call to the Kubernetes API server, so the best one to detect this would be the Kubernetes audit log. And so on; we're moving on with those stages, building the full picture of the attack, and finishing with the creation of the session and enumerating and reading the S3 buckets. When we think about that last stage, it's only visible on the cloud, actually, because that could be run from any IP: the previous stage was stealing AWS credentials, so you can copy-paste them and then run it

from some kind of C2 server or something like that. And another point: that action is actually executed with another principal. So how to connect all those stages? That's the tricky part, and it's a topic for another session at BSides. But it's possible. There are a couple of tricky stages that you need to connect to get the full view. It's possible, but you need a good medium that can express that logic. For example, in the last movement, from stolen credentials, you need to realize that, okay, the EKS Pod Identity file was accessed, and that file actually points at that role, so that's our

connection, and then the next minute that role was active, perhaps from another country. You need to put this together. Okay, and that brings us to the final slide of this presentation, which I think is the most important slide; if you want to take away something, that's the one. We believe that the solution for effective detection is multi-dimensional. The X axis is the temporal one: we want to know all the attacker actions, starting with initial access, through lateral movement, etc., finishing with the cloud pivot. And the vertical axis cuts through the abstraction levels, from the kernel, CRI, Kubernetes context, cloud context; we need all of that, coupled with

very smart noise reduction, for cost concerns but also for false positive reduction. All those together will bring us closer to effective detection in Kubernetes and cloud native ecosystems. And that's all we had for today. Thank you so much. Let's give them a hand. Thank you, gentlemen. I think we've got time for one question, from a person called Anonymous (yeah, go figure, right?): what are the most common types of false positives you guys have been seeing with Kubernetes detections? Well, depends on the source. It doesn't have to be a false positive; it could be benign, but you need to know which one is benign and which one

is not. So there are different types of false positives depending on the source. I guess we can't go too far into those. Even that kubectl exec could be a false positive. Well, that makes sense, yeah. Thank you, gentlemen. We've got some gifts for you here, care of our friends at Socket Security. Give a round of applause again. Here you go, Oren and Shay. Gentlemen, thank you very much for your time. Thank you very much, folks. Thank you for coming.
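Editor's sketch: the closing point about smart noise reduction echoes the audit-log advice given earlier in the talk: ignore get, list, and watch unless the resource is secrets, drop `system:` principals except `system:anonymous` and `system:serviceaccount`, and exclude principals known to be noisy. A minimal filter over vanilla audit events might look like this. The `NOISY_PRINCIPALS` entry is a hypothetical name standing in for something like a Prometheus server identity.

```python
from typing import Any, Dict

READ_VERBS = {"get", "list", "watch"}
# system: principals that are kept, per the talk's recommendation.
KEPT_SYSTEM_PREFIXES = ("system:anonymous", "system:serviceaccount:")
# Hypothetical identity of a known-noisy principal.
NOISY_PRINCIPALS = {"prometheus-server"}


def keep_event(event: Dict[str, Any]) -> bool:
    """Return True if a vanilla audit event should be kept for detection."""
    user = event.get("user", {}).get("username", "")
    verb = event.get("verb", "")
    resource = event.get("objectRef", {}).get("resource", "")

    # Weed out control plane principals, keeping anonymous and service accounts.
    if user.startswith("system:") and not user.startswith(KEPT_SYSTEM_PREFIXES):
        return False
    # Exclude principals that are known to be noisy.
    if user in NOISY_PRINCIPALS:
        return False
    # Read-only verbs are only interesting when they touch secrets.
    if verb in READ_VERBS and resource != "secrets":
        return False
    return True
```

Applied literally, the read-verb rule would also drop the anonymous pod listing used as the first audit-log example in the talk, so a real policy might exempt `system:anonymous` from that rule. That is the kind of benign-versus-interesting trade-off the Q&A answer points at.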
