
Cloud & Container Security: A Defensive Puzzle from Pipeline to SOC Operations

BSides Exeter · 2026 · 56:02 · Published 2026-05
About this talk
A defensive deep-dive into cloud and container security that treats the ecosystem as interconnected puzzle pieces—from application code through the kernel to SOC operations. Barker explores how security engineers consume telemetry (eBPF, Kubernetes audit logs, API calls) to detect breakouts and lateral movement, then discusses the human and technical challenges of operationalizing detection across fragmented teams, including the risks of AI-assisted triage and the importance of keeping security analysts engaged in foundational learning.
Transcript

So today I'm going to talk about cloud and container security, primarily from a defensive standpoint. By that I mean not just how we build it, but what we do with some of that information: how your SOC or security engineers consume it, and how we avoid, as much as possible, things like alert fatigue. This might look familiar to some people; the initial version I did at BSides London on the rookies track a few months back. Liz Rice dropped her second edition of Container Security literally three days before I did the talk. She coined this puzzle-piece idea, so I'm not going to take credit for it, but it's a really, really good baseline for helping people understand that when we think about container security, there are lots of different pieces involved. There's the application security of the thing we're building. We host it on something, which could be cloud or your own hardware. We've got some form of container runtime, whether that's Kubernetes, Docker or K3s. And then we've also got our assurance friends; let's not forget they still play a part in all of this, because they give us our requirements. And then we've got runtime: what are our application teams doing? Are they looking after our users? Are we running databases in there? What's happening with those?

Quite often those will be separated across different teams within an organization, especially in really large FTSE companies. They will have five or six independent teams working across different areas, versus smaller outfits. Small and medium companies are really going to have a hard time with this, because you might get one person doing all of your security: they're in the SOC, they're doing security engineering, they're hardening the platform. So it's definitely a puzzle. And because it's so fragmented, because there are different teams, that creates a lot of human problems, through no one's fault. Human behavior is human behavior; we all try to do the best we can when we're defending. But if I can't talk to the team that built a system, if I can't pick up the phone when something goes wrong and query it with someone who actually knows the system inside out, that's really, really tough on the person who's picking up the phone to make that call.

So, starting at the bottom: quick show of hands, who understands what a kernel is?
>> Brought the first commercial one to the UK market.
>> That's good. This makes this a bit more fun. When we talk about container security, the very first thing people think, especially in the application tier (I've been there, worked with them), is: I'm building my container, it sits in a pod, it's its own thing, it's out there, it's completely isolated, no problem. But conceptually, a protected building is the example.

We're in a room that's like a container. We're on a floor plate, which is like a pod. But we're all using the same utilities: we've all got the same light source, the same power. The electrics are the kernel. So whilst we think we're isolated in this room, actually it's all part of this bigger thing, and the kernel is the underlying bit, sitting on top of the CPU, that makes all of this work. So even though I'm giving this talk here, and the red team talk and the main auditorium talk are going on at the same time, all of our recordings, I'm assuming, are going into the same place, and someone can see all of them.

A few of the really interesting examples give you an idea of how this comes into play. There was a race condition vulnerability last year (called a race because it's all about timing). It exploited a tiny microsecond gap between a container being initialized, the request going down to the kernel, and the response coming back up. Someone doing something nefarious in the container could abuse that response to mount the host file system, the thing the container sits on at the node level. That lets them break out to the node, which is very bad, because now they can say: actually, I'm just going to snoop on whatever the red team talk is doing right now.

The second one, another type of breakout from this year, was an authorization (authZ) bypass. The way this worked: when you make calls to the kubelet API, they go back down through the kernel. The kubelet API can only look at so much of a request, so it truncates the call. What the bypass did was pad the call with over a megabyte of random blob. The kubelet evaluates the truncated version, decides there's nothing wrong with it, and passes it on; but because the kernel has the whole thing while only the truncated version was checked, it then executes on the whole payload. So the kubelet thinks it's good, but what we've actually sent down is something very malicious, and again that lets you break straight out into something else.

So whilst we like to think containers are isolated, unfortunately there is a flaw in the system, which is that we can break out at the kernel level. Now, there are controls you can put in place. Using Linux namespaces and cgroups you can restrict how much each container can see and use, so I can't see the electrics from all the other rooms. There are ways we can help protect against this.
>> Is your specific view of
containerization a Kubernetes-style model? Because there are other ways of containerizing a system, but they have less dynamic capability. You can actually run the kernel in a VM, right? The kernel could be the VM's.
>> True. And yes, it depends on the setup you're going with. Most of the time when I talk about this, I'm referring to public cloud: AKS, EKS, GKE, the public cloud vendors.
>> It's also worth saying that if you are really, really worried about this type of thing and about lateral movement, that might not be the right way to go. Like you said, you can run your own cluster with its own nodes hosting a dedicated workload, and segment at both the workload level and the VM level, so you're isolating things.
>> The containerization, from a security point of view, is a little bit of a mirage.
>> And because you're going through the kernel, you've got a privilege escalation opportunity that can be exploited, one that crosses over the containerization layers.
>> Yeah.

So, if we take a step back, let's think about it from a business, from a user standpoint. The user doesn't see the whole technology stack underneath. The user sees a web app or an interface
somewhere, which has been built somehow by our good developer friends: software engineers, testers, product people. How does all of this get, effectively, from a whole load of code, bits and binary on one side, into a running container on the other? Well, typically that goes through CI/CD. If you're being naughty, you just use a Helm chart and deploy it manually; you might just be writing it in Notepad++. There are various ways to do it. What we can start to break down is that this whole pipeline is a control plane. As I'm writing code, if we're talking shift-left, you can have plugins in your IDE, you can use AI in there, you can use all sorts of plugins to do static analysis as you're writing, and tools to help you maintain your package files at the same time.

Fundamentally, as you start to move through the pipeline, the first major headache you hit is open source. Pretty much everything we do now has some form of open source in it. Hands up, how many people have heard of the Trivy hack that went on recently? For those that don't know: Trivy got breached. Trivy is an infrastructure-as-code scanner. It does quite a bit more, but it identifies vulnerabilities in your code, it looks for credentials and secrets embedded in there, and it looks at your cloud configurations to make sure you haven't misconfigured your cloud. An attack group targeted Trivy as a piece of the software supply chain. It's the most popular infrastructure scanner out there; I forget the exact stats, but it's staggering, and it gets even worse in a second. What they did was rebuild the Trivy binary itself. It runs exactly as you'd expect and gives the results back, but on the back side of that it's scraping Git PATs (personal access tokens), scraping all those credentials out of the system. Because they got hold of a developer's credentials, the rebuilt binary got pushed up, so there's a whole series of Trivy versions that are affected, and those got deployed across basically everyone's estate. Anyone who pulled the most recent version probably got hit.

That's where it gets worse: the chain cascades, effectively like a worm. Checkmarx got hit as a direct consequence of Trivy. It then hit another package, one with something like 83 million downloads, so we're not talking small things. And now they've gone on and hit GitGuardian, all from the same open source compromise at the top. That is one example, and a really catastrophic one, but it is a major problem with all of this open source: how do you secure it?

Part of the answer comes down to your repositories, but specifically for containers, you're looking at ephemeral, hermetic runners. The idea is that, even in the Trivy case, the runner that builds the code runs in a sandbox with no internet connectivity at all. It can't talk out, it can't go anywhere. That means we can build in isolation and have some confidence that whatever's happening in that build is what it should be. Then you go through all your standard checks. I couldn't get a page big enough to put all the different things on, but some of the key bits: are you doing your SAST? Are you generating your SBOM? Are you running your SAST and DAST and all your other security testing?

Then: what do I do with that information? Most of the time it sits with the application team; they're looking at it and fixing it. But it starts to be the very first signal that a SOC and defensive team can use to harden their environments, lock them down, and use provenance checking later on. There's a really, really good thing you can do with your pipelines called attestation signing. Every event that runs in a pipeline generates an attestation saying who ran it, what ran it (was it a hermetic runner? was it someone running on an endpoint?), and what the results were, and gives it a nice cryptographic signature that you can then send up into something like Rekor. That's part of Sigstore; there are other tools out there, but I'm a CNCF guy, so that's why I said Rekor. It basically forms an immutable log, kind of similar to a blockchain, of everything that happened to that individual piece of software: all the tests that ran, all the tests that passed, all the tests that failed. That all gets signed and stored in one place against the artifact, so you can go back and query it later. It gets even better, because you can then look at that using something like a policy engine and, slightly further down the line, a VSA.
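To make the attestation idea concrete, here is a minimal sketch of the pattern, not Sigstore's actual API: each pipeline event becomes a signed statement, appended to an append-only log keyed by the artifact digest. The HMAC key, field names and pipeline steps are all illustrative.

```python
import hashlib
import hmac
import json

# Illustrative only: real pipelines use Sigstore keys or x509, not a shared HMAC key.
SIGNING_KEY = b"pipeline-signing-key"

def make_attestation(artifact_digest: str, step: str, runner: str, passed: bool) -> dict:
    """Record one pipeline event: who ran it, what ran, and the result, signed."""
    body = {"artifact": artifact_digest, "step": step, "runner": runner, "passed": passed}
    payload = json.dumps(body, sort_keys=True).encode()
    body["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return body

def verify(att: dict) -> bool:
    """Recompute the signature over everything except the signature itself."""
    body = {k: v for k, v in att.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(att["signature"], expected)

# An append-only "transparency log": every event for an artifact, in order.
log: list[dict] = []
digest = hashlib.sha256(b"my-container-image").hexdigest()
log.append(make_attestation(digest, "sast-scan", "hermetic-runner", True))
log.append(make_attestation(digest, "sbom-generate", "hermetic-runner", True))

assert all(verify(a) for a in log)  # the chain checks out end to end
```

A policy engine evaluating a VSA is then just a query over this log: did every required step run, on a hermetic runner, and pass?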

So you're looking at the attestation that's provided to say: did it pass everything it should have done? A verification summary attestation quite literally asks: did you pass my SBOM policy? Do you have critical vulnerabilities? Do you have all these other things? And then, based on your individual environments, you can start to say: yes, you can deploy there, or no, you can't. So, thinking critically about how you harden your production environments: if it hasn't passed all the policies, it's a straight no, you can't deploy; versus if I'm deploying to a development sandbox, yeah, fine, you can deploy there, it might look slightly weird, but go ahead. That's also where Rekor, and the way it can be used at that core nexus point, becomes one of the key things we can use for the security operations center.

How does this play out in practice? We've got our lovely friends on the apps side doing the good things. They're generating SBOMs; they're following all the standards and guidance we give them. On the left-hand side, we've then got a SOC, probably coming in alert-fatigued. How does that play out? Actually, the very first thing they need isn't all the security results and scan results coming in (have you done your SAST? what's in it? are there critical vulnerabilities?). The more simple questions are: did we build it? Did we sign it? Did it pass all of the things we expected it to pass before it was signed? Start really, really simple, rather than trying to interrogate everything under the hood at the very start.

Then, how do we actually make use of that? Because of the way it works, Rekor will give you a provenance chain: it gives you the SHA, it gives you all that core information. We can join that with things like the CMDB and enrich it automatically, rather than doing it manually. Pull in your CMDB, pull in your change logs: this thing happened at this time on this asset. We know what the asset is, but do we know the context of it? Is it business critical? Is it a back-end system? Is it, as the last talk mentioned, a print server that's running? Because that will start to dictate how quickly and how often we react to something. And one of the key bits, and I say this having worked a lot on the apps side: is it actually a scheduled change? Because a lot of the time we can be doing what we think is a scheduled change and we've not told the SOC, for whatever reason. Changes go through your ITSM, through your change management system, for a reason. From a SOC perspective, if you don't have that, you're always going to be fighting what looks like something bad happening, when quite often it's actually "yes, we're doing something", or "we're going through a change, it's gone wrong, and we're having to log in and do things manually to a database". That context becomes one of the most critical pieces.
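That enrichment step can be sketched in a few lines; the asset names, CMDB fields and change windows below are invented for illustration, and in practice this data comes from your CMDB and your ITSM/change system.

```python
from datetime import datetime

# Hypothetical CMDB extract and change calendar.
CMDB = {
    "prod-db-01": {"criticality": "business-critical", "owner": "payments"},
    "print-srv-9": {"criticality": "low", "owner": "facilities"},
}
CHANGES = [
    {"asset": "prod-db-01",
     "start": datetime(2026, 5, 1, 22, 0), "end": datetime(2026, 5, 1, 23, 30)},
]

def enrich(alert: dict) -> dict:
    """Join a raw alert with asset context and scheduled-change windows."""
    asset = CMDB.get(alert["asset"], {"criticality": "unknown", "owner": "unknown"})
    in_change = any(c["asset"] == alert["asset"]
                    and c["start"] <= alert["time"] <= c["end"]
                    for c in CHANGES)
    return {**alert, **asset, "scheduled_change": in_change}

alert = {"asset": "prod-db-01", "signal": "manual DB login",
         "time": datetime(2026, 5, 1, 22, 45)}
enriched = enrich(alert)
# A manual DB login during an approved change window on a known asset:
# triage can deprioritize instead of treating it as an intrusion.
```

The point is the join, not the data structures: a SOC that can automatically answer "is this a scheduled change on a critical asset?" filters out most of the false drama.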

So, going back to the point: I keep talking about public cloud, but the same principles apply to physical infrastructure, whether you're running on a Raspberry Pi at home, public cloud, private cloud or physical tin. What are some of the guardrails and sensors that start to feed into the SOC? We've talked about how a container is built. Now: where does it run? Where does it sit? What does it actually run on? Here are some of the key concerns I think a SOC should be looking at.

The first is identity. Like pretty much every other speaker has said, identity is key to everything. Nothing should run without a verified identity in an ideal world, zero trust being the hypothetical utopia we'd all love to push towards. That's difficult, it's complicated, and it requires a lot of context. What do we want to know? What are things like service accounts doing? What is happening with my identity accounts? Is someone logging into AWS from a random geographic location? Are they logging in to get a secret, for a deployment, that they've never used before and haven't used in the last 30 days? Those are the types of things we can pull out and start to feed into the SOC. Now, the caveat with identity is that it's quite small in terms of log size, but we can go a step further. What about organizational policies, controls and guardrails? Things like CloudTrail and CloudWatch, all that event logging at the management plane, are a huge amount of volume, so you're probably not going to ingest all of that into the SOC. But having it there for threat hunters to use, having it there if you do have an incident, is helpful. As always, it comes down to cost: how much can you afford to generate, and how much can you afford to store?

With things like this, encryption is absolutely key. If we're running everything encrypted, we want to know who signed what and who's using which keys. Those are the types of things we want to
turn on. CSPM, again, is an overlay that lets us see into that world, but it costs a lot of money to run. Where a lot of public cloud providers actually do us a good turn is with things like GuardDuty and Defender, that security aggregation layer. We don't need to integrate all of that information into the SIEM; you can use it at the cloud layer to help protect there, and then stream certain events from it into your SIEM, or into a data lake if you need to hold them for regulatory requirements. Say you want to know what your security posture is: you've got something like ISO 27001, and you want to know, on any given day, have I got the audit evidence that my identity is properly managed? You can stick that into blob storage.

So now layer on top of this. If identity, encryption and firewalls are the foundational bits everything uses, on top of that sits our orchestrator; I'm using Kubernetes as the example. And again, we go back to everything being stacked on top of that one singular point. Underneath all of this, even if you're running on AWS or EKS, in the majority of cases that's going to be a singular VM. Depending on how you segment things, that can limit and change the exposure and blast radius. One of the scary points: according to the CNCF, something like 80% of companies aren't using any form of network protection internally. So I build my Kubernetes cluster, it's sat out there on the internet, I've got my firewall, I've got my internet gateway that sits there and protects the edge. But internally, pod-to-pod and container-to-container traffic, even across different workloads, generally isn't protected and locked down. And that means if someone does get inside, they get a bit of a field day: they can query across, they can go all over the place.

That's where things like service mesh start to come in. It allows you to sidecar all of your containers and say: this specific container, with this ID and this signature, should be talking to this one over here. It allows you to start putting in granular rules that you can flex: actually, I want to deny everything into this database apart from this API, and it has to have this signature and this authorization to come and do that. It's a way of breaking out everything that's happening, and what's happening at what time, and putting in a network layer in the same way we would have done traditionally in our data centers years ago, and still do with next-gen firewalls. We're just applying it at the micro level, effectively at the network interface attached to each one of these containers.

The second one here is admission gates, using admission controllers and policy controllers. Admission controllers actually stop a lot of really bad stuff happening. Going back to the pipeline: you build the container, you sign it, and you use policy as code, whether at the infrastructure layer or the deployment layer, to say: if you don't have a valid signature, you're not being deployed. That works really well for blocking known-bad things coming in. You can also flip it on its head: if I know that something bad is inside a container and spreading, and I know what that container and its signature are, I can flip the admission gate to say, tear all of these things down in real time. I'm not saying that's a good thing or the right thing to do in terms of incident response, but it gives you another tool in your arsenal if you are playing whack-a-mole with someone and you do need to get them out.
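A minimal sketch of that admission-gate logic, assuming a simple in-memory trust store rather than a real policy controller like Kyverno or OPA; the digests are hypothetical:

```python
# Digests the pipeline has signed (hypothetical values), and an incident deny-list.
TRUSTED_DIGESTS = {"sha256:9f2a1c": "pipeline-signing-key-1"}
deny_list: set[str] = set()

def admit(image_digest: str) -> bool:
    """Admission decision: the deny-list wins, then a valid signature is required."""
    if image_digest in deny_list:
        return False                       # incident response: refuse / tear down
    return image_digest in TRUSTED_DIGESTS # unsigned or unknown: never deployed

assert admit("sha256:9f2a1c")        # signed and trusted: allowed in
assert not admit("sha256:deadbeef")  # no valid signature: blocked at the gate
deny_list.add("sha256:9f2a1c")       # flip the gate on a known-bad container
assert not admit("sha256:9f2a1c")    # previously trusted, now refused
```

The "flip it on its head" move in the talk is the `deny_list.add` line: the same gate that blocks unknown images on the way in can evict a known-bad signature in real time.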

So, going across: we've talked a lot about the higher-level things, but what actually happens at container runtime? What's happening with the container itself? Best will in the world, we can all build secure applications and deploy them, but there are a lot of open source attacks going on right now, and it would not surprise me if that skyrockets again; last year it was about an 800% increase according to Verizon, and it's scary to think what this year's number will be. If we're talking about how the application runs and what it does at the kernel level, what we start to look at is something like eBPF (extended Berkeley Packet Filter). This is probably one of the most fine-grained, but also noisiest, sources of telemetry you can get, but it will tell you everything that is happening with the container.

The example I use is NGINX. NGINX is a proxy and reverse proxy that sits out there, internet-facing, and there have been numerous issues with it over the years. I think about four years ago there was a version that actually had to be run as root, which makes this even more interesting. What we can start to look at is: NGINX is running normally, then it suddenly makes a random syscall that we don't expect. It's calling down to the Linux kernel and saying, hey, I want to open a shell. Why is NGINX opening a shell? Well, that's logged at the kernel level. It happens whether you look for it or not; the kernel is always going to register it and handle it. The next thing NGINX does is a curl. Depending on your telemetry and what you've got, hopefully most people would see a random curl to a random domain on their firewall, but the firewall is really busy and gnarly and there's lots of information in there. Whereas what eBPF does is say: this process, which is NGINX, spawned a new one, and now that's trying to do a curl, and now it's trying to do something after the curl, and it builds that into a hierarchy. Everything is joined up.

What that means from a security ops side is that we can start looking at this in real time. There are tools out there, like Tetragon, Falco and various others, that use eBPF to say: actually, I don't want it doing those things, so block it. The most advanced ones can block based on behavior, on what's known bad and known good. You can put rules in to say this front-end web server should never do anything like reaching out to these other areas. So it gives you a really, really fine-grained view of what's happening. But, as some of the other talks have said, it's too much just to pump into your SOC and say "have at it". That would (a) cost you a fortune, and (b) require you to understand Linux, the process tree, and the natural behavior of a normal container.

That's where the idea of defense graphs comes in. I use the phrase "defense graphs" because it's a data problem: we've got lots of different tools, hence the "defense" bit, and GraphQL kind of influenced my thoughts on the "graph" bit. When we're talking about data, show of hands, who's heard of a data fabric or data mesh? Good news. The idea is that, working as a data engineer, I can start querying something from one place and see it even though the data resides somewhere else. That's the same type of logic we need to start taking on the defensive side. The slight caveat is that the different tools speak slightly different languages; they all format things slightly differently. That's where something like OpenTelemetry comes in, in the middle. It standardizes what everything is called, normalizes it to the same naming conventions and the same formats, and helps baseline everything into the same shape.

So what we can do is take all of our eBPF events, and things like our Kubernetes audit logs. Those are turned off by default in AWS EKS because they're so noisy, and we don't want to ingest them into the SIEM, so we turn them on, send them straight to S3, and they age out into Glacier. We can then use a collector, with OpenTelemetry, to query out the different bits we need. Then we pump only the most important stuff onwards: not the raw logs, not the raw events. We put some intelligence on top of that: correlation engines, take your pick, there are lots of different ones you can use. You're taking the data from multiple of these points on the left. Has something happened at the kernel level?
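The normalize-then-correlate idea can be sketched as follows; the field names, the mapping, and the three-source threshold are illustrative, not any particular collector's schema.

```python
def normalize(source: str, raw: dict) -> dict:
    """Map each tool's own field names onto one common schema (the OTel idea)."""
    if source == "ebpf":
        return {"source": "ebpf", "pod": raw["k8s_pod"], "signal": raw["syscall"]}
    if source == "cloud_api":
        return {"source": "cloud_api", "pod": raw["caller"], "signal": raw["action"]}
    if source == "flowlog":
        return {"source": "flowlog", "pod": raw["src"], "signal": "egress:" + raw["dst"]}
    raise ValueError(f"unknown source: {source}")

def correlate(events: list[dict]) -> list[dict]:
    """Raise one SIEM alert only when several independent sources fire for the same pod."""
    alerts = []
    for pod in {e["pod"] for e in events}:
        hits = [e for e in events if e["pod"] == pod]
        if len({e["source"] for e in hits}) >= 3:  # kernel + API + network agree
            alerts.append({"pod": pod, "signals": [e["signal"] for e in hits]})
    return alerts

events = [
    normalize("ebpf", {"k8s_pod": "web-1", "syscall": "execve:/bin/sh"}),
    normalize("cloud_api", {"caller": "web-1", "action": "sts:GetSessionToken"}),
    normalize("flowlog", {"src": "web-1", "dst": "203.0.113.9:443"}),
]
# Three sources agreeing about pod "web-1" yields one compact alert;
# any single source on its own yields nothing, so raw logs never hit the SIEM.
```

The raw logs stay in cheap storage for the threat hunters; only the small, correlated conclusion travels.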

So eBPF has seen it trying to spawn a shell; I've then seen an API call going to STS in AWS to get an identity and a service token; and then I've seen a weird network connection in the flow logs. If yes, that's probably an alert we push to the SIEM. But you're not pushing the raw logs around; you try to keep it as compact as possible. It's about picking off and layering the different sources at different times, because otherwise we'd all become overwhelmed. From the application side, we can also take those application web logs, the HTTP access logs we get, and feed them in too. Most application teams are handling that already: they'll have API monitoring, they'll have database monitoring. We take some of that information to enrich what we get from the platform side.

How does this play out when you're actually doing it on the ground? It's difficult. Containers by their nature, and cloud by its nature, are ephemeral, and becoming more so; even now we're seeing a shift to tools like Fargate, a completely serverless-based approach where we don't have persistence. That makes some of this really, really difficult when it comes to investigating and then doing the cleanup.

When you first get that alert in: Gartner is saying that by 2028, probably 50% of what's done by a line-one SOC analyst will be done by AI. That's a scary thought for two reasons. One, you're giving AI a look at all the core information I've just talked about. eBPF shows the process IDs of everything running; your CMDB tells it exactly which are your most critical assets; you've got all of your application telemetry. And, as some of the other talks have covered, if you've misconfigured things and you're logging tokens into URLs, it can see all of those too. So there's a lot to consider there. The second reason: a lot of people start on line one in a SOC. That's how they learn, that's how they develop as security analysts and mature into our threat hunters of tomorrow. If they're not getting into the weeds of how things work, if they just take it from the AI, we might start to lose some of the critical thinking skills we really need.

So level one is more about data enrichment. Do we have the right data to understand what's going on? Like we said before: do we have our CMDB data? Do we understand the criticality? Do we understand where this is and what's happening? Is it in our public cloud or on-prem? Is that public cloud connected to the internet, or is it behind a firewall, locked off 57 layers deep in a bunker? Then line two becomes more about what's happening at the process level. What's the container doing? What is that process or function doing? How do we pick that apart and work out whether this is normal behavior for that specific thing, or something new and novel? Is someone exploiting a zero-day? Are they trying to get code execution? Then tier three is more about the impact of things. Revoking credentials, like revoking a service account, helps lock things down, but it can completely stop a different department from working. If they're rotating certificates or doing certificate revocation, that can knock entire systems, and even enterprises, offline, if you think about certificate pinning from an external inspection point of view. So tier three is really focused on those hard decision points.

And then we also get into what happens with forensics for layers one and two. Forensics is almost out of the question just because, by its nature, a container might be there for five minutes or less. You have to make a decision: do we keep the container, so we preserve some of its information, or do we kill it because it's doing active data exfiltration? That is a complex decision that no one individual should be making on their own; it needs two people consulting to make that call, not least because of the legal ramifications around preserving and tampering with evidence. Defense lawyers love it. Having seen that and been through it: if you go and touch something, even a container, to preserve the evidence, and you do it the wrong way, you've just contaminated it. That's also where having that secondary log aggregator, with things like all your eBPF logs, really helps: even if you do make a mistake, you've still got a secondary body of evidence.

There are tools out there to help you do containerized forensics. You've got checkpoint/restore, which basically freezes the container at that level and puts it in suspended animation. But again, you've frozen it; you could put a network rule around it so that it can't connect. It's frozen in time: whatever's happening inside still happens, but it can't talk out, it can't do anything, and it's now isolated even from the node itself. Then, similarly, these are all running on a VM or a node somewhere, so we need to think about forensics on the node as well, which under the cloud responsibility model is quite challenging, to be honest, because unless you're hosting it yourself, Amazon doesn't exactly like you going and imaging their discs. If you're running self-hosted, that's great: you can image the memory of that VM, you can image the disc, and so on.

Then how does this all wrap together? Kind of closing the loop on
assurance. So unfortunately probably most of us are going to work in some form of regulated environment. We all want to be compliant with something like ISO as painful as that might be. You might be handling payment cards. You might work in a bank might be working something that's regulated third party critical third parties. All of these different regulations things form requirements on the right. They're your baseline for what controls you going to do, how are you going to do them, which ones do you need. So specifically things like identity. Every single team talks about identity. Every team talks about backup. Make sure you understand those and plan those in from day one. Then we get into the controls

requirements from the foundation. Scale out with your controls. Build them into the different layers. Always assume that you are going to be breached. The assume breach mentality works inside the sock but also works from a security architecture perspective. My personal favorite one is testing. Most of the time we talk about testing of controls or for doing compliance exercises we go and do it once a year. So even Dora now it says you need to go and do exercise test your controls test your backups once a year. But most teams are deploying more than once a year. Cloud is evolving in real time. Your users are interacting. Their expectations evolving you real time. There's tools out there now like trust

threat teaming and chaos engineering, things like LitmusChaos. You can take what we've done from an SRE standpoint and do it from a security angle. What that means is: I have my code, I've got my controls, which I say are all aligned to DORA and ISO or whichever framework you're working to, and I can now harness automated testing to ask, is that control working as it should be? And a probably even more fun one: a previous talk mentioned tabletop exercises. You can go a step further. Some teams and companies use things called game days. They're not telling their application teams, the blue teams, the red teams, what's going to happen. But there are

plans. We're going to inject a fault into a critical system and see how it works, see how people react when they don't know it's a tabletop. They won't know it's coming, they won't know the scenarios, but it's been signed off at leadership level. Probably your head of SOC would know, your head of application if it's on the application side, your head of cloud. And the idea is that it's a learning event. Yes, it's there to test, and yes, it's there to make sure we meet requirements and regulations. But now we can inject things and see: did the process break down? Was there a human fault, because all the pressure

got put on the level-one analyst, who suddenly got overwhelmed with information? It helps us really work on that human error so it doesn't happen in real time, because when you do get the phone call in the night, it's really hard, a, to wake yourself up, and b, it can take you a few minutes to get going, think through everything, even log on. So it's a really good way to make it more human. And on the higher-up and exec side: make them sit through it. They're signing this off; they're the ones saying, "Let's go and fault-inject." Make them sit through it and see what those analysts are seeing, what the

application engineer on call is seeing when they suddenly get a phone call. Let them see, in the room as the war rooms are filling up, who's in there, who's saying what, what the feeling of those teams in the room is, because quite often our execs and leadership can be quite distant and siloed off, because they've got other things they're doing. But it's that time to bring the human element back into the room. So where does that leave us? There's an awful lot you can do. Well, first, think about enforcing and mandating certain configurations within your containers. Don't run as root; limit the different namespaces that are in use;

use Linux namespaces; use things like hermetic runners, or look at moving to hermetic runners, which don't have access to the internet. And if possible, look at things like verification summary attestations. The specific framework for that is called SLSA, Supply-chain Levels for Software Artifacts. It gives a really good scoring framework from one to four. Have a look at where you are on it versus where you want to be, which is ideally level three, and start to think about how you can make sure some of these basic fundamentals are in place. Because if you do let people run as root, you give them access to the kernel or the overlay file

system, and it will get abused, unfortunately. Think about what you're doing with containment. By that I mean: are you using any form of network segregation internally on your clusters? Can my different pods talk to each other? Should they be talking to each other? How do I get a view into that world? That could be using something like Cilium and Hubble, or it could be that you go through standard Kubernetes: you can do it with Kubernetes namespaces, and then through Kubernetes networking. But have a think about how you divide those up to start limiting that blast radius, so that when something does go wrong, because it will go wrong, unfortunately, you've got something to fall back

on. Then, deploying visibility: in an ideal world we'd have all the logs from everything and we could all understand them. Start small and slowly scale. The idea of going straight to eBPF is quite scary. There are some really good books out there from Liz Rice, who does a lot for containers, and Kelsey Hightower has been talking about this recently. Look at things like Falco; look at things like Tetragon. They're open source; you can use them, play around with them. They will give you an idea of what's happening at the kernel level, at the network-interface level, to say: this is what my container is doing, is this normal? Don't feel rushed to get there, though. Don't feel that's going to

be the crown jewel, because getting the basics and the hygiene right actually solves a lot of this. eBPF is the added gold mine you layer on afterwards. And the final one is detect and respond. Have you got policy as code deployed? Do you know how to respond if one of those fires? If any networking or control rules fire, are they logged, stored, and visible to your analysts at level one, level two, level three? Do you know how they're going to respond? Do you know how you're going to do forensics, what people may or may not say, and what the time periods and notification requirements are across different things like DORA, GDPR, the UK DPA?

They're similar, but they're all different. So again, as an analyst, the last thing you want to do is panic about that at the very last minute. That's all. Thank you very much.
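The detect-and-respond point above, consuming Kubernetes audit logs to spot exec-style access into pods, can be sketched minimally. This assumes events in the standard `audit.k8s.io/v1` JSON format, one event per line; the username, namespace, and pod name below are invented for illustration:

```python
import json

# Minimal sketch: flag pod "exec"-style events in Kubernetes audit logs.
# Assumes one JSON audit event per line, in the audit.k8s.io/v1 schema.
SUSPICIOUS_SUBRESOURCES = {"exec", "attach", "portforward"}

def triage(lines):
    """Yield (user, namespace, pod) for every exec/attach-style event."""
    for line in lines:
        event = json.loads(line)
        ref = event.get("objectRef", {})
        if ref.get("resource") == "pods" and ref.get("subresource") in SUSPICIOUS_SUBRESOURCES:
            yield (
                event.get("user", {}).get("username", "<unknown>"),
                ref.get("namespace", "<cluster>"),
                ref.get("name", "<unknown>"),
            )

# A single synthetic event (values are illustrative, not real):
sample = json.dumps({
    "kind": "Event",
    "apiVersion": "audit.k8s.io/v1",
    "user": {"username": "alice"},
    "objectRef": {"resource": "pods", "subresource": "exec",
                  "namespace": "payments", "name": "api-7f9c"},
})
for user, ns, pod in triage([sample]):
    print(f"ALERT: {user} exec'd into {ns}/{pod}")
```

In practice the filtering and routing would live in your SIEM rather than a script, but the shape of the data an analyst consumes is the same.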

>> I've got a few questions. I'll just start with one and try to keep it simple. As a runtime security model, what you're looking at is a large amount of sensor data. >> Yeah. >> But obtaining that sensor data seems to me to be highly dependent on your ability to understand the behaviour, or expected behaviour, of your application. So it's hard to define, but that feels to me like a lot of work. I mean, I'm sure there are some things you can automate, just put these kinds of rules in and they'll work for everything, but to properly understand the expected

behaviour of an application, and then bring that into a runtime detection perspective, feels to me like a significant amount of effort. So how do you minimise it? >> It's difficult. It's a lot of information, as you said. I try to think of it back through the layers. The clouds, at the hyperscaler level, have this built in to a degree. Defender for Cloud can aggregate all the information that's happening across Azure; the same with Security Hub in AWS. It can aggregate a lot of that, and it applies a lot of machine learning behind the scenes to try to flag what's wrong, what's abnormal. That

comes completely built in. OK. Applications have been doing this for a while; they've been using APMs, application performance monitoring. Those are similarly rule-based, but they've been a very mature thing for a very long time, definitely ten years plus. Where we're starting now, to your point, is that the behaviour of a container isn't a question we've really asked, especially once we get into things like Fargate and completely serverless. The tools out there, like Falco and Tetragon, and I'm not going to reel them all off, do have some level of intelligence built in. They have default rules, default things, but it's still very rules-based. Similar to a next-gen

firewall, there's only so much intelligence you can put into that machine before you have to program it yourself. That's where the crux is; that's where the security engineering, the real complex digging, is. A SOC analyst doesn't have time to program that. Let's be really fair: SOC analysts are massively overloaded. Everyone I've talked to is pretty much burnt out. >> It doesn't feel like they should be doing it anyway. It feels like it should sit on the application side. >> Well, they shouldn't be, because of separation of duties. Someone in the SOC shouldn't be writing code to change your control framework, whether that's your DLP, your EDRs, pretty much any security tool. You're looking at someone on the other side of the fence, because

otherwise you have "who watches the watchmen?". But you're right, that's a security engineering, probably specialist, person. It's not necessarily an application engineer or software engineer; it's a different domain, and that is a skills gap. There are some really good tutorials out there. Cody is literally doing a deep dive into eBPF because he's been learning it; definitely go check that out. There are some really good introductions. Liz Rice, who works at Isovalent, has written a book for them specifically on how to deploy eBPF. So that's a good starting point, but think of that as the end of the journey. Think about the basic fundamentals first. Do I have my cloud logs? Do I have things like Kubernetes

audit logs? They're disabled by default, and again incredibly noisy, but an incredible amount of data for people to understand. Think about how you can use the different information points within there, because you're not going to need all of it. >> Other questions? >> Yeah, you said about policies and default deny. I know some of the Kubernetes hackers in the UK and other parts of the Kubernetes security response community. Have you ever come across anybody that's actually deployed network policies? Because everything I get back talking to people is: yes, good idea, everybody says to do it, nobody does it in practice. >> I have to be careful what I say here, just because of my job. Yes and no.

It is what should be done. In a standard cloud model, at the cloud layer, that's absolutely how it works. If you log into AWS, or anywhere, even in next-gen firewalls nowadays, the final rule in there is deny all. You have to actively go in and allow everything above that. >> Yeah. >> But at the Kubernetes layer, >> you're right, >> you spin up EKS and it's basically open by default, and that's the Kubernetes security model, >> because it's incredibly complicated. It goes back to network architecture. Do I know which pod is supposed to talk to which? How is it supposed to talk? Which ports, which protocols? It's right the way back down into the weeds.
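The default-deny idea at the Kubernetes layer can be sketched as follows. This builds the standard `networking.k8s.io/v1` NetworkPolicy shape (an empty `podSelector` matches every pod in the namespace, and declaring both policy types with no rules blocks all traffic); the namespace name is invented, and a real deployment would re-allow DNS and known-good flows on top:

```python
import json

def default_deny(namespace):
    """Build a default-deny-all NetworkPolicy manifest for one namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # Empty selector: applies to every pod in the namespace.
            "podSelector": {},
            # Both types listed, but no ingress/egress rules given,
            # so all traffic in both directions is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }

# JSON is a subset of YAML, so this output can be applied as-is.
print(json.dumps(default_deny("payments"), indent=2))
```

This is exactly the "final rule is deny all" model from the cloud layer, recreated per namespace; everything above it then has to be explicitly allowed.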

And I get it, it is really, really hard. Now, I've seen people on both sides of the fence: some teams are using it, some are not, some only deploy it in certain bits. So, for example, if I have a database and I want to protect the database, the crown jewels, let's build some walls around that rather than trying to worry about all the application interfaces and all the other bits above it. Let's harden off certain bits. It won't solve the whole problem; it solves some of it. Similar to service mesh: it's great. However... >> Performance overheads. >> Yeah. There are overheads to running everything, right? Even for eBPF, anything that runs on top of a CPU has

a tax. Everything has a memory tax. Be sceptical of someone who says we can just deploy the service mesh, it will learn all the routing, whether that's good or bad, and then you just implement the rules. They can do that, but they're just taking the data, seeing what's there, and then saying: this is what your data is doing right now. You've got these connections going from point A to point B, this from point B to point C, this port, this protocol. You need to review it. You need to go back to architecture 101. You can't deploy these things without thinking about TCP, UDP, IPv4 versus IPv6. Do I use IPv6? If not, kill it. And again, you can't push these things

out live. It's going to be slow and sequential. I look at a rule, I propose a rule, it comes through your threat hunting: I propose we're going to turn off these ports and these protocols from this point; look at it, monitor it, monitor it for three months. That's ideal, because you get a huge amount of telemetry, but it's very slow. You can monitor it for a few days and then start to close the gap. Closing the gap slowly and doing explicit denies, whilst it's not the explicit block-all at the end, does help. So start to close that gap. Make that window as small as possible, as you feel comfortable as an organisation, and again, it's the organisation's risk as to

whether they use it or not. I mean, if 80% of teams aren't using network segregation, that probably gives you a clue as to where the wider community is at. >> Just linked to that, and building on that overhead thing: as somebody who's planning out how to deploy this, one of the things I'll be looking at is, OK, I can put all of this in, but it's going to add processing requirements, it's going to add data-movement requirements, and it's going to add storage requirements. Is there any data out there that gives me a rough idea how much it's going to cost, for the guy who's paying Amazon

or whoever I'm using, how much more I'm going to have to pay them each month to deploy, let's say, a half-decent version of this? Is there anything out there that gives me a number, a percentage? You know, if it's costing me 10 grand a month, is it going to cost me 15 grand a month after adding this functionality? >> Not as far as I know, typically, when I'm building this out. >> So it's hard to justify as an internal organisation; you can't tell somebody how much it's going to cost. >> And that's part of the problem with all of this. You've got sensor data out there. Even switching on things like

security logging, it costs. Even if you're running it in your own data centre, on your own hardware that you've already paid for, you're still looking at CPU overheads, power draw, cooling; that's still a cost to the organisation. It's a very, very difficult argument for a CISO to make. Where you can get into it a bit is around that central decision that we're going to switch on certain things. >> So the reason I was asking is because if it was 10% on average, it might go to 20, and given that the cost of the server is relatively, you know, 5% of my overall deployment cost, I'd go: don't care, move on. If it was a 3x,

it might move from noise to something I give a damn about. So that's why I'm trying to get my head around where on the spectrum we are with respect to that financial overhead of deploying something like this. >> For me, I land on the slightly log-heavy side. I'd always have slightly more logs, because we can go threat hunting with them and because you can fall back on them for legal if you need to, heaven forbid you do need to. >> So a more heavily regulated customer would probably want more of this, because they need more defensiveness. But if you're not regulated and you just want

to be safe and secure, >> you might find yourself at the other end of the spectrum. >> Which is why I talk about the assurance side and regulations. Things like DORA, things like NIS2, they talk about some of it, but they don't prescribe: thou shalt have this audit log, this access log, this one. >> I mean, you know you should be doing something of this at some level, right? >> Let's go back to: why do the cloud providers turn the Kubernetes logs off by default? Because nobody wants to pay. Nobody wants to pay. They're big, they're heavy. And heaven forbid you start logging what's happening on the API when it's talking to

everything. It will literally scroll the wall in a matter of milliseconds with the amount of stuff going through it on a standard enterprise, even a small or medium one. >> Yeah, sometimes people don't want to see it. >> And then you've got the processing factor on top of that. So again, it comes down to your assurance level: what's your risk posture? Typically it's people outside of security ops making that decision, although it's going to be your head of SOC and your CISO who pick up the bill in most cases. >> Yeah. >> So I've got a two-part question. The first question is kind of unrelated, but with shadow AI and AI-related incidents, what

kind of visibility do you think analysts would want? And the second question is: what does a typical workflow look like when there's a shadow-AI-related incident? >> Oh god, shadow AI is a bit of a headache. >> Yeah. >> So, start from the premise that most companies give you a laptop. That laptop will have some form of EDR or XDR. It will have some form of proxy, so you'll be tunnelled back through your internet egress. So by default they can see what's on your laptop and they understand what you're connecting to, even with DNS-over-HTTPS, which is a headache coming. Where most people are looking at the moment is: are you trying to

install something? So, sideloading. On my own personal laptop I have nothing; I run Intune and things on there just to mimic what goes on in an environment. If I've got Intune on there, or some other form of mobile device management, it will tell me I'm trying to install Cursor or a coding tool. That helps, but it doesn't fix the problem that I can just go to the web page. A lot of the web pages we're seeing, and a lot of the perimeter firewalls and proxy controls we've got, are literally just blocking on subdomains. So even though I can say I don't want people going to ChatGPT: how many organisations use Microsoft by

default, how many use Google by default? You can't block those domains, because you fundamentally need them to operate as a company. Where we end up is looking at things like your firewalls, your firewall logs, to say: they are trying to go to ChatGPT, we know that's an AI site, don't allow it. The problem is domain fronting. You can spin up a new domain in a matter of seconds, and there are new AI services coming up every single day. So you're looking at inline blocking of that traffic, assessing what that site is in real time. The secondary part is: I have access to Google because I use it; I may have access to Copilot. Now, what

am I putting in there? And that comes back to being able to see what's on the clipboard: what is being interacted with, what files are being uploaded and dropped. I forget who it was from the FBI who put a load of documents into a public cloud provider; that should have been blocked by the laptop and the endpoint, the DLP control. Where that triggers into the SOC is they'll see that this person is trying to get there, but most of the time that block is going to come in as: it's blocked, cool, move on. A DLP control is only going to fire if it's hit. Where you go and see is that,

actually, in the case of the FBI one, it allowed it, so we've registered that this person has picked up these files from these locations and put them into this URL, i.e. they've put it somewhere public. That's a workflow more from the threat-hunting side. You go through it with your hypothesis: we use AI, we use, say, Google or whichever one it is; I think someone's gone and done shadow AI, I think they've uploaded something they shouldn't have. And then you start looking through the logs. We're kind of playing catch-up. Now, there are some controls and things you can do to get around that, such as virtualised workspaces, where rather

than it being on the end user's laptop, it sits in a cloud or a container somewhere, and that controls what goes on there. So you can use AI governance modules to control which AIs you've got, which frontier models you use. It's difficult: there are some open-source ones, but they're still expensive to run, and you've then got to connect them to your end-user device. They're not always that practical; it's just easier for a developer to use their laptop. I hope that answers the question. Thank you.
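The firewall-log hunting described above can be sketched as a toy example. The domain list, log format, and usernames here are all invented, and as noted in the answer, real AI-service domains churn daily, which is exactly the weakness of any static list:

```python
# Toy threat-hunting sketch: scan proxy/firewall logs for known
# generative-AI endpoints. Domains and log format are illustrative only.
AI_DOMAINS = {"chat.openai.com", "chatgpt.com", "claude.ai", "gemini.google.com"}

def hunt(log_lines):
    """Each line: '<user> <destination-host>'. Yield (user, host) hits."""
    for line in log_lines:
        user, host = line.split()
        # Match the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in AI_DOMAINS):
            yield (user, host)

logs = [
    "bob chatgpt.com",            # direct hit
    "eve intranet.example.com",   # internal traffic, ignored
    "mal api.chatgpt.com",        # subdomain hit
]
for user, host in hunt(logs):
    print(f"shadow-AI candidate: {user} -> {host}")
```

In practice this runs as a SIEM query against egress logs, and the list would be fed by a maintained threat-intel category rather than hard-coded.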

>> Thank you.