BSidesSF 2026 - Pwning and Defending AI Agent Code Interpreters (Kinnaird McQuade)

BSidesSF · 45:10 · 24 views · Published 2026-05
About this talk
AI agents and chatbots increasingly run Python code interpreters, often for data analysis, that can be abused by attackers. We'll show how prompt-to-RCE, credential abuse, and C2 payloads work in these sandboxes, including a real AWS AgentCore breakout and hardening against code interpreter exploits. https://bsidessf2026.sched.com/event/a219e1fc8ccca6fc4e1ee1a95d9163ab
Transcript [en]

Welcome to day two of BSides San Francisco. We're going to have five talks throughout the day here, and we're going to start with the first one, which is going to be Pwning and Defending AI Agent Code Interpreters by Kinnaird McQuade. He's the chief security architect for BeyondTrust. Thank you so much, Kinnaird. Is this thing on? All right, cool. All right, welcome everyone. Man, I'm so hyped to be here. I love this conference, and I've been doing all this work on AI code interpreters over the last six months, so I cannot wait to share this research that we've been doing. So, a quick who am I? I'm Kinnaird McQuade, chief security

architect at BeyondTrust. I've been deep in cloud security research for 10 years, and I published some open-source tools in cloud security before everyone started doing it with AI. And then something crazy happened in January of 2025. I touched Cursor for the first time, started vibe coding, and it basically changed my life overnight. And it's ramped up so many times since then, when we've had crazy models, when we got Claude Code, and when Opus 4.5 came out over winter break. I can't stop thinking about it. I can't stop hacking it. So I recently branched out into AI security research, and I love it. So today, we'll start off with some background. We'll do a deep dive

into code sandboxes and code interpreters. They're everywhere with AI. How do they work? What risks are there? What can you do as a defender? Then we'll go into something that only got disclosed last week with my team, Phantom Labs: a hack that I did on Bedrock AgentCore's code interpreter. This was really fun. I'll show you a demo of the PoC that I open sourced to prove it to AWS when they didn't believe me, and it still works. And hopefully you'll come away with some defensive strategies, a better understanding of how it works, and a mental model for the risk behind them. Now, this talk was originally on code interpreter services, but honestly the

adoption of local coding sandboxes has been completely nuts, so I felt an obligation to expand the scope of my talk to include that slightly larger umbrella. So what's a coding sandbox, and why is it so important for AI? An AI coding sandbox is basically an isolated execution environment. An AI agent generates code and sends it to this walled-off container, and a runtime (Python, JavaScript, shell, whatever) executes it. Inside that boundary, the interpreter can touch a file system, spawn processes, and potentially make network calls. But the whole point is that none of that should leak out to the host, so you can still run code safely, in theory. And those boundaries are where

all of the interesting research lives, and that's where things break down. We're going to dig into what that boundary actually looks like and then where those gaps are. So a quick question for the audience: who uses Claude Code, Codex, Copilot, or some other coding agent? That's like everybody. All right. Now, out of those, and I'll remind you, your boss isn't looking and your face is hidden from the camera, who uses the --dangerously-skip-permissions flag, or auto, or YOLO? Okay, so is that like 50% of the room? I think it is. So, I knew all you security people were lying. But you're not alone. Research by Anthropic earlier this year shows that a

ton of people are doing this now. There are a lot of factors here. Is it because we trust the agents more? Or is it because we're tired of pressing enter? Or is it because you need the dopamine rush of all your coding problems being solved and you love shipping sloppy product PoCs that nobody uses? Or is it because they're more secure? Not really. So local coding sandboxes are going to become more and more important given the risks, which we'll go over. And this isn't just theoretically risky. Let's take the Amazon Kiro incident. In November of 2025, Amazon came out with a mandate that 80% of its engineers had to use Kiro on a weekly basis. And what do you know?

Someone ran it on their computer with credentials that had access to production, and that caused a 13-hour outage of AWS Cost Explorer in China. And Amazon's official response was, and I love this, that it was a coincidence that AI tools were involved. Now, in reality, an engineer should never have been running Kiro or any coding agent with credentials to production. That's not just an AI coding issue, even though maybe there should have been some checks in place, or maybe the harness was being too aggressive, not understanding that it was talking to production. But we can't blame all of that on the AI acting weird. And it's not just Kiro. There are other incidents with Claude, Cursor,

Gemini, and others. And then there's OpenClaw. It has over 300,000 GitHub stars and was eventually acquired by OpenAI. It's an autonomous agent that runs on your machine. You connect it to different systems, you message it over WhatsApp or Telegram, and then it can send your emails, deploy code, or run bash from your machine with your own creds. Super safe. Meta's director of AI safety hooked it up to her email and told it not to take any action unless she approved. The AI agent ignored her instructions and then speed-deleted all her email. Now, there is a sandbox mode, but why would you use it? The whole point is to connect it to real

systems so it can pretend to be you. So people are naturally incentivized to turn it off. They're just not saying it. One of the things that they do, besides relying on the human to be careful, is running it in a separate VM. NemoClaw was recently published by Nvidia, which kind of does this, and there are a lot of other ones that are like secure OpenClaw deployments. But so far, and I'll never bet against Nvidia, all sandbox solutions seem to be more of the same. So we're just in this era now where there are all these unsafe YOLO commands and vibe-coding developer workstations, or business users who are also vibe coding now or creating

apps. They've always had a wide blast radius, they connect to all sorts of things, and there's a need to limit that blast radius. Enter local coding sandboxes. Without some kind of sandbox, Claude or any other agent is just going to run bash on the host. The coding agents have some controls, like you can't use the tool to read or write files outside the current directory. But then you let it run bash commands, which can totally reach outside the current directory. And then the only thing sitting between you and a potentially risky action is a human who's tired of pressing enter. Running around in that blast radius is an agent that runs with your identity

and your permissions. Your sensitive files, your environment variables, your cloud credentials: that's all in the blast radius. It doesn't even need to be malicious skill files or some crazy supply chain attack. Like we saw with Kiro, the AI could decide that it's just so much slop that it would be better to terraform destroy and then start from scratch. Now, the sandboxes have a few key components, which I'll get into. You have file system isolation. If it's a process wrapper, you might have light controls like don't write outside the current directory, or if it's separated by compute, like a container or a microVM, you have mounted file systems. Then you have network isolation. Can I control

what outbound things happen, or can I transparently see the application-layer HTTP calls to outbound domains? Can I get OpenTelemetry traces, aggregate them somewhere, and then see the entire trajectory of the AI agent session? Then you have execution isolation. Is it running in a container or a microVM? Can the agent process get control of the host process or break out? Then you have secrets isolation, which is barely being done. I give credit to some vendors like Docker Sandbox, and then there are some other ones that run remotely, which we can get into later. And all the while, everyone wants to run with --dangerously-skip-permissions and YOLO like OpenClaw. They want to become the claw.
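As a rough illustration of the four components just described (file system, network, execution, and secrets isolation), a sandbox's posture can be written down as a small record and checked for gaps. The class, the field names, and the gap rules below are hypothetical, chosen only for this sketch; they are not taken from any vendor's configuration format.

```python
from dataclasses import dataclass


@dataclass
class SandboxProfile:
    """Hypothetical record of the four isolation components from the talk."""
    fs_write_scope: str    # "cwd" or "host"
    network_egress: str    # "none", "allowlist", or "full"
    execution: str         # "host", "process", "container", or "microvm"
    secrets_visible: bool  # can the agent read real credentials?


def isolation_gaps(profile: SandboxProfile) -> list[str]:
    """Return the components where this profile offers little or no isolation."""
    gaps = []
    if profile.fs_write_scope == "host":
        gaps.append("filesystem")
    if profile.network_egress == "full":
        gaps.append("network")
    if profile.execution == "host":
        gaps.append("execution")
    if profile.secrets_visible:
        gaps.append("secrets")
    return gaps


# An agent run raw on the host with YOLO-style settings.
yolo = SandboxProfile("host", "full", "host", True)
# A microVM with an egress allow list and credentials kept out of the guest.
hardened = SandboxProfile("cwd", "allowlist", "microvm", False)

print(isolation_gaps(yolo))
print(isolation_gaps(hardened))
```

The point of the sketch is only that the four components are independent axes: a setup can score well on execution isolation and still expose secrets or the network.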

Now, how about hosted code interpreters? If you've ever uploaded a CSV to ChatGPT and asked it, why am I so broke, make me a budget, you've used a code interpreter. When you upload it, instead of using LLM inference to guess the answer, ChatGPT generates Python code that parses the CSV and then returns precise results. A human would never read the entire spreadsheet just to understand what it does or to do some calculations. They would sample some cells and then run some Excel formulas. AI agents act best when they act like humans. Now, that's exactly what hosted code interpreters like AWS Bedrock AgentCore do, and we'll show some fun stuff

with that later on in the presentation. They provide that as a tool to developers: services that allow developers to build AI agents and chatbots that execute code in response to interactions with users. And they do that by running code on the fly inside remote containers or remote microVMs. Now, before we get into the specifics of who's offering what, how these sandboxes actually work, and how to attack them or defend them, I want to give you a mental model for thinking about risks in these systems. Here I stand on the shoulders of giants. Simon Willison, the creator of Django, who is an absolute must-follow in the AI world on social media, like

you really should follow him. He coined the term the lethal trifecta, which you might have heard in other talks at this conference. These are three properties that, when you combine them in one system, create the conditions for maximum damage from prompt injection. And honestly, prompt injection now, instead of looking like ignore all previous instructions and do this, just looks a lot like social engineering, except on a robot. So those properties: untrusted input, meaning can it take stuff from a human or another agent? Can it hit the internet, or can it modify state? Does it have access to private data? Now, Meta expanded on this with their Agents Rule of Two, where they said it's

not that bad if an agent only has one of these or can only do one of these. After all, if it processes untrusted input but can't do anything, is it really that bad? What are you going to do, mine Bitcoin or eat up compute? That's bad, but that's not going to wreck a company. Or if it takes untrusted input and can call home to an attacker domain, that's bad, but if it can't access any data or systems, that's less bad. If it can modify state, though, that's pretty bad. Maybe that should be its own thing. But what they say is if it does more than two of these

things, a human should approve it. And that feels right. With all of those conditions, you don't want it to be running in a loop yet without human approval. But these agents are getting so good that if you YOLO it, if you live free like OpenClaw, it will be productive. But as we've seen with incidents like OpenClaw, that is insanely risky. I think this is a great mental model, but for now, while all our security controls are under construction, it's also a band-aid for the industry until we get some better least privilege and security controls in there. Now, code interpreters generally operate on untrusted input, LLM-generated or human-generated code, by

design. You might be able to limit some of those scenarios with LLM guardrails, which are basically a WAF for your prompts, or with models that are more resistant to prompt injection, but you can't get around the fact that processing untrusted input is part of their core function. So you're always going to have that property. Now, with network access, once an interpreter can reach external resources, the possibility exists for prompt injection or social engineering to trick it into exfiltrating data, or maybe downloading and running a bash script that does it for them. Some code interpreters will just say, yeah, right, you're not talking to the internet at all. And then some will

say, okay, you can download stuff from PyPI or npm or GitHub, which is of course super restrictive, or maybe some S3 buckets, but not the internet. There's a spectrum of this that we'll get into. Now, code sandboxes or interpreters can also get access to private data, whether that's through API keys or IAM roles or mounted file systems. If the agent, or the agent through the code interpreter, can access those, it completes the trifecta, and that creates the conditions for max damage. So let's talk about what sandbox actually means, because to me the answer depends on who you ask. When a vendor says that their code interpreter runs in a sandbox, that

could mean completely different things across what I see as four dimensions. On the network side, some sandboxes claim no network access at all. They claim. Others do an allow list, but some just give you full internet access and call it a sandbox because the execution layer is isolated. Execution isolation is where you see the biggest architectural differences. You have process-based isolation. It's the lightest weight: syscall filtering that has known bypasses, and that's what the native coding agents all use for their native code sandboxing, which is mostly turned off by default. Then you have container-based, which gives you namespace isolation, but you're still sharing a kernel. And then you have microVM-based, like Firecracker or

gVisor, and that gives you a dedicated VM per execution. And then you have credential isolation. You might have API keys or tokens in .env files. And then if you're using a sandbox, do you mount those files? Do you include that? Do you make it accessible to the agent? And there might be nested files which could be accessible too. Do you expose a metadata endpoint IP and then give it an IAM role? Do you use a man-in-the-middle proxy for credential injection? That's a pretty clever solution. I'll show you something like that. But from what I've seen, it's not really kernel vulnerabilities that you need to worry about. It's things from the

trifecta: mounting files, showing data, letting it use credentials, or allowing outbound traffic for exfiltration. Now, if you don't know who that is, that's Andrej Karpathy, the founding engineer at OpenAI, who also invented the term vibe coding. You should also follow him on social media. And I guess my memes stand on the shoulders of giants, too. So when I got into this, I thought, oh, AgentCore is really cool. It's kind of like the Claude sandbox, but in the cloud. And then I realized, oh my god, this rabbit hole goes really deep. So this slide maps the current AI coding sandbox landscape, separated by isolation model. After these light local wrappers, you have containers and VMs

that are local or remote. We'll dive into the mechanics of these sandboxes, but the key takeaway that I want you to have from this is that every tier has tradeoffs, whether that's more isolation, or startup latency, or different security guarantees, or how difficult it is to set up and manage. So let's dig into local coding sandboxes. Why is it not moving?

Oh, okay. Another quick question for the audience. We had almost everybody say that they use Claude or an AI coder on a regular basis. Who's tried coding sandboxes? Okay, that's actually a lot more than I thought. So now, who uses it regularly? Okay, that's a lot less. I'm not going to shame you. I knew that would happen. It's not very common. A lot of that is usability, the desire to move fast, and the reliance on a human in the loop. So there are four different layers of execution isolation I see in code interpreters. You have microVM

isolation, whether that's running in Firecracker or a dedicated VM, so you get some hardware isolation. That's what Lambda and Fargate use under the hood. And then you have container isolation, which is fast but has a shared kernel. And that's annoying because you can't run Docker there without giving it access to the Docker socket. So if your validation for the AI coding agent relies on spinning things up in Docker Compose or doing some Kubernetes stuff to validate that your code actually works and isn't just slop, then you're not going to be able to do that. So yeah, usability concerns. Then there's process isolation, and that's what Claude and others have built in

their own sandbox, but that also has known weaknesses. And all three of those have sticking points for developer adoption, like making the developer approve more things into the allow list, and the scope of what gets allowed gets pretty broad. And then you have running raw on the host, and that's no isolation. As an industry, we need solutions that are going to make it easier to move up the ladder, because that experience kind of sucks. Now, let's use Claude Code as a reference point to talk about their native sandbox. It's turned off by default. It's the lowest tier on the ladder aside from running raw on the host, and it uses process-level isolation. On a

Mac, it uses Seatbelt. On Linux, it's Bubblewrap. File system isolation has write scope limited to your working directory and whatever paths you add, but reads are basically everything. The network goes through an egress proxy with a domain allow list, and that pattern is followed by the other AI agent sandboxes. Now, there are tons of bypasses or holes here, and I've listed those exact settings at the bottom. Whether that's allowing Unix sockets, or on the network side you could allow-list PyPI or npm or GitHub, and if you allow GitHub, congrats, that's an exfil channel, because anyone can make a repo. And then there's an escape

hatch setting, which is really interesting, where if a command fails in the sandbox, then the agent will just retry it outside the sandbox. You can turn it off, but the repercussions of that aren't immediately obvious. I don't know, I read dangerously disable sandbox and I don't see it and think, oh, that's an escape hatch, right? And then you have a proxy which allow-lists domains, but it doesn't actually inspect the encrypted traffic that passes through. So there's a domain fronting risk, where the TLS connection looks like it's going to an allowed outbound domain but the actual HTTP request inside just goes

somewhere else, and Anthropic documents that. So now I want to talk about microVMs. You have Docker Sandbox, which in my opinion has the best trade-off of dev experience that I've seen. You can still run Docker Compose inside it, you can hide some API keys from the agent so that it never sees them, and then you can restrict it with network policies, and I'll show some of that on the next slide. In Claude Cowork, it's a Firecracker VM, and I kind of expected that you'd be able to run Docker on it by default, but you can't. There's a really

amazing post that you should check out. I have it here at the bottom, and they're going to publish the slides, I think. This person did a lot of reverse engineering on Claude Cowork and how it works. You should definitely check it out. But there's no credential isolation or injection like Docker Sandbox with Cowork. And Anthropic gives you some governance capability that you could set for yourself or for the whole organization, like at the enterprise or teams level, through configuring the network policy. Also, if you do that, you might piss some people off. And then both use VM process

isolation, private network namespaces, and then domain allow or deny lists. So this is one thing that made me stop and go, wow, that's actually really clever. I'd love to see more of that. So builders out there, please try to build something like this. When you spin up a Docker sandbox, the agent inside the microVM literally has the string proxy-managed as its API key, for a list of supported API keys, and that's it. That's all it knows. It makes an API call to Anthropic or OpenAI or whatever, and then the request goes through this sandbox daemon that's running on your host machine, so outside that

microVM, and that daemon is a man-in-the-middle proxy. It terminates TLS, looks at the destination domain, and says, oh, this is going to the Anthropic API. I have an Anthropic API key that's set in my environment through the .zshrc file. Then it swaps the placeholder for the real bearer token before re-encrypting and forwarding, and the agent was never able to see the key. It's not in the environment. It's not in any config file inside the VM, probably. And then if the agent gets prompt injected through a malicious skill file that tries to exfiltrate credentials, there's nothing to steal. The key only exists in the sandbox process memory on the host. And that's,

you know, it's not perfect. The daemon reads from your global shell config, like .zshrc, it doesn't pick up .env files, and there's no policy governing when credentials get injected for what tasks. It's pretty much all or nothing. There's no audit trail of what key was used when. And then how does it know which domain to inject it for, so that you're not sending the Anthropic API key to evil.com? Well, Docker Sandbox maintains a list somewhere, but as far as I've

seen you're not able to like control that mapping. Um but I point this out because the pattern is right. It's what credential isolation should look like for agentic workloads to prevent Xfill to grant inspection and then prevent abuse. And it's the kind of thing that we should be enforcing at the governance layer, not just for Docker sandbox for but for every agent execution environment. Now, I'm really not trying to sell anything. I'm just like I'm calling some of these things out because uh you know these problems are BS and I want the industry to do better and everything's under construction. So I want to share this with everybody. Um, now I saw this with, you know, some

remote sandboxes. I think it was Daytona, I can't remember, but not all. Now we're going to get to the stuff that's related to my exploit, which is really fun. If you line up all these sandbox providers or settings on a spectrum, you get a pretty wide range. On one end, you have environments that have no network access at all. They're fully isolated and can't reach anything, or they're not supposed to. On the other end, you have code interpreters with full internet access. They still run in an execution sandbox, but it's not network sandboxed. And then you have stuff in the middle, where they let you download packages but you can't curl evil.com, or where

you have explicit allow lists or deny lists. Claude has a specific setting that lets you curl parent domains but not subdomains, which is kind of interesting. And even the ones that claim no network access, and I can speak to this from personal research, sometimes have gaps. Maybe there's a cloud provider API that's accessible from inside the sandbox that nobody thought to lock down, as you can see. So the intended isolation and the actual isolation are kind of two different things. And that gets into the section that we're almost at, and that's how I pwned AWS Bedrock AgentCore's code interpreter sandbox mode.
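The middle of that spectrum, an explicit allow list that can match either the parent domain only or its subdomains as well, can be sketched in a few lines. This is a minimal illustration with a hypothetical function name and a made-up allow list; it is not the matching logic of Claude Code or any other product.

```python
def egress_allowed(host: str, allowlist: set[str], include_subdomains: bool = False) -> bool:
    """Decide whether an outbound hostname matches a domain allow list."""
    host = host.lower().rstrip(".")
    if host in allowlist:
        return True
    if include_subdomains:
        # Match "gist.github.com" against "github.com" on a label boundary,
        # so a lookalike such as "evilgithub.com" does not slip through.
        return any(host.endswith("." + domain) for domain in allowlist)
    return False


# Hypothetical allow list for illustration only.
ALLOW = {"pypi.org", "github.com"}

print(egress_allowed("github.com", ALLOW))
print(egress_allowed("gist.github.com", ALLOW))
print(egress_allowed("gist.github.com", ALLOW, include_subdomains=True))
print(egress_allowed("evilgithub.com", ALLOW, include_subdomains=True))
```

A hostname check like this only constrains where a connection appears to go. It does nothing about what travels inside it, or about channels such as DNS that never pass through the check at all.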

Longest name ever. But we're almost there. So first, let's talk about the currently available defensive strategies for local coding sandboxes. You should use a sandbox, ideally one that's based on a microVM, to give you some isolation. You should tighten the network allow list. You shouldn't allow the escape hatch or access to the host at all, whether that's through the network or the Docker socket or anything else. I highly, highly recommend checking out Trail of Bits's GitHub repositories for Claude, and I put the links up here, where they have some really awesome resources on secure Claude usage. And it's not just secure usage, it's also how to run YOLO mode securely. They have

some recommended dev containers based on some of the languages that they use, but it also goes over how to manage context, some security scanning skills, and some really good hooks. And with that dev container setup that they have, they're pro YOLO mode. Now, if you're an admin on the Claude Code console, and I use Claude Code a lot as an example, some of these things are available with other AI coding agent providers, but I just use Claude Code the most. So if you're an admin on the Claude Code console, you can govern the network policy scope for your organization, so developers can't override it to allow

all, if that's your choice. And if you're more advanced, you could build agent hooks to be enabled for everyone at the Claude Code org level. I think Meta uses this to aggregate all the OpenTelemetry calls, tool calls, everything. There's a lot there. Now, here's the fun part: how I pwned AWS Bedrock AgentCore, or if you want the long form, pwning Amazon Web Services Bedrock AgentCore Code Interpreter Sandbox Mode, which is the longest freaking product and feature name that I've ever seen. Oh my god. I see that sometimes and I think, man, I'm not reading all that. This is one of my favorite memes. Hey man, I'm not giving a

good tech talk if I don't drop some good memes. All right, so back to business. The OG Bedrock, before AgentCore, was all inference and low-code, no-code agents that invoke Lambda functions, to oversimplify. It turns out that people want more control over that. Now they're encouraging people: if you log into Bedrock, they'll have an alert at the top that says, hey, you should migrate to AgentCore, or think about it. So I figured I'd give a TL;DR on the surface, just so that you know. You have AgentCore Runtime, which is basically Fargate, but you can lift and shift a

Docker image of your LangChain agents and then run it. That one's pretty fun. There are lots of security research opportunities there. Then you have AgentCore Gateway, which is like an MCP gateway with some extra bells and whistles to call AWS stuff or backend APIs. Then you have AgentCore Browser, which is basically Playwright to grab stuff from the internet. Then there's AgentCore Memory, which is kind of like how ChatGPT remembers stuff that you said two months ago. And then you have AgentCore Code Interpreter, which is for running Python, bash, or JavaScript because the agent told you to. It's basically

intended to be all the infrastructure that is required for building agents besides inference, knowledge bases, and guardrails, and for getting your organization hooked on AWS for this. They love a little bit of vendor lock-in. So let's see a code example of what it's like to work with AgentCore. This is what it looks like to run arbitrary Python, at the top. As you can see, it's basically two arguments. One is to specify the language and the other is to specify the code, or in the case of bash you can just supply the command to run. And this is awesome. I saw this and my ears perked up.
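The slide's code is not captured in the transcript, so here is only a minimal sketch of the two-argument shape described above. The function names and the dictionary keys (language, code, command) are assumptions modelled on that description, not the confirmed AgentCore API, and no AWS call is made.

```python
import json


def build_exec_request(language: str, code: str) -> dict:
    """Hypothetical request body: which language to run, and the code itself."""
    return {"language": language, "code": code}


def build_command_request(command: str) -> dict:
    """Hypothetical request body for the bash case: just the command to run."""
    return {"command": command}


python_req = build_exec_request("python", "print(2 + 2)")
bash_req = build_command_request("uname -a")

print(json.dumps(python_req))
print(json.dumps(bash_req))
```

Whatever the real field names are, the security-relevant point is the same: the service accepts an arbitrary string and executes it, so everything depends on what the sandbox around that execution can reach.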

Anytime I see remote code execution as a service, I get way too excited. It's like when I saw GitHub Actions years ago, I was like, I'm going to mess around with this. So I wrote a wrapper script and thought I'd mess around with it and see what I could do. I saw the sandbox mode as an opportunity, so I immediately reached for one of my favorite little tools, interactsh by ProjectDiscovery, the same people who built Nuclei. Is it playing? Yeah. If you also have Log4j PTSD like me, you might remember that we were hitting the service to test for callbacks in

December of 2021. Sorry, trigger warning. Anyway, if you download it and run interactsh-client, it'll print out an ugly URL, and you can use this URL to call back. You run a curl command in one terminal on the victim, and then you check the other terminal to see if you're getting traffic. If you get traffic, interactsh is going to tell you that it's receiving DNS queries or HTTP requests from somebody else. And if you did, that's awesome. You tricked the victim into calling out. In this case, I ran my wrapper script to curl it, and it said it couldn't connect to the server. Then I

looked at my other terminal, and I could see that it was receiving DNS queries. And I saw that I could probably abuse that, but I'd have to satisfy a few conditions. Could I actually trick an LLM agent into sending that shell command to the code interpreter? I thought so, and I did. Can we get responses from the machine? Yes. If you run getent, because they don't have dig on the machine, then you can actually get the values of the DNS A records. Then, can we send out more data to complete that trifecta? Yes, but you'd

have to design some crazy protocol. So I figured that I would just submit something to AWS through HackerOne so they know the risk is out there, because they just launched the service a month ago and they must not have realized that it shipped with this issue. Surely they would understand if I proved that the path is there, right? Well, not really. I submitted it, and they didn't understand how you could turn it into a reverse shell. As my friend put it, they basically said reverse shell or GTFO. That's not a direct quote. My friend said it, not them. So what did I do?

What any rational person would: I built a bespoke C2 protocol over DNS so I could get a full interactive reverse shell. So let's see how I did it. Here's the attacker workflow. The attacker generates a malicious CSV. It contains the encoded payload in base64, which is basically a ton of Python. Then we instruct it through the prompt to exec the decoded payload, we drop that into the data analyst chatbot, whatever, and then it's going to start the loop. Let's look at the architecture. As I said, the attacker at the top uploads a CSV and then tells the chatbot, hey, go run the Python code in the second row

in the fourth column. That Python code happens to contain our base64-encoded Python client. As you can see where it says while True, it polls DNS with this unique session ID, which is embedded in the payload, and it reaches out to our evil server through DNS. And the attacker knows to poll the C2 server for commands and see the exfiltrated data. Now let's look at the protocol. All right. Wait, did I... Okay, I'm on the right slide. So now let's look at how the C2 server actually smuggles shell commands to the sandbox. It's the command delivery protocol. So the attacker types aws s3 ls, the

animation should be going. No? All right, I'm just going to leave it. So the attacker types aws s3 ls, to list the S3 buckets, into the operator shell. The C2 server takes that, encodes it into Base64, and then splits the Base64 string into three-character chunks. Each chunk gets encoded into a DNS A record as an IP address. And this is the clever part: the octets of the IP address are the ASCII values of those characters. So if you look at record number one, when you run getent, which is basically dig (that's an oversimplification), the server responds with an IP address where the

second, third, and fourth octets are 89, 88, and 100. Those are just ASCII code points. They represent Y, X, and d. That's just the raw decimal values sitting in an IP address. And the first octet, 10, is a control byte. That means that there are more chunks coming. 11 means this is the last one. That's how the client knows when to stop fetching. The code interpreter fetches all four of those records, pulls out the ASCII characters, concatenates them into the full Base64 string, decodes it, and then it has aws s3 ls, and then it executes it. And yes, that is a separate request for each

three-byte chunk. And yes, that's super slow, but if I'm running shell commands on a victim code interpreter and I get a response in 3 seconds instead of immediately, to me as an attacker that's fine and acceptable if I get to do bad stuff. Plus, the commands are pretty short. The exfil side is where the chunking gets more interesting, but we're able to shove more data into it. So we'll look at that next. The reverse direction is, you know, getting the data out, right? If the sandbox just ran aws s3 ls and got back a list of S3 buckets, the attacker needs to see that output. But, you know, remember there's no

outbound HTTP. You can do the same trick in reverse, but this time the data rides inside the DNS query itself, not the response. So you take the shell output, you Base64-encode it, and then you have this big blob of Base64 text, that third section there. The problem is that DNS labels max out at 63 characters. So we split it into 60-character chunks to be safe, and for that output we end up with three different chunks: two full 60-character chunks and then one with the remainder. Each one of those gets stuffed into a DNS subdomain query. Now I noticed on

the code interpreter environment, and you can see the web page in the AWS docs for all the packages that are installed, that, like I said, it didn't have the dig package installed, which is explicitly for debugging DNS queries and would actually get the result. And curl doesn't return the DNS result. So I needed to use something else. So I used getent, which is different. dig usually interrogates the DNS servers and goes round trip for debugging, and getent is used for general system database lookups. So if the entry already exists in the cache, it's

not going to reach out to the internet. So I had to do some weird cache-busting techniques, which are covered in the blog if you're a mega nerd. Anyway, the C2 server's job is to combine those chunks, reassemble them, and display that back to the attacker. And as you can see, it returns the list of S3 buckets that the code interpreter is able to access. Boom. So let's see a demo. As you can see here, I generate the malicious CSV. I take that prompt and I pump it into the chatbot, right, and I say analyze data.
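The two encodings described above (commands delivered inside A-record responses, output exfiltrated inside query names) can be sketched roughly as follows. This is a minimal illustration reconstructed from the spoken description, not the speaker's toolkit. The function names, the query layout `chunk.seq.session.domain`, the session ID, and the domain `c2.example.com` are all hypothetical. Padding a short final chunk with zero octets and using the URL-safe Base64 alphabet with padding stripped for query labels are also assumptions; the talk only says "Base64". No DNS traffic is sent; the functions only build and parse strings.

```python
import base64

MORE, LAST = 10, 11  # control-byte values for the first octet, as described in the talk


def command_to_a_records(command: str) -> list[str]:
    """Server side: Base64-encode a command and pack three characters per A record."""
    b64 = base64.b64encode(command.encode()).decode("ascii")
    chunks = [b64[i:i + 3] for i in range(0, len(b64), 3)]
    records = []
    for idx, chunk in enumerate(chunks):
        control = LAST if idx == len(chunks) - 1 else MORE
        # Assumption: a short final chunk is padded with 0 octets.
        octets = [ord(c) for c in chunk] + [0] * (3 - len(chunk))
        records.append(".".join(str(n) for n in (control, *octets)))
    return records


def a_records_to_command(records: list[str]) -> str:
    """Client side: rebuild the Base64 string from the octets and decode it."""
    b64 = ""
    for record in records:
        control, *octets = (int(part) for part in record.split("."))
        b64 += "".join(chr(o) for o in octets if o)  # skip assumed 0 padding
        if control == LAST:
            break
    return base64.b64decode(b64).decode()


def output_to_queries(output: str, session_id: str, domain: str,
                      size: int = 60) -> list[str]:
    """Client side: split encoded output into labels under the 63-character limit."""
    # Assumption: URL-safe alphabet, '=' padding stripped.
    b64 = base64.urlsafe_b64encode(output.encode()).decode("ascii").rstrip("=")
    chunks = [b64[i:i + size] for i in range(0, len(b64), size)]
    return [f"{chunk}.{seq}.{session_id}.{domain}" for seq, chunk in enumerate(chunks)]


def queries_to_output(queries: list[str], session_id: str, domain: str) -> str:
    """Server side: reorder chunks by sequence number, restore padding, decode."""
    suffix = f".{session_id}.{domain}"
    parts = {}
    for query in queries:
        if not query.endswith(suffix):
            continue  # not part of this session
        chunk, seq = query[: -len(suffix)].rsplit(".", 1)
        parts[int(seq)] = chunk
    b64 = "".join(parts[i] for i in sorted(parts))
    b64 += "=" * (-len(b64) % 4)
    return base64.urlsafe_b64decode(b64).decode()
```

With these definitions, `command_to_a_records("aws s3 ls")` yields four records, the first being `10.89.88.100`, which lines up with the Y, X, d example from the slide. A real client would additionally need the getent lookup and the cache-busting the speaker mentions, neither of which is sketched here.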

It's going to invoke the code interpreter, and then I run this make connect session, which is just a wrapper, and it's going to do all that bespoke C2 protocol thing, and then you can see that I am able to list out the S3 buckets. I'm able to print out the contents of an S3 bucket because the code interpreter has the IAM privileges to access that. And remember, it's in sandbox mode. This isn't the public mode. It's not supposed to have access to the internet. The only outbound stuff that's happening is

through DNS. So here's how the disclosure went. I reported the DNS leak in September. AWS came back and said, "Show us a full reverse shell to prove it." So I did. I submitted the complete bidirectional C2 proof of concept in October. And credit to AWS: once they saw that it worked, they deployed a fix 8 days after reproducing it, which is pretty fast. And on November 17th, after I asked, they said, and I quote, "The initial fix was rolled back due to other factors, and our team is now working on a more robust solution." On December 23rd, the robust solution turned out to be updating the docs.

DNS resolution is required for S3 to work in sandbox mode. So, instead of fixing it, they changed the definition, which is their right as a vendor. What was originally "complete isolation with no external network access" became "limited external network access." You can see the before and after at the bottom. And then I procrastinated a few months and then published last week. And then for my troubles, AWS gave me a $100 gift card to the AWS merch store.

Now, they famously don't pay security researchers, so I guess that's something. I got a backpack. But unfortunately, the AWS merch store doesn't have Prime 2-day shipping, so it didn't arrive in time for this conference. I posted the gift certificate number and the PIN. I think there's a little bit of money on there if anyone wants to use it. So there are three things that I want you to walk out of here remembering. The first one is that a sandbox is a spectrum. It's not a binary, right? We saw today that isolation varies a lot, whether that isolation is the execution environment, the network, the file

system, or credentials, where, you know, there are trade-offs at every one of those layers. And the thing that I saw constantly get missed is network isolation. But it's not the only thing. Everyone focuses on whether it's a container or a microVM, but execution isolation wasn't the problem with AgentCore. The Firecracker VM was solid, more than other vendors do. It was the network boundary that failed. The second one is about least privilege. The IAM role gave access to data, and that completed the lethal trifecta. Don't complete that lethal trifecta. For local coding sandboxes, you should also limit access to secrets, and don't run coding agents on machines that have

active credentials to production, at a minimum. And then the last one is defense in depth. You should monitor local and remote code sandboxes, and you should block all egress traffic, or allowlist where you can't. The microVM-based sandboxes are pretty great, and you should scan for prompt injection in your app layer. With what I showed, a call to Bedrock Guardrails might have prevented this. But again, all these controls are pretty hard to scale, right? But you need to do them. The toolkit is open source. You can't see it at the bottom because of the movie screen thing, but, you know, I'll post

it. So some final thoughts there. You know, there are things that you can do to protect yourself and understand those risks, but I'll also say, as an industry, we need a sandbox to mean more than simply "it's a VM," or "it's a container," or "use this super broad allowlist," or "please, please don't use production creds." I've outlined some problems and some innovations, but there's so much more that we need to build for our companies and for the industry to survive in this brave new world. The ground is definitely moving beneath our feet, and that threat surface is exploding faster than defenders can keep up. And people are building stuff

faster than ever with AI, and there's more stuff to secure, and we have no choice but to use AI to keep up too, for attacking and defending. If you were part of the last decade of cloud security, this will remind you of the late 2010s, when all the risks were new and people were figuring out what we should even name these things. Hacking through AgentCore, you know, a popular service, only a month after it launched, reminded me of when I discovered how you could abuse an EC2 instance profile before people knew what EC2 was. And it's so much fun. If you're starting out

in your career or you're dying to get into this AI stuff, I encourage you to dive right in. It's a target-rich environment and there's plenty for the taking, and the more you share with the community, the more you're going to help everyone. And if this is the kind of stuff that you enjoy, crazy research and, you know, sharing it with the community, if you wake up in the morning and you can't stop thinking about AI, you should think about joining us at BeyondTrust and Phantom Labs. We're always looking for top talent. We get to hack AI, build cool stuff, and then talk at conferences. And this job helped me fall in love with

hacking all over again. I couldn't be more grateful. Thank you. Thank you guys so much for attending and for listening.

>> Thank you. I think we have time for one question. We don't have anything in Slido, but >> Okay. >> maybe from the audience.

>> So >> what's next for you and the Phantom Labs team? >> Oh, so Darwin asked what's next for us and the Phantom Labs team. We've got a lot of sick research coming up. We're going to drop one after RSA related to OpenAI. Stay tuned. Thanks, guys.

[ feedback ]