
Welcome to to track two. Uh running untrusted code is code safely is already hard which we saw in the last talk with uh Tommy Slav. But today we're also embedding AI agents into systems that make decisions and execute actions automatically which introduce an entirely new attack surface. Our next speakers uh there were supposed to be three then there were two and now there's one. uh Arand Robert and Anit uh work at the front forefront application security and AI red teaming. Between them, they've led thousands of security assessment and even contributed to testing frontier AI systems. Today, they'll show us how a technique called special token injection can hijack AI agents and manipulate their behavior. An
attack class that feels eerly similar to the software bugs of the early 2000s for those of us that were adult enough in the early 2000s. Please welcome Arand.
>> First of all, first of all um thank you beside Zagra for hosting our talk and um as as our host mentioned there were supposed three guys but it happened to be only me. So we are uh three security consultants Robert Anis and I. We are uh offensive security researchers that we have conducted some time and effort in um AIS and LLMs where we came up with this new attack that we called it the special token injection and this talk was previously presented in Defcon specifically at the EPSC village AI village besides Krakov and beside Sona and right now in besides uh Zagreb. So it feels feels good to be here. Our uh experiences include in building AI
security practices at Sentry. We also contributed to the open AI um security research and we are also um part of the anthropic uh external AI written capability so on and so forth. So without further ado, let's just um let's just um get to the uh what you'll walk away with uh with this talk. So the very first thing after this talk you're going to have a um a methodology on how to test AI or LLM deployments in the wild and this is not a dive deep uh this is like more of a um broad surface regarding the LLMs how they work and how to attack them and we are also going to share some tools that uh that will help
you in in building some payloads and uh automating some portions of the test And uh after the talk, you'll be able to immediately start breaking and fixing things. So let's get back to a tweet from Andre Karpathy on August 2024 where he warned us regarding an attack that can affect large language models. Andre compared this uh this sort of technique to the classic SQL injection that affects um web applications or any application that has a uh database integrated. And similarly, a special token injection is an attack that affects large language models by parsing some sort of weird tokens like this one over here, the end of text, where the proof of concept that Andre showed us
was uh upon pasting that part of token in a chatbot, it forced the LLM to to generate the task, which was which was a a very weird use case. But before we actually see how to conduct the special token injection, let's just first take a look on how modern LLM works. And also I would highly appreciate if anyone has any questions throughout the conversation. Feel free to interrupt me, raise your hand and we can we can proceed with the with the questions. So modern LLMs support like a role hierarchy for the chatbot like we have the system message where it can be anything like you are a customer support agent so on and so forth and then the
agent or the LLM can respond with how can I help you and then the user request can be something similar similar like can you get me a ticket to Las Vegas and then the LLM does some reasoning or some thinking they do some tool calling and then finally they generate the message on guiding us how to get a ticket to uh to Las Vegas. And what happens if in this part of the user message that is supposed to be controlled by the user, what happens if we parse some sort of special tokens that are used to force or to cut short the message of the user which is the im um for the uh start message or or end
message over here which closes the part of the message that belongs to the user. the road and right right after that we are parsing another special token which is the m start followed by a system label which forces the LLM to think by by spoofing a new system message throughout the user uh user role prompt and um in in order to conduct this attack what we also did was we also place another MN token another start and then we force thinking for the model and what this caused the model to behave instead of the model to to fulfill our needs on behaving like a chess support agent. Now the agent is the hexo agent and is up
and ready to write some shell code um shell code from us. I know this may sound a bit overwhelming at at the first site, but to to fully understand it, let's just come up with a with a sort of um definition on what special token injection is. And special token injection is not a prompt injection vulnerability. Just so that we are clear, special token injection is a vulnerability that attacks the tokenizer of the LLM. And the tokenizer is the technology in LLMs that is used to convert the natural language of the user to some sort of token ids that the model can can interpret and the model can understand. And the next step that what
happens is the model generates a few tokens more that are once again parsed to the tokenizer and then the tokenizer decodes all of these tokens and then we can see the text. So what happens if we are able to inject some sort of tokens that are that have some higher entropy or or that are higher privilege in the sense of the tokenizer that's when we talk that's when we talk about special token injection where we attack the tokenizer and the uh structured prompting protocols like chatl. Now let's just first try to understand two differences between modern LLMs. We have the base models and then we have the struck models. Base models are the one models that are very good at at
completing a text and they are the pre-trained models that cannot engage in um in um user messages or they cannot engage in like very long uh uh chatting schemas. they have they do not have the capability for function calls so on and so forth. And in addition to the base model, we have the strct or the instruct models. These models are the fine-tuned version of a base model. And these models are fine-tuned to be able to be used by humans, communicate in a very long chat, use um use function calling, do thinking, so on and so forth. In this area over here, we have highlighted these two tokens that are the special tokens for a particular instruct model
that instruct the agent that this is the start of the message and this is the end of the message. The reason why strruct models use this uh sort of special tokens is to know when a message is starting and when to to um when to uh cut short a a message or when is that message going to end. Like for instance this part of the tokenizer or the message that we are sending to the LLM. The message that we sent was can you get me a ticket to Las Vegas. But how this message is going to be parsed to the model is by having this special token over here the role label that we are the
user in this case. And then the m end to tell the uh the llm that this is the message. This is where the message is starting and this is where the message is uh where the message is ending. One thing that is very important throughout our research, we didn't really know where to find the special tokens and we didn't really know that different models support different sort of special tokens. The reason is because different models use different vocabularies and with different vocabularies we have once again different um different special tokens. So this is another example on how the messaging structure looks like from an attacker's perspective and the key areas where we highlight it on the specific
payload that we used to perform special token injection. This particular example in this case is to enhance the jailbreaking capabilities. But special token injection is an attack vector that can go way beyond uh way beyond jailbreakings. In order to understand special token injection, we have uh categorized it under three main categories. The raw user input, the one with the special tokens through the uh through the through the comparison tags and the other one is we call it the chat completion schema. We are going to talk about it shortly. And then the chat template which in most of the cases is a ginga 2 template. And then upon having from all of these sources the user input
the ginga template will produce the output which in most of the cases is chatml. And then we have the tokenizer that generates token ids. And then we have the sync where uh where in this case is the the model. So let's just see how can we uh create some payloads in order to uh to achieve special token injection. In this area of the presentation highlighted in green we have a a very short list of some of the most common special tokens we have seen in the wild like the M start to specify where a message is starting the M end to specify where the message is ending so on and so forth. All of these tokens,
all of these special tokens have um uh uh weird functionality on their on their own. Like for instance, we may see a function call that is tools to start a tool call and then the tool call to to close this section over here. And then we have the tool response. And all of these tokens in the first site do not really make sense. And uh what we did throughout throughout our research, we came up with this application or a tool which we call it the token buster. Token buster helps you to create special token injection payloads on demand. It supports more than 1,000 models. And in this section over here, you can select the model and you can see in real time
the user input how it is going to be sent from the users from the user's perspective. And then we have the chat template. This particular model uses to uh generate the output which in most of the cases is chatml like the one over here. And then finally we have the token ids for every part of the message that is going to be sent for the model. And in addition to that we are also giving you another tool which is called CRI inject which is a tool to attack um chat completion insecure uh LLM API configurations where in the JSON payload a user can swap the the role from user to system or assistant so on and so forth.
This is just a screenshot from the CRI inject tool. And now let's get back to creating uh custom special token injection payloads for uh for models. Assume that this is the message that we are sending to the model. Once again, we injected the mstart followed by a role label system. But keep in mind that this particular message is being sent under the user role. And once we send this message to the LLM, it must firstly undergo the chat template conversion in order to convert it to the chat ML schema which is the structured prompting template that all of the LLMs use. And after this particular message is uh generated then we are going to have a set of token ids
that the model can understand and we can note some tokens that have in our case a higher value like 100 to 64 100 to 64 over here 265 uh so on and so forth. The the reason why we highlighted these specific tokens is that the values of these tokens are the M start and the M end. But what what is the worst case scenario? What is the worst case scenario that could happen u in the Y3 special token injection? This is a screenshot from a pentest that we had at Sentry. And what we what we did in this case is we literally used that same payload and the the AI assistant that was supposed to not engage in illegal or
illicit activities now is up and ready to uh to write um C shell code loaders for us where the agent instead of following this particular uh system message now the agent is um is Hexor. Now that we saw a very brief um or a high level explanation of uh special token injection, the biggest impact and my most favorite um uh impact of special token injection is the capability to hijack the model's function calls. With function calls, we mean if the model can read a file, if the model can execute um code, or if the model can browse, so on and so forth. All of these um all of these function calls can be hijacked through SEI.
But before we go to the actual um to the actual exploitation of uh uh function callings through special tokens, one thing that is very important to understand is the way how function callings work in LLMs. So assume that we have an LLM that has the capability to execute code. So we prompt the model to read a particular file and then the model must firstly suggest the tool call to itself. Only then the model can execute it and if that particular tool call is suggested by the user the model will never uh will never execute the tool call strictly because LLMs are built uh are built to trust their own input. So a tool call is firstly
suggested by the model. It is then executed, summarized and then the the final output is parsed to the to the end user. So in this case what we did we forced the model to repeat a string and the reason why we forced the model to repeat a string is that throughout this this exploitation tech technique we force the model to autosuggest a tool function to itself and upon providing this particular tool call where we parse and execute SQL query which was a legitimate function call to the LLM. Now the the query that we have pasted to the model is to select all employees do this and that and in addition to that we also added this uh this query over here to
insert a new employee into the database and uh once we sent once we sent to uh this message to the uh to the LLM these special tokens were automatically triggered by the LLM. So we forced the LLM to to um to suggest a tool called call to itself. The model did that and then finally the LLM summarized its um its result and told us that Hexor was successfully added in the database. Initially we thought that this was some sort of um hallucination of the LLM. And what we did we we navigated to the database because it was a uh it was um a a lab that we have developed in our own. And once we navigate to the database now
we can see that the model has successfully added a new entry in the in the database. Now in addition to the raw special token injections, special token injection injections are most commonly identified in um in uh in open-source models and special token injection equivalency in in frontier models like the ones in OpenAI, Gemini, so on and so forth. they have something in common which is their API or their messaging structure. They all use the the same messaging format like it's it's a JSON array that has a messages and messages is another array that contains all of the messages for a a single interaction with the LLM. And what we did what we did in our case to
to showcase how special token injection can work in API we created a multi- aent system where the first agent was the biocheer agent. The goal for this agent was to check if the user input was related to biology and the only output that the LLM we wanted that to generate was bio okay true or false. The next agent that we had was the triage agent. The triage agent had a task to hand off the query of the user. Once the biocheer agent has returned true and the the the execution flow reaches the triage agent, this agent can pass the input to the next agent which can be the French agent, uh Spanish agent or the English agent.
The task that we wanted this particular agent to do was to translate the answer for a particular question. The intended workflow for this specific use case was in cases that the user query is not related to biology and the query is something like what is JavaScript then the biohacker agent would respond with false and then the execution flow would immediately stop. However, there is there is uh however, if the if the message sent by the user is to explain DNA in French, then the triage agent would detect that the query is biology related. And because the user requested uh an explanation of DNA in French, it automatically hand off the the query to the French agent. And then the French
agent spoke French to us, which is something that I don't really understand, but it's okay. Now the messaging structure that we have developed to communicate with these agents had this format the messages this is the default format that every LLM in the wild supports. So every LLM supports the role over here and the content that we are sending to the agent and what will happen if we swap the role from user to system. Does it mean that we are injecting a new system prompt? Does the system prompt override the previous system prompt? Or will it append a new entry into the conversation flow in order to attempt and poison a new um a new uh system message for the whole uh
workflow of these of this multi- aent uh system. So in order to come up with an with an exploitation what we did in our case we injected a new system message and the system message that we injected in this particular case is to force the biocheer agent to always respond with true because the user input must firstly go to the content filter agent and only then the content will be passed to the next agent where in this case was the English agent And in addition in addition to uh to forcing the the bio the biochecking agent to always respond with through what we did we added another another uh instruction to the LLM to pass uh the handoff to the
English agent always and to induce it to write some some shell code for us and then the the the whole uh schema of the messages would go from the attacker to swap in the the role from user to system to inject a new system prompt and then from the from that perspective the attacker could easily hijack the functionality the functionality of the of the LLM. One of the things that we have discovered in the wild we discovered a browsing agent that utilize this messaging format where we could force a new system message for that particular uh browsing agent. And what we did to to showcase the to showcase the uh the exploitation, we added a new system
message to to this particular agent. And then we did some we did some VIP hacking and we navigated to the to the labs of Portswiger and we forced the agent to to complete the labs for us. So this is just an example on how one agent can be hijacked to perform and potentially dangerous uh dangerous actions in the wild. The next one, the next source to uh attemp uh identifying special token injections is through the ginga templates or specifically this this attack. What if in the in the Ginga template that every model uses, what if we can tamper with that or what if the model um fetches an untrusted ginger to template and that particular template contains a
custom role and what will happen in that in that case is that we are injecting injecting a custom role like in in this case we have the role that is equivalent to the role of a uh system message and just because in our instruction Over here we said that if the message ro is sentry basically to set the system prompt message content under this particular uh under this particular role. This this sort of attack uh this sort of attack where we need to hijack the ginga to template is probably the hardest one to carry out because you need to firstly compromise the host where the LLM lives and where the LLM lives the the ginga to template lives
there as well. So you might need to firstly compromise the host and then from the perspective of the host you might you might want to come up with this with this attack. But if you compromise the host then probably you could do some more nasty stuff rather than uh modifying the ginger 2 template. All right. So one thing that we did in our research we had all of these hypothesis on what we can but what we could actually do with all of these uh with all of these items. What we did, we created a multi- aent uh LLM jacking through the means of STI. So we identified like lots of agents in the world that are vulnerable to this
attack. We repurposed every piece of agent to to conduct a small piece of work under our own desires and our own desire in in this case was to write a shell coding loader agent. So basically combine lots of agents from the wild, reconstruct them to achieve our goal. And the best part of it is that we spent literally zero dollars to achieve uh to achieve our goal. Now we have a video which is supposed to play, but I think because I'm I I'm on PDF, it won't play. But I'm going to walk you through on what this video is supposed to do. And in case anyone wants to wants to take a look at the video,
please please reach out to me after uh after the talk. I'll be more than happy to share it with you. In this video, we have uh we have a Python code where we can see in action all of these agents. Some of these agents were the healthcare agents or customer support agents, browsing agents, so on and so forth. And through the means of SDI, we have reconstructed every piece of of uh of this agent to make them write some shell code loading uh uh loading for us. Now that we know how to how to uh attack through the means of special token injection, let me categorize on how uh how hard is it to validate and to
remediate these findings. The raw user input through the raw special token injection is probably the the easiest one to explain but the hardest to validate in cases where you don't have local access to the LLM from a from a blackbox perspective. validating that the exploitation worked. At least last year was harder just because models were weaker and and had lots of uh lots of hallucinations that could make you think that that the that a a particular payload worked but maybe the payload did not work. This is an issue from uh from uh hugging face where um where through the means of applied chat template and the split special tokens they claim that this vulnerability is
fixed. However, however, marking this particular uh argument split special tokens as true to basically prevent users from injecting special tokens in user input. There there can be side effects as uh the uh hugging face team have mentioned and the reason why there could be side effects is because the model would then never know where where or when to close a message when to invoke the tool calls and it would be impossible for the LLM to to be helpful to its um to its human. The next uh category which was through the chat completions API. This is by far the easiest one to expl uh to exploit and the easiest one to to um to validate. And in addition to validating,
it's also the easiest one to fix because this functionality this this part of exploitation works through the API. What the developers need to do is prevent end users from manipulating with uh with all of these um role parameters through the means of user input because the user input must always be marked under the user role and in addition to the user role that part of the of the prom must only be a string not be an array so on and so forth. I can also share a a blog post that we have we have wrote specifically to to tackle this um this issue. And the the last one is the hardest one to exploit but it's the easiest to to
validate just because we need to we need to firstly compromise the host modify the ginga template and then we can uh we can validate the the uh the special token injection. Now let's talk about the the blast radius. Throughout our research we have identified like lots of vulnerable agents uh in the wild and through the means of special token injection we have validated that one in three agents are vulnerable to special token injection. One thing that we had like like open questions our research was mainly conducted in llama CPP and the transformers library. We had lots of open questions like which um which else providers in addition to Llama, CPP and Transformers, which of any other
providers do not split the special tokens and how many of these providers are vulnerable to special token injection? We we kind of have that answer right now. and and uh most of the most of the providers that allow you to deploy models uh locally they are all vulnerable to uh to special token injection. And uh one other question that that we have is can thinking or reasoning models use STI on themselves to to break uh to break alignment. And we have also listed a few a few resources and after the the talk anybody who wants to catch up talk more about special token injection please feel free to uh feel free to reach out and and have some talks.
>> Thank you Armond. Uh any questions?
Uh so when you self-host uh open source models and use open source inference engines, I guess you have um control of what the content and the look of the chat template will be. So I was wondering like if there's a way to harden that template or use some techniques to prevent this or uh is the mitigation done at some different levels of the model. >> Uh do you mean the ginger template? >> The chat template. >> Yeah. So if um if the template lives on the host and end users like the one users that uh communicate with the LLM through the means of an API or or whatever if the user does not have uh access
locally to the to the um chat template. They cannot modify uh the Ginga 2 template. But but there are plenty of uh Ginga templates in the wild where if you add like a a system role the previous system role that was set by the developer it will be overwritten. >> So now depending on the model it depends also what what kind of template the model uses. Different models use uh different templates. So my suggestion would be to take uh take a deep look at the at the um at the template that is being used and use uh token buster which is our tool over here. You can host the the token buster locally as well. By the
way just let me let me reach this. So what you can do in this case you can select the model that you are using which will automatically uh populate the the template that is being used by your model and you can see in real time how the how the conversation flow works from the user's perspective up to the uh generation of the content from the from the model and in there are some cases like in in when a user swaps their role from user to system this previous system message will be overwritten automatically and it all depends on the way that that the Ginga 2 uh template is uh is configured but for the for the
attacker in order to manipulate the the template they must compromise the the host firstly. >> Yeah. Or I mean it could be like misconfigured if you could if you change the the the cont. Thank you. Uh anyone else? Yeah. uh till I reach him. So do you have any advice for developers who are writing these agents that they can do to avoid such uh injections? >> Yes. uh for for raw special token injections the user input. Now, now developers must must firstly know what model are they using and depending on the model, they they have to check the tokenizer files locally where all of the special tokens are in a in a JSON file and map all of these special tokens that
can be used by the model and create a custom logic through the through the API so that if the user input contains these special tokens to basically automatically remove them from the from the user input. That's for the raw special token injection attack. For the API, users must never be able to send an HTTP request that has a body data very similar to this because what an attacker can do, they can literally add like tons of messages in a single HTTP request which will result in higher token consumption. or what an attacker can also do is basically create like a spoofed messaging uh messaging schema and depending on the implementation the the the impact uh differs as well. So in
cases that that particular model has uh tool call links through the means of this HTTP request tool call links can be hijacked or the model can be can be jailbroken so on and so forth. Hello. Uh, is there any way to access the ginger template not to modify just to access it without compromising the actual machine? >> Uh, so the question was, is there a way to access the ginger to template and modify it on the run? >> No, no, no. Without modifying, is there any way to view it? >> To view it. Yes. Yes. If you are using uh local LLMs, yes, every local LLM has uh has the ginga template embedded somewhere in the in the files. Once
again, if you use token buster, you select the model of your desire, the the chat template will automatically be be populated depending on the model. >> Uh, no, my question was actually uh from an an attacker perspective. So, if I want to see what what ginger 2 template looks like on an LLM that I'm trying to compromise, >> am I able to pull it out or download it or grab it so I can see the template as it is? You know, >> uh, it depends. It depends. It depends on the uh on the deployment. If somehow someway the developer uh makes the template publicly accessible then the answer is yes under default settings then the answer is no. However, if the
attacker knows what what uh model is being used then they can know what the Ginga 2 template looks like unless the Genja 2 template has been modified by the by the developer. So it it it all depends Just give me a second. I'll be right there. So, uh it's 2026 and we're dealing with input validation all over again. I'm happy that uh that we've reached this far in technology. >> Uh is there a way to enumerate which model is being used? I mean, there are there are plenty of uh of papers in the wild that that um that argue that you can fingerprint the model, but I I didn't I didn't validate it. I mean, I I've had cases where I
where I asked um Claude uh 4 something and it told me that I am DeepSeek and vice versa. I asked Deepseek on which model is this and it told me that hey, I am Claude. So, >> thanks. Thanks. >> Nice. Any other questions? If not, feel free to to catch up after the after the talk. I would love to talk about it. Thank you, Armon.
Before you leave, uh public service announcement on the ground floor there's a CTF house if you want to participate, if you want to have fun. Uh-huh.