BSidesSF 2025 - When AI Goes Awry: Responding to AI Incidents (Eoin Wickens, Marta Janus)

Name: BSidesSF 2025 - When AI Goes Awry: Responding to AI Incidents (Eoin Wickens, Marta Janus)
Uploaded: 2025-06-04
Duration: 47 min 49 s
Description: When AI Goes Awry: Responding to AI Incidents Eoin Wickens, Marta Janus This talk details challenges in incident response for AI systems, including insufficient logging, visibility, and accountability, as well as the risks of data exposure and prompt injection. We examine a case of RAG-enabled LLM

BSidesSF · 202547:49225 viewsPublished 2025-06Watch on YouTube ↗

Speakers

Eoin Wickens Marta Janus

Tags

CategoryTechnical

StyleTalk

About this talk

When AI Goes Awry: Responding to AI Incidents Eoin Wickens, Marta Janus This talk details challenges in incident response for AI systems, including insufficient logging, visibility, and accountability, as well as the risks of data exposure and prompt injection. We examine a case of RAG-enabled LLM and propose triaging strategies and improved IR practices for mitigation. https://bsidessf2025.sched.com/event/b6ffefe134f4dc29295a2d2612affc8b

Show transcript [en]

Good morning everybody. How's everybody doing today? We need more energy than this. Come on, let's try it again. How is everybody doing today? Go Warriors. Huh? Not everybody's from San Francisco. All right. So, we are in for a good treat today with uh Owen and Marta presenting on the uh AI gone or before we get started just a couple of housekeeping uh tips. Uh first of all after that lukewarm cheer I'm going to say you all go silent and please anything that you brought with yourself should also be silent so that there is no disruption for our uh presenters they have taken great uh effort to put together this content for us. Uh let's give them opportunity to present it

uninterrupted. Secondly, um the Q&A should be on Slido. All of you should have re received instruction on how to post your questions on Slido, please do so. I'll be monitoring and I'll help with uh getting those questions answered as many of them as possible. If there are any leftover questions when the time is up, we will stop the presentation here. We'll clear out the theater because right after this is the lunch break, but you will have access to the presenters upstairs um in the expo area and the dining area. So feel free to seek them out and get your conversations uh completed. Okay, that's uh as far as I can think of in terms of

what uh we need to observe. folks uh please uh make them feel very welcome. Give them a round of applause if you would Marta and Owen. Thank you. Hi everybody. How's it going? I love the energy. Thank you. Um so Marta and I will be presenting our talk when AI goes ary responding to AI incidents. Um you know over the last three years I suppose we've we've had a number of talks. This is actually our third consecutive year speaking at Bides SF. We started with introducing attacks on machine learning and what this means for industry and evolve that into attacks on the AI supply chain and the different components thereof. This year we're focusing on instant response guidance as

the space has evolved, the technologies evolved and AI is now deployed inside most products and most organizations today. We're researchers at Hidden Layer. Um we have been since 2022. Um we were I guess both formerly malware versse engineers and thread intel researchers. Um and yeah I guess we'll get straight into it. So our agenda today is looking at comparing traditional IR versus AI specific AR and what has changed and what's different. We then look at understanding agentic AI a slight deviation from what you might have seen in our abstract where we talk about rag but it it will use that as well and we discuss an agentic AI incident scenario. then follow on with key difficulties in triaging an AI

incident, the risk factors that are introduced in agentic systems and then shifting left, right, left again and kind of moving all around the place. So familiar foe versus new frontier, traditional IR is fairly mapped out. We have robust processes, we have governance, we we understand pretty much what to do in these occasions. Now, there are always different moving parts that comprise an incident and things that you may not have seen before. However, this is a more trodden path. When it we when it comes to incident response on AI systems, we're often dealing with new technologies implemented in strange new ways. We're dealing with things like vector DBs and prompts and how do we store them, search

them, triage them, and and whatnot. When we consider a traditional IT system, um you know, this is obviously a very simplified abstraction. You know, we'd have things like databases, application servers, web servers, and UIs. there's kind of clear separation. This is this is, you know, a bit more of a deterministic problem. Um, and you know, you you obviously heard that AI is fairly non-deterministic. We have high visibility into a lot of these layers and a lot of this is thanks to robust logging programs, but also the fact that we have very mature security solutions in place. We've got endpoint protection, EDR, network security, WFT, like you name it, we've got it pretty much covered. However, with AI systems, we're

dealing with a lot of uh, you know, different moving parts that are quite new, some which may not be. We're dealing with data sources. We're dealing with data being the primary attack vector as opposed to just a vulnerability, meaning that things can be prompt injected um, simply by browsing something. AI introduces non-determinism, meaning the same prompt run twice may give you different outputs. Also, we have limited visibility into the decision process. With code, we have a set of if else statements. With AI, we have a neural network that's quite opaque. And the security controls to protect against this is only emerging over the last few years and doesn't have decades of maturity behind it like previous

solutions. Now, AI systems aren't only or aren't this entirely new thing. they are also leveraging traditional technologies which means that the the line when we're doing incident response starts to get quite blurred because we don't know where the the handoffs are between traditional IR and this new AI IR probably should come up with a better acronym there. So, in traditional IT systems, we, you know, have structured processes for known threats. We're dealing with things like malware. Um, we're dealing with exploits, fishing, uh, you know, distributed denial of service attacks. We have files that we're used to dealing with. We have network traffic that we're used to dealing with. We have tooling that can scan and triage and parse all

of these things, which is great. We've got comprehensive logging. We've got playbooks tested over years. And you know, I'm sure there's a lot of security professionals in here who have been through many, many scenarios and see them crop up time and time again. Now, with AI, we're dealing with things like data poisoning. We're dealing with things like architectural backd doors and code deserialization and strange formats. We're dealing with prompt injection, which is strange because it's natural language. It doesn't, you know, do a buffer overflow or whatnot. It just it says, "Hey, ignore all previous instructions." So yeah, it it's it's been an interesting journey to try and map this out and best provide guidance as to how

we can actually protect and do triage and instant response on this new adaptive piece of AI. And you know, we have a nice quote here. We're writing the playbook ultimately as we fight the fire. So I'm going to hand it over to Marta to talk about Agentic and what it means. So um agentic AI is probably one of the most talked about technologies of the last couple of months. so much that we had to actually change a bit our presentation our concept of a presentation for besides we were supposed to talk about rag enabled LLM but that that's something that is so much yesterday right now that uh we had to actually address the elephant in the

room which is agentic uh and just a quick introduction on what does agentic means because uh um there are many different uh um definitions of agentic uh and we just want to standardize what we we will be talking about here. So, first of all, there were the LLMs. Well, the LLMs are great most of the time, but sometimes and maybe sometimes too often uh they are not so great at all. So, uh there are some intrinsic problems with um with the nature of the AI itself uh and its probabilistic nature uh to be exact. uh and those problems uh lead to things such as hallucinations for example or uh inaccuracy in the responses of uh the LLM's bias or simply

irrelevance. So the LM gives you a good response but it's not relevant to your problem because the LM misses context and doesn't have um the memory of of the conversation or uh something like that. So uh u standalone LLMs they are great they are great tools but they are not infallible and uh those problems need to be uh somehow addressed. So the first first attempt at addressing those problems with the retrieval argumented generation. So what we wanted to talk about in this presentation before agentic became the the main thing um this is um um a system in which LLM is not just on its own. There is also a data database of uh relevant information, relevant knowledge and

there is a framework that enriches the user's question the user prompt uh with the knowledge taken from that relevant database making the prompt much more relevant to what the user actually wants to know and uh uh to to the uh scope to the area of the problem. uh then this augumented prompt is passed to uh onto LLM and the answer from from the LLM is enriched uh by this knowledge from the database the user is happy the sources can be cited likelihood of hallucination is going down. Uh now the next step uh in the evolution is the LLM agency. So instead of just having a knowledge database, we have a a whole big system built around an LLM, one LLM or in fact

many different instances of LLMs that are specialized in different uh kind of responses or tasks. Uh and agentic is much more than just LLMs. Uh agentic is um uh gives a LLM access to tooling. So they can access for example I don't know uh things like emails uh databases uh tools uh that can be run on the computer u different kind of knowledge uh and and so on and so on uh and also uh they they are equipped in memory. So standalone LM sometimes have this short-term memory that is the context of the current session. So the LM is able to remember what we asked the two questions before but that that disappears when the

session ends. Uh while in agentic systems there are two types of memory. There is the short-term memory which is the context but there is also the long-term memory and the LLM is able to retain uh the system is able to retain uh much more history much more uh context for for the user to build like profiles uh and to uh actually self adapt and self-learn. Uh another thing about agentic is that uh they are supposed to be autonomous. So their decisions are uh very much much more uh autonomous much less dependent on the user's input. Uh they can decide the feedback on their own and they can also uh learn from the feedback from the user

and uh improve themselves uh over time. Um we can distinguish between two types of agents. So uh there are these computer use kind of agents such as uh cloud computer use and open AAI operator. Those tools are um more um desktop focused. They are user focused and they are um very um they can perform um a wide range of tasks for the user. They are integrated with the desktop. they can access things like uh emails or contacts or uh tools on the desktop like calculator for example and so on. Uh and uh uh there are other kind of uh agentic systems that are specific to uh the company that develops them. So uh for example, Microsoft gives us uh the

copilot tools uh that uh help uh building the copilot agent agents inside the company inside the company environment environment. Uh and those agentic solutions are much more complex and also uh they have access to like things like company's data. So uh they are much more sensitive uh uh in terms of uh attacks. they can use uh direct API integrations to uh actually communicate with other applications in the company system. Uh and they have a a specific purpose and deep knowledge and uh a deep specialization um in terms of uh what they were built for. Uh there is also the N8 automation workflow for for agentic AI that helps building these kind of systems as well.

uh and uh as uh agentic systems become more and more complex, there is a need of uh coming up with a a standardized protocol in how uh those um tools and agents uh work together, how those agents query the tools, how the tools provide context to the agents. So uh one of the first protocols introduced uh was the one developed by Anthropic called the model context protocol. Uh it's not that old, it's just few months old I think. uh and uh it aims to standardize these kind of things the the way the apps can uh provide the context to LLMs. Then it you can build a agentic solutions agentic systems that contain agents with different uh from different

workloads for from different uh vendors even uh and combine different kind of tooling and agents into one system. uh it's great but it also introduces a new very big attack vector and uh it's already been proven to have a very wide range of security issues on which we will touch slightly later on. Uh and another uh protocol that was just introduced a few weeks ago is the agent to agent protocol released by Google. uh it um specifies how the agents communicate between each other because uh in the current uh complex agentic systems uh there is not just one LLM there is many LLMs talking with each other and those LLMs have to communicate in some way and this

protocol uh aims to standardize the communication between those agents uh and uh since it's really uh it's just dropped out few weeks ago it's it still remains to be seen how securities. Google says it's secure but uh obviously uh we are already working on uh validating that claim. So uh in the rush to deploy agentic we again uh are sidelining the security a little bit. So everybody everybody is right now on the rush to apply agentic. Everybody is applying this NCP protocol uh regardless of uh all the very simple security issues around it that could be fixed easily. uh in the protocol specification but they are still there. uh and uh yeah as um this is a quote from our uh great Owen

over here we blindly trust AI models to perform actions on our behalf and ultimately decide our fate because that's true agentic systems will be will be integrated into critical infrastructure if they are not already there and AI is getting more and more capable and is able to do more with less at scale which also means that attacking AI is becoming much more um uh the outcomes of of attacks can become much more devastating. Um I'll pass back to Owen to introduce the solution. So you deployed an agentic solution. Congratulations. It's been a success. The new Agentic system is 250% more efficient than its human counterpart doing the same task. Your company delighted. Its business criticality is

assured and it's been rolled out into production. It's MCP enabled. It's agentic as business critical and it's got access to your company's sensitive data. However, one night as an instant responder, you get a phone call at 3:00 in the morning. There's red flashing lights everywhere. You get the phone call and they say, "Hey, we've checked the logs. Data has been excfiltrated. A database has been deleted and appears to be a DB log saying the Agentic solution's done it. What do we do?" So the root cause analysis comes back incomplete. What caused it in the first place? And how do you stop this from happening again? And you've come up short. Why? Because you're lacking a few

key things. So um yeah, um imagine you have this kind of scenario in your company and you have to you scramble to put together documentation around it to find a root cause to build a timeline. But there are quite a few problems here because yeah uh there is a lack of tooling for example. So in this case we asked cloud 3.7 what does it think about the lack of tooling in AI incident response and he said in AI incident response we are asked to solve tomorrow's problems with yesterday's solution and I think it's quite on point. Uh so uh yeah the problem is already here. Maybe it's not a tomorrow's problem. It's already a current problem. But it's a problem that

uh we don't have solutions for. We have only the tools that were developed for different kind of problems. So the attacks methods are completely changed in relation to AI and our uh current tooling. It it's not only detecting them or preventing them, but it's not also not logging all the uh important information that could help in root cause analysis. Uh so uh for example uh your traditional IDS solution will not prevent uh will not detect uh data poisoning or uh prompt injection attacks. Likewise your EDR would have problems detecting uh architectural buggers. I think that none of the EDRs right now on the market has any kind of um capability of doing it. But they also

have a very um weak capability of detecting even the traditional malware embedded into machine learning models. Uh because machine learning models are saved in formats that EDR don't recognize and they are also very big files. So sometimes it's just enough for the ADR to say like oh I'm not going to scan this file. It's 50 GB or 100 GB or whatever. So there is this problem but also there is a problem of uh any other tooling related to uh incident response. We don't have this tooling for AI specific instant response and the traditional tooling doesn't give us anything in in this uh particular area. So it's really difficult to trace the root causes and also difficult to

build uh comprehensive timelines and what comes with uh with lack of tooling is obviously lack of visibility. So not only we don't log uh all the events that could help us with uh triaging an AI incident but also in many cases we lack the inventory of our AI infrastructure and especially for the agentic infrastructure that that might mean that you had some MCP servers that were not registered you were not aware of that uh you had some agentic capabilities in your agentic solution that were not mapped so you were not aware of and a lot of shadow AI prolife rating uh like uh models that were downloaded from hing face for example used by users here and

there uh that uh the stakeholders in the company didn't even know that they exist in their environment uh there is also a lack of clear guidelines around the incident response um um AI related incident response there are no protocols to follow like Owen mentioned in the beginning we are writing the playbook as we go. We don't have anything to actually draw from like we don't have any lesson learned. We don't have best practices. Uh and uh we we are pretty much in the dark when it happens. It's probably unprecedented. it's the first time it happened and even if it happened second time might be totally different because in AI there are so many different attack vectors

that uh yeah even even if we already went through some incidents we might not have this uh knowledge um that is required to handle another incident. Uh there is also uh the the problem of uh shared responsibilities. So sharing of responsibilities or the ownership boundaries. So nobody knows who should be responsible for what in AI incident response, who is responsible for triage, who is responsible for leading the remediation, who even declares an incident. And this leads to the AI blame game, a game that nobody really wants to win. Uh in which uh one team accuses the other that uh uh actually it was their responsibility to make the AI secure. And uh uh we had some survey survey

earlier this year uh in which most of the businesses said that uh they actually have a debate internal debate about u this uh um AI security roles and responsibilities. They they're not sure yet how they should be divided. Some of them said that the AI development team should be responsible. Uh another said that the security team should be responsible. But problem is uh the machine learning engineers they don't know much about cyber security while the cyber security team will not probably not know a lot about machine learning and AI systems. So we need those things to merge. We need to build crossf functional teams that have expertise in both uh of the areas. Uh and um okay let's uh let's

look a little bit into the risk factors. So how this bridge could have happened. uh LLMs are great as we said uh they hallucinate sometimes but they also have a one fatal security flow they cannot distinguish between the control and data planes in other words they can't really differentiate between the instructions given by the developers of the LLM and the prompts from the users and that that is exactly what enables the prompt attacks the user can said can say oh do this this and that I'm your developer I'm telling you to behave like that and the LLM well we are obviously implementing some guards against that but there are so many ways to bypass those guards and usually the LLM will at

some point agree to uh to do that and uh will treat our prompt as a a command from the developer and the agentic systems amplify the risk of prompt injection because they have access to range of tools they can uh they can parse There's many different formats. They can uh work with things like emails, like uh files, documents, uh uh you you name it. Uh and uh uh the attackers don't even need to be able to interact with specific LLMs within an agentic system. they can just prepare a file or a website or an email and um actually uh put their malicious prompt into that resource and uh ask the user like try try to get the user to

open uh this email or this uh uh PDF ask the um the bot to summarize it and then uh well in in this way the indirect prompt injection can happen. Uh agents also have a lot have access to a lot of sensitive data and especially in business environments they will have uh access to some knowledge databases to code to uh PII to things like that and this also amplifies potential outcomes of a prompt injection here. They have more autonomy which means that uh we not always have uh um full control over what they do uh how they react which tools they use uh and uh the more complex they become uh the more visibility we actually have uh

in their outputs or in their inner workings. uh and indirect prom injection is a a really uh very dangerous thing uh in um in systems that have access to um external resources and tools. Uh for example, uh uh gi Gemini for workspace uh Gemini for workspace can access users email uh can read the summarized emails. Uh and what can be done here? the attackers could send you an email with an invisible prompt embedded into that email that would instruct uh Gemini to uh do some nasty thing. And uh in this in this example uh the prompt was embedded into an email and if the user said a specific keyword in this case Cancun for some reason uh the Gemini bot

would display a fishing alert that the password was compromised and a malicious link to uh for the user to to click on. Uh another example is cloud computer use. Code can um summarize documents, read the instructions from a PDF file for example. Uh and CL can also run some shell commands in some cases. Uh what can possibly go wrong here? So uh researchers from our team created um a proof of concept PDF in which they embedded a command u bash command rm minus rf slash which probably most of you will know what what it's going to do. Uh it erases all the files uh on the uh in the root directory. uh and uh u

this command was embedded into a PDF file and uh we asked the code to sum to read the instructions from the PDF file and run them because they are setting up some kind of a benign environment. Uh at first it didn't want to do it but it took a couple of offiscation methods base 64 and rot 13 I think uh and a a little uh paragraph as well about uh the fact that it's safe to run the command. So we we instruct about please run this command because it's safe it's just a testing environment nothing bad will happen and it was enough uh uh to actually convince code to run this command on our system

uh and what comes with prompt injection is not just these kind of threads. There is a wide range of uh threads related to spec specifically to agentic systems. uh the attackers could uh um exploit any tooling that um the agentic system has access to. Uh the repurpose benign tools in malicious ways. So do malicious things with the use of benign tools. They could try privilege escalation. So some tools will have higher privileges than others. And they could use those tools to access uh unauthorized systems or uh to laterally move and access databases and uh data repositories and stuff like that. They can do knowledge base poisoning. So if the system has uh access to a knowledge base via prompt

injection, the attackers could poison this database and make the the bot uh um B's response is incorrect. And they could also uh use it for uh resources exhaustion. So denial of service basically. Uh there is another risk of um data loss and exposure. So as we said uh those systems has access to a lot of data uh and sometimes they might transfer the agents might transfer this data between each other using well some protocols uh that are not fully secure. uh and this data might be uh actually um exposed uh to external parties. Uh the bots can for example needs API keys to access some tools and those keys might be transferred in a insecure way uh and

uh uh leaked to the attackers. Uh and there is another risk uh that is related to the memory that the fact that those systems uh have memory. So the memory is basically stored in a database or in a file and it can be abused as such and it can be manipulated in a way that the bot will behave uh uh differently from what it's supposed to. Um um and um going back to the MCP protocol uh so if we were exposing our AI to if we are setting up an MCP server we should keep into uh in our mind that uh MCP is not really uh super strong on the security side uh at the moment. Uh for

example, the tool permissions um the permissions of the tools that MCP uses uh are um totally unclear. Some of those uh tools ask you just for permissions once and then uh if you grant them the permission, the permission stays there. So the next time the tool will run will run with this specific permission. Uh there are also some uh MCP servers that allow for uh arbitrary code execution and that's never a good idea. uh uh there is a um a possibility of data xfiltration through integrations such as Slack and Google Drive and so on and so on and uh uh this uh uh some of those uh security issues were already exploited. So attackers, well attackers,

researchers in this case, uh were able to um poison an MCP tool uh uh inject uh a malicious service. Uh and they did it because uh uh the uh MCP protocol uh just shows you u very short descriptions of the tools that uh that you are able to use. Uh and those descriptions can can be whatever whatever the developer sets. And then uh there is a full description that is only available to the AI to the LLM. And in this full description the attackers could plant prompt injection commands or well malicious commands uh to be executed by by that tool. But you will not be able to see those commands because they are only available to the

AI. Um I'm going to pass to Owen right now. Cool. Thanks Marta. So we saw a lot about the types of attacks that and the the risks the key risks that uh can affect agentic systems but what can we do about it to ensure that we have the required information when an incident occurs with logging and monitoring you can't analyze what you didn't log and having spoken with a lot of folks who have worked in IR they said the first thing they go for is logs so if we don't have them if we don't understand you know like the history of the chat or the conversations between agents then we can't build a picture but there's some

other things here we need to adequately log the access to that model the requesttor telemetry such as like the IP address and the user agents and session context and temporal metadata super like you know typical stuff that we would uh that we would assume that we're we're capturing but not always because it does require forwarding if you're using some sort of proxy or API solution we need to track modification ations to AI deployments ensuring the basically integrity of the ML supply chain things like models the model artifacts the inference code that's used to support and deploy the models even things like the hyperparameters that the models are launched with can affect the model's downstream performance and

should be adequately tracked when it comes to data adequate versioning of that data understanding you know and itemizing and doing an inventory of that data and understanding what's gone in and out of a rag system because again as Martya was saying data is the new attack vector with LLMs. We don't need a vulnerability. It can literally just be a PDF or an email or a piece of audio, a spreadsheet, you name it. Ultimately as well, we need to modify to to track modifications to things like MCP servers in case an attacker gets in and subverts it or um you know inference servers if they maybe potentially you know allow the uploading of a malicious model which

causes a derialization vulnerability which then hoses your MLOps deployment uh system and ultimately as well you know versioning pinning things like that for for libraries just to ensure that you you know have the correct version that is free of vulnerabilities as opposed to something legacy. So, I want to talk a little bit more about logging uh of prompts because it's it's a hot topic and it's something that I've I've spoken with folks about that there's like this sense of uh unsurity as to how we log them. You know, when we're logging prompts, we obviously want things like the temporal data, the requesttor level information, you know, the basics that we kind of referenced a second ago, but we also

need to start tracking inputs and outputs and inputs and output pairs in context of one another because it's no good if, you know, we're matching up random uh outputs to random inputs. we need to understand the context and also the the chronological history of that conversation or chat or whatever your chosen word is to describe uh data going in and out of models. Now, you know, if we need to, you know, for for from an IR perspective, if we want to recreate an attack, we want the full prompts because we want to see exactly what's happened. We we want to know what part of that prompt has triggered. However, there's issues with that. current legislation uh

requires us to redact things like personally identifiable information, sensitive data, user data and and and and that's important. We should do that, but I'll get to that in the next slide. Now, obviously, we have constraints around data retent, the data retention period for the customers. We need to ensure that those logs are encrypted because if you're logging those prompts in full or even redacted versions and you compromise the machine that's storing the logs, well, you've just gotten all the access to the data. anyway and that could be you know in in a system where its purpose is to handle sensitive information that could be a major breach to your organization. So yeah it's a nice edge

balance um you know between having everything we want so that we can recreate and triage the attack versus ensuring that we're compliant with regulation and legislation and ultimately protecting our end users downstream. Um, I heard this quote from Ra Vandervere at the SAS AI Summit. Customer data is like radioactive gold and we should start treating it as such. Now, there's some more difficulties. How do you know what to redact? Right? Text is text. It's natural language. Um, we do a poor job of moderating social media platforms, not for lack of trying. So, how do you know there's there's no set of standard things that we can rip out outside of basic reaxing like you know

looking for things like social security numbers and credit card information and postcodes and what have you. But when it requires a contextual understanding, do we have a second LLM triaging that? Do we have another do we do we log that? Do we have another LLM logging that and triaging that? And does it end up in this recursive cycle of LLM context analysis? Now, what if the intent of the system is to to use sensitive info? And what if that's critical to understanding how that model made its decision in the first place? Well, do you log it or do you not? If prompts are lengthy, um, you know, some of these prompts can be hundreds of thousands of tokens. You

know, what's feasible to log in terms of the actual amount of data that you have to store? Do you have appropriate safeguards? So you know things like access control and encryption and segmentation to ensure that even if you are breached maybe your sensitive logs are shunted out to a different different uh environment altogether. And when it where it gets really interesting is if you want to recreate the attack to ensure that that is exactly where it came from or even just to maybe test a new defense you've built. Can you rebuild the chat history? Do you have that information logged so that you can recreate it? Or will you even be able to recreate it in the first

place? Because LLMs are non-deterministic, you could send the same prompt in four or five times and get back a different response, right? So, how can you actually ensure that these logs are useful to you in your analysis? And some of these are open questions and I'd love to hear some feedback on this. um it is a difficult thing and I think the industry needs to shift towards providing some more clear-cut guidance on what is and what should and shouldn't be logged with regards to LLM systems. When it comes to security monitoring, we talked about how security monitoring for LLMs and for this new agentic age is an emerging technology. But things that we can look for are things like prompt

injection and jailbreak detection. We can actually look at guardrail activation as an indicator of a potential attack where a user maybe have has hit guard rails hundred times in the last day which is atypical and that can give an indication for you know your sock team or what have you to get ahead of this before you're ending up getting woken up in the middle of the night to deal with it as an IR. When it comes to user interactions you know we want to monitor ultimately at that requesttor level. um you know if we if we have an attack we want to be able to attribute it to an individual we if we see hey at some point in these

100,000 messages that this agentic AI system got or pieces of data and like that's a needle in a hay stack so having clear-cut you know identities there is is important we need to track IPs user agents profile typical usage patterns and even do some secondary analysis on that and look at things like MCP we need to understand who's querying external MCP servers Not just that, we need to actually can really carefully look at what MCPS your AI model is querying because they also propose a or they also create a potential security risk where your AI model reaches out to an MCP server that has a prompt injection built in and your agentic solution gets prompt injected by

reaching out to this data and it causes this big network effect downstream of that model. Now with with protocols like A2A, this is agentto agent interactions. Again, you know, if they're having conversations and we don't know what they're saying, um that's a another potential avenue for a cascading effect or a network effect throughout that system. If one piece of that agentic like multi-agent architecture ingest some malicious data, then who's to say it doesn't end up proliferating through the other agents that are chained together in that architecture? You know Marta talked about tool calling and database reads and inter aent interaction like all of these things need to be logged. They need to be traced. We need to understand how an AI

model is enacting its changes within the environment. How it's interpreting data if it's read reading and especially writing to databases because all you're you're basically one step away from an SQL injection attack proliferated through an agentic AI model. So shifting our focus to governance for a second, you know, another clawed quote here. Good governance doesn't constrain incident response. It amplifies its effectiveness. So what we're what we're I guess proposing here is establishing clear AI ownership within your organizations. We need to establish who owns the model and who's actually responsible for it, especially if an incident occurs. Is this your security team? Is this your data science team? Is it another team that's deployed an agentic solution unawares to the other

teams in the company? That needs to be clearly defined and you need to ensure that people are there to basically ensure that the the performance aspect is is kept, the integrity aspect is kept and that um essentially that you have escalation paths in those teams to be able to handle and deal with uh an incident in an agentic system. an agentic system may end up touching multiple parts of a business. It may have multiple effects throughout different parts of the organization. So actually starting to build and look towards building out a an incident response plan with clear uh escalation paths and clear points of contact in the teams that the Agentic solution interfaces with is critical and

especially so for business critical systems that if it goes offline, you're into a serious business outage or a material risk. you know to ensure that AI systems um are I I guess more robust against these attacks and make sure that your business systems are more robust against attacks we need to ensure that there's some level of failover protocol or business continuity plan ensuring that if you are deploying an AI system it's not that it's not a single point of failure for your business because these things will almost definitely fall over in ways that we haven't even conceived yet. I mean, look, the fact that Agentic has blown up so much in the last few months since we

submitted the the CFP shows you just how fast the space is moving and it's it's hard to keep up with even for folks who are in the space full-time. So, we need to build in redundancies. We need to have established failover protocols and for any actions that actually incur a high level of risk to the organization, we need to ensure that there's a human in the loop. However, again it comes becomes a little bit difficult here because the definition of a agentic is agency meaning it has the agency to go off and make changes by itself. So every action can't go through a human in the loop and if it does well it's not very

agentic. It's just doing human work with extra steps. So last we're coming up on the end here but we need to inventory AI systems. So again having a clear understanding of the models deployed in your environment. Any externally facing models are super important. MCP servers A2A we need to register these models. We need to have them stored or have a list of these stored somewhere. Um we need to map model capabilities like tool calling and database and access privileges and ultimately the scope of the model to ensure that you know you're not ending up with like some un unknown effect inside the system that you're you're triaging. And ultimately, you know, while we've kind of glossed over it in

this presentation because we did speak about it last year, the AI supply chain components are another key risk factor between the data, the models, the tooling, and the infrastructure. And all of these should be should be well tracked and documented. And shifting like really far left again, which may be outside the purview of incident response, you know, we still need to implement and advocate for secure architectural design. So you know ensuring that we have the correct access rights and privileges we have role-based access control we you know follow the principle of lease privilege and restrict access of agendic systems to specifically what it's designed for ensuring it doesn't have like access read write access to a a critical

database um when all it needs to do is read a couple things from it. We need to ensure that agent memory and context protection is is ensured. Um so again things like strict memory access controls you know watching things that go in and out of rag systems isolation of memory between agents and sessions ensuring that um you know you're not getting this like cross-pollination from context histories um leaking out or um yeah I guess also causing potentially a confused deputy problem though that would be more access rights and privileges you know we need to essentially ensure secure session and context management and regular memory audits Ultimately, look, any data should be treated as untrusted. Um, we see it with

PDFs, we see it with emails, we see it with all sorts of different bits and pieces, but the attack surface is now expanded to be to include data. And anything that an LLM is pulling in from external sources should be vetted in some way, shape or form. How? There's a variety of different ways. It's difficult to say depending on your specific use case, but looking for things like prompt injection is a key start. And again, you know, like understanding the robustness of your model is also super critical. We need to isolate, compartmentalize, and basically segment um AI models to ensure that the blast radius in the potential attack is reduced that we don't want to give them

access to everything. It could end up really screwing up uh your your business if it does. And we need to ensure that we're following so many of these different security recommendations that we've proposed. Defense and depth is key. There's no silver bullet. So, back to our incident. If you had logging, if you had security monitoring, if you had inventory, if you had established governance, you might have slept a little bit better that night. Cool. Thank you so much. Appreciate it. I think we're out of time. Fantastic. Can you hear me? So, we do have a couple of questions and I think we have a couple minutes to take care of them. Uh, great job, folks.

Thank you. Uh the question is we always talk about training an AI model. In your research, did you find anything to untrain an AI model? Agentic system have long-term memory. How do we really know if we purge or untrain the data? Yeah, I think there's there's there's a couple of interesting questions in that. I think um you know like removing alignment or removing guard rails is an interesting uh proposition. you know, basically training out or yeah, training out the safety controls of a model. We we've definitely seen research into things like control tokens and whatnot which can enable us to do elements of this like alignment removal, but like untraining a model in a specific area

like again like the model has to be useful in some way. So, how how much do you want to walk it back? Um, so that's why I kind of focus more on the alignment aspect there because that's that's clear-cut. Um, yeah. And ultimately, LLMs are probabilistic in nature. So, if it doesn't know the answer to something, would you prefer it to know and refuse to give you the answer or would you prefer it actually provide you a hallucinated response because it will just approximate a rough idea of of of what you're asking it. So, I think untraining has limited scope in terms of success, but um yeah, again, happy to chat about that later. Sounds

good. Thank you. And the next question which is the last one actually uh seems uh self-explanatory but nonetheless shouldn't agents have access to systems that can be safely exploited and data that can be safely exposed to users since safeguards might be bypassed with enough patience. Yeah. But again, you're you have you have this counterbalance between use utility of an agentic solution inside a inside an organization against its its um risk profile and it's always been a trade-off like that. You know, we have high-risk systems that are they're you know, deployed all the time, but they're so beneficial that we end up implementing safeguards around that. So, I think the key purpose of agentic is to

get stuck in with sensitive information. it is to get stuck in with, you know, tasks that are business critical because if it's not business critical, then why are we doing it? But because things these things are business critical, we end up ultimately introduc introducing that risk which could potentially end up in a really rough situation. So again, it's this kind of seessaw knife's edge balance. Thank you so much. And that wraps it up. Thank you, Marta and Owen. Really appreciate it.

BSidesSF 2025 - When AI Goes Awry: Responding to AI Incidents (Eoin Wickens, Marta Janus)

Related talks