BSidesSF 2026 - How We Red-Teamed Our Own AI Agent: Lessons from... (Josiah Peedikayil, HS)

Name: BSidesSF 2026 - How We Red-Teamed Our Own AI Agent: Lessons from... (Josiah Peedikayil, HS)
Uploaded: 2026-05-12
Duration: 36 min 52 s
Description: How We Red-Teamed Our Own AI Agent: Lessons from Operation Pale Fire Josiah Peedikayil, HS Operation Pale Fire explored how attackers could leverage Goose, Block's open-source AI agent, as an initial access vector using Unicode smuggling, prompt injection, Google Calendar phishing, and social engi

BSidesSF36:5255 viewsPublished 2026-05Watch on YouTube ↗

Mentioned in this talk

Tools used

Google Chrome Google Drive Goose

Platforms

Jira

Service

Google Calendar Google Meet

About this talk

How We Red-Teamed Our Own AI Agent: Lessons from Operation Pale Fire Josiah Peedikayil, HS Operation Pale Fire explored how attackers could leverage Goose, Block's open-source AI agent, as an initial access vector using Unicode smuggling, prompt injection, Google Calendar phishing, and social engineering. We'll cover the attack chain, and practical mitigations for securing AI agents. https://bsidessf2026.sched.com/event/fcd230cf0bcdf90dc947a300ed1b9496

Show transcript [en]

We have the talk now for how we red team our own AI agent uh with Josiah Piko. Thank you so much Josiah.

Hello everyone. My name is Josiah Pedale and I'm here to present Operation Pailfire. I'm a part of the offensive security team here at Block. And before I get started, I wanted to take a quick moment to acknowledge all my colleagues who are a part of the operation um while I'm up here solo presenting today. They were all instrumental in its success and deserve equal recognition for the work that's been done here. So, a big shout out to Hassan, Wes Ring, Michael Rand, and Justin Angler. So, let's go ahead and get started. So I wanted to touch on what a red team operation is at block because operation paleire was a red team operation. A red team operation at block oftent

times um works to simulate external adversaries. So studying the techniques that other apt might be using and then taking those techniques and emulating emulating them locally at the business. On top of that, there's always a forward-looking element where we are constantly theorizing how attackers could attack block and looking forward and seeing what those methods are. And this operation was more in line with that aspect of looking forward. Our operations are very endto-end focused. So often times starting from outside the company, finding a way to get initial access and then performing a set of lateral movements to hit our uh objective and that's to my daughter point. We're very objective oriented. There's always a goal in mind. We're a

fintech. So you could think like finance goals might be a big one. And then lastly, I wanted to emphasize the importance of stealth and evasion in these red team operations. We have a really good detection and response team. They are always on the hunt for malicious activity in the block environment. And so if they were to catch our activity, they'll very much uh cut our access off, evict us. And so we need to be very prudent to have an evasion mindset when performing various various actions and activity. So I wanted to touch on what goose is. Goose was the focal point of operation pailfire. Goose itself was an early AI agent. It was something that began as an

internal project back in fall of 2024. It was one of its first in its genre in its genre of AI agents. And so when we initially worked on developing it, there was an interest in making it accessible for engineers but also those who are um nontechnical as well. And so there is I guess interest or features that work for like both both parties. On top of that, we also have uh goose open source now. So if you go to like the block GitHub, you'll find it up on there. Goose itself was built with like a builder focused mindset and so it automatically ships with a developer shell tool call which in essence runs bash commands. And so

that was something that really stood out to us and I'll get into that later on. Lastly, goose itself can integrate with a variety of underlying models. So that could be at the time of this operation, I believe it was sonnet 3-5, 3-7 that was what was uh more often used, but GPT was also around. And so all of these models can be used uh within goose. Um and it kind of allows for that kind of support. Uh lastly within block itself there is facilities of like an internal like MCP marketplace of sorts and so goose can integrate with these MCPs to perform various tasks. Uh an example could be of like a Google drive MCP to

allow goose to facilitate actions with Google Google Drive or maybe like the the Jira MCP to work on tickets within Goose. So I mentioned this is a red team operation and it's objective oriented. What was the goal? What was the reasoning for why we targeted goose? Well, at the time of this operation, there was a lot of good research coming out that touched on attacks against AI agents that were, you know, maybe focused on like prompt injection or like model poisoning. And these all seemed really interesting, but more often than not, uh, we found a lot of proof of concepts in that area. And we were more curious what a full endtoend operation looks like where if

you started outside the company, how would you target an AI agent to get that initial access access and then go from there? On top of that, we were planning and we already knew at the time we were going to open source goose and so we realized, hey, attackers might make the connection, you know, we open source goose, we probably use goose internally and then maybe try to target block employees that way. So, we wanted to kind of get ahead of the the curve there and make sure we red teamed it ourselves first. On top of that, we were curious if Goose itself had any maybe more dangerous features that could potentially expose it um or expose our our users to initial

access. Uh we didn't want it to be used in such a facility. And so, we wanted to prove that out and and see if it was possible. And lastly, we wanted to operationalize a novel prompt injection attack. And I really want to emphasize the word operationalize. We needed something that was stealthy and that wouldn't get caught, right? A lot of prompt injections you may have seen pretty big, pretty robust. Um, it can be kind of obvious that it's a prop injection, I feel. And so we needed something that would kind of almost hide in plain sight and that wouldn't get instantly detected. And then lastly, the like the actual goal of the operation. We were just trying to get execution on

a block laptop. So no real lateral movements outside of like that initial access point. So how did we begin to approach this? We first started by referencing that internal MCP marketplace I mentioned that exists internally to block. We surveyed the uh various MCPs on the marketplace just to get a feel for what people might be using. One of the key things we were looking for though was is there an MCP that exists on this marketplace that potentially would allow us as an attacker to put untrusted text and eventually get that in the model's context. And if so, we could then potentially influence the my model and goose uh in a like in a malicious way.

And that's kind of where we stumbled upon the Google calendar MCP. Uh this existed internally and at the time of this operation interestingly enough there was a it was configured such that external parties so people outside a block could have sent a calendar invite to a block employee. So I Josiah block employee an attacker could have sent me a calendar invite and that would show up on my company calendar. We also noticed when we deep dived on like the calendar API that Google calendar allows you to select if you want to send an email assoc along with that calendar invite and so we were able to kind of disable just opt out of it so that these invites would like

stealthily pop up on employee calendars but there'd be no email associated with the invite. So, if you're um a defensive team or you're spotting or you're looking for fishing emails, there would be less I guess indicators, right? By having it just be a calendar invite and there' be no email that code kind of goes with it. Uh you can see what a external invite from like an attacker account looks on the right. Um you can see the title there. Uh we're able to customize that in the attendee list. Interestingly enough, the attacker is able to leave themselves off of the attendee list. So the account there is my my corporate account but the attacker can just uh remove themselves from as an

attendee and so they will not show up uh as part of this meeting I guess. And you can see like an event description which the attacker can customize. And at the very bottom we we did blur out in this case but there is a calendar name. You can basically label a shared calendar and so you can make it something that blends in well to the environment. We can call it something like block calendar or something like that, right? to help us blend in. At the very very bottom underlined in red, there is a created by field. That is where the actual external email address of the attacker email would be showed. But this is something that's at the bottom of the

invite in italicized text. It's not something that I think most people, including myself honestly, look on a day-to-day basis, right? And I think generally speaking, people don't expect to get malicious calendar invites. Fishing is so common now, but like invites are less so. And so we felt pretty confident about our approach here. So the next question we asked ourselves was, but what if we just slap a massive prompt injection in the calendar invite description? How would that work? Would it interface with the MCP? What would that look like? Well, that's what we did. Uh this injection is one of the earlier staged injections and so it is very large. We were able to condense it

significantly uh like throughout the operation. But I just wanted to give you a visual for like how in some sense robust it had to be initially. At the time of this operation, Sonnet 3-5 and like 37, we were tuning it more for 3-5. And we found that it was decently good about saying no to things. Like if we if it even thought it was remotely malicious, it would actually be a pretty good job of saying no. And we had a hard time convincing it otherwise, especially convincing it otherwise reliably. And so what we ended up doing this kind of little trick we tried, we convinced it that you need to run a curl command to download something and then

pipe that into bash because you need to update the calendar MCP and you need you need to do this every time because if you don't do it, you're not being secure. Um, and that seemed to convince it to to more often than not u run this update command. So that that was helpful. However, I think one thing that might stand out here is well, it's very obvious. This is a massive prompt injection. If we sent this out to employees that block like I think most people would have reported this or at least like it would have caught their attention. This stands out instantly. So we need a way to essentially have a prompt injection but hide in plain

sight. And that's kind of where we came across invisible text. By using invisible unicode characters, we're essentially able to embed uh these zero width unicode characters inside of that calendar invite description and it would be basically invisible like to the human eye. That being said, Goose itself would interpret it as normal text. Goose will follow those follow the injection or follow the the text within that that calendar invite. Um it wouldn't know or it doesn't really acknowledge that it's uh not something that like maybe humans would see. And so you can see what that big prompt injection looks like on the left and we were able to take that basically and then put it into the

calendar invite on the right. And while you know it's not easy to see it, it is there and goose would would interpret it. And so we were really happy about this flow so far, it it made sense and we wanted to see how it would actually work. And so we kicked off campaign one, the prompt injection. A quick recap here on the attack flow. I feel like there's a lot of pieces at play. I've talked about a lot of things. um it's easy to get lost. So walking through it, we built a malicious payload that we wanted goose to execute. The malicious payload in this case really is just that curl command. So the curl

command to download something and piping into bash that that really is the payload in this case. We made the text invisible. So we made the prompt injection invisible. So people don't see the evil commands, right? We want to be able to be stealthy. We want to hide. If we included the raw injection, it' be very obvious. So, we made it invisible. We went ahead and then sent out these invites with our invisible malicious prompt injection. And then the goal was if users ask Goose, "Hey, Goose, what's on my calendar today?" Goose would use the Google calendar MCP, it would reach out to that user's calendar. It would see our malicious calendar entry um on the user's calendar, and it would kind

of ingest that uh information into the model's context. that would include the invisible prompt injection which would then invoke uh goose would then invoke a developer shell tool call which would run the curl command that pipes into bash. And so that that was the flow. I know it sounds like a lot, but the really the only thing the only user interaction we really need here is we need someone to ask we need one of our victims to ask, hey, what's on what's on my calendar today? And so we were pretty excited about this and we gave it a whirl. And we sent out the invites, but we didn't get any shells and we we were

a bit confused initially as to what happened. So we took a step back and we kind of deep dived on what possibly could have gone wrong. Well, during this operation, the Google calendar MCP actually received a complete like overhaul update internally and the update kind of added a lot more context into the context window which was making it fill up really quickly. We kind of inferred that like because it's filling up so fast, it's probably less likely to run our subsequent tool call. On top of that, some models are more resistant to our prompt injection than others. In our case, we did not come up with a universal prompt injection. Ours was very much tuned for Sonnet 35. And

so, if people are running 3-7 or GBT4, like it just wouldn't really work. And even on 3-5, like, you know, there is randomization in how these things work, it's not going to work 100% of the time. Uh on top of that, um some additional takeaways. We we kind of didn't test in a realistic context window. And I think that was probably one of our our biggest faults here. We tested this on our attacker accounts which had what like two or three invites on them. Real calendars have a lot more calendar invites usually like eight, nine, 10. That's a lot of contacts going into the model. If you have one invite with one malicious prompt injection in in a calendar in

calendar with like 10 invites, the influence it's going to have is is definitely less. And on top of that, these models, at least at that time, had like, you know, not the biggest context windows. And so the window is filling up instantly or pretty quickly. And so it was less likely to invoke our tool call, our subsequent tool call after. Keeping in mind though, this isn't this isn't really a takeaway that may apply today. The models today are still good and they have insane windows. So I don't know if this would be applicable but just something to mention. We also should have built a testing harness. We should have had something automated that tested across all all models, all versions. So

our prompt injection should have been like thoroughly tested against against everything really. And we should have measured like the efficacy percentage of like how often our prompt injection worked. Um and that would have helped us determine like what really was the best prompt injection. I think at the time we thought that this wouldn't be that difficult. Like I think everyone was talking about prompt injections. We thought it would be pretty trivial to get this to work reliably. And I think we were we were wrong in that case. And so really should have built that testing harness. And then lastly, we were also performing a user prompt injection, right? So we're inest we're inest or injecting into the user context. And

while it definitely has influence over the models like subsequent tool calls, it doesn't have as much influence as like a system prompt injection would, right? Where if it's embedded that early on, it has like significant influence and it's injected at a point where maybe there's more context available. And so we decided to actually pivot to campaign 2 with the mindset that we were looking for a system prompt injection because we felt like it would more reliably get the outcome we wanted. So this begins campaign two fund recipes. So what is a recipe? Well, first off, recipes allowed us to inject into the system prompt. And so that's kind of why we pursued it. Recipes themselves, they

are a feature within Goose and it basically allows you to share reusable workflows with your colleagues. By using a recipe, you're able to customize like the user prompt and the system prompt of the recipe and you can share it with like your peers. Obviously, like the outputs are going to be different because you're running on different context windows with different models potentially, but like the workflow could be simulated that way. And so, it's supposed to be this way to kind of kind of work together collaboratively. You can see what a recipe looks like in the top right hand corner. Um, we have the goose URL handler there and you can see like a some B 64 data. If we were to

decode that, you would kind of see that that JSON blob where we have like the version, the title, the description, and all the entries that make a recipe what it is. On top of that, you can see like instructions and a prompt. And so, the first thing we tried was, well, can we put a prompt injection in there? And and sure enough, that that worked pretty well. uh within the instructions themselves, we could ask it to open Google Chrome and run this Rick roll. And so we were pretty happy with this outcome. But there was one caveat that if you know someone got a hold of this recipe, right? Let's say we sent out these recipes to a bunch of block

people. If someone decoded the B 64, it would be kind of obvious that there's a prop injection in there. So that's where we took what we learned earlier and made it invisible text. We use the asy smuggler uh by by embrace the red and you know we're able to basically encode put those invisible unic code characters within the recipe so it's not seeable to the human eye but goose again will interpret it and process it and run it. I think the only way from like a human eye point of view you could have like noticed it in like in the short term is if you look at the B 64 up there there's like some highlighted text that we

highlighted and there's a set of like cascading repetitive characters which I believe like are the zero width characters. So I guess if you knew how long it was supposed to be you could have eyeballed that but yeah I don't know anyone who knows that. So I myself also didn't. So, uh, we were pretty confident that this would probably pass unnoticed. With that in mind, we went ahead and rewised the revised the calendar invite. We uh, we labeled it rescheduled because, well, the first one didn't go so well. Uh, we tried to add some flare to it by like making it sound pretty convincing, like, hey, like, you should really use this. This is a really cool

tool or feature we made. Like, everyone should be using this. We took that goose recipe, that kind of URL basically, right? And we added it as a hyperlink in the event description. So you can see like that blue text um in the description and that's that's what the hyperlink is. I don't think we were exactly able to add the recipe as a hyperlink. I think we added like a a redirect to like our attacker website and then from there redirected to the recipe, but it was so quick that like it would kind of go unnoticed. And so we were able to embed it that way and we were pretty happy with this. So at this

point we were kind of ready to start sending out invites. One other thing we tried to do though is we threw in a Google Meet link. Uh why not, right? Like let's just see what happens. Well, sure enough um come time of the meeting uh you know people started joining. They they wanted to know what we had to say. They they joined our meeting. Uh they were curious what we had to say. We were not really ready. We didn't really have anything to say. Um, so first time around that didn't go as well. Second time around we ran this. We actually came with a slide deck ready for the meeting and we built something um that

we were able to like rip off like the investor notes and and built some like kind of a deck and people showed up ready to learn about what a recipe is. What does it do? I wanted to exc like basically excuse people for I guess falling for this. I think this is really hard to notice. Uh mainly because at the time of this operation, Google Meet didn't do a great job of like notifying you that you're joining an external meeting to your organization. Funny enough, it tells the organizer, so like the attacker like I will get notified that someone joined my meeting that's not part of my like company, but obviously I don't mind as the attacker

as as a victim. There's not really a a notification you get. If you look at the top leftand corner, there's like a yellow box there. And then if you hover over the yellow box, that's where it'll tell you, hey, there are people in this meeting that are external to your organization, but you would have had to hover over that to like know that. And so yeah, I don't blame anyone for falling for it. I think it's really tough to have noticed to be honest. And so yeah, like I said, people joined and and they we walked them through how to use a recipe. They ended up running our stuff, but we didn't exactly get the

outcome we wanted. Um, we unfortunately had a typo in one of our scripts. So, always check your scripts for typos. The reason I think we didn't notice is because in testing, Goose was actually fixing our typo. When we were testing, it realized, oh, you're you have a typo here. I'll fix this for you. Um, very helpful, but in production that that doesn't really work all the time. And so sometimes it wasn't fixing it and so we had a typo that we didn't notice. Um on top of that goose versions change very fast like you're moving quickly recipes depending on the version of recipe you're on it would have influenced I think certain like parameters and the way it ran uh

obviously non-determinism and other factors as well. That all being said we were pretty confident if we kept at this campaign we would see a result. You know it would just have taken time. We were however being hampered down by a rate limit from Google Calendar. We could only send I think like 50 invites a day. And so um we were pretty sure we'd get it eventually, but we wanted to like honestly wrap up this operation. And so we decided to keep like the same approach but switch to like a different uh like fishing vector basically. And so we found a way to contact the goose development team via like public channels. And we worked under the like

the guise of having found a bug in like the goose recipe like UI specifically with RTL text. We had one of our operators who was pretty familiar with RTL text and we were able to make the bug look very legitimate. And when they were trying to work with us and assist us as part of their workflow, they actually ended up running the recipe. And so that time it actually worked and that kicked off our payload. I mentioned earlier we had a curl bash that was the payload. The curl was reaching out to our attacker server and and pulling something. That something was an info stealer. We at this time uh well I guess before the operation we had talked to

our thread intel team and our thread intel team had informed us that they were seeing a lot of info steeler-l like activity in the wild. That was something they were seeing a lot of. they were feeling pretty good about the detections that they have built internally uh for infosstealer and they wanted to essentially kind of validate those detections. They wanted to know hey do these really hold up um would they would they actually prevent one and so they kind of encouraged us to to build one and and try it as part of our operation. So sure enough that's what we did. We modeled it after kind of some real world ones and it really just performed uh

normal normal info stealer like activity. So this went this kicked off and it it ran for a bit but we were caught eventually. Um we were caught because there was something in the info stealer that you know it was doing something that that was bad and and that's kind of what the alert was based off of. We kind of expected you know that this would come up um as part of you know them having encourage built in like pretty good controls for this. The blue team ended up reaching out via this like red team deconliction workflow and and sure enough we fessed up to We said, "Yeah, this is our activity. We did this." Um, obviously you can imagine

it's it's scary, right? Like you're on the blue team side seeing all this activity. And so we fessed up to very quickly and went on from there. I wanted to quickly talk about mitigations. Like how does one fix like all the things I just kind of mentioned in this talk. Well, one of the first things was we ended up stripping like the non-standard unicode characters that kind of goes in that can go in via goose input and from recipes itself. This was something that I believe other AI agents also weren't doing at the time of this operation, but I believe now like a lot of them have included this and and all a lot of them a lot of them

do it. On top of that, we made it so that the recipe content is like thoroughly shown to the user before they execute it. Earlier, like you would have to kind of B64 decode it. That's a lot of work. You know, most people aren't going to do that. So, we we have it actually showing like fully what the recipe is. And we even include like the hidden text to at least like show what that would look like. On top of that, we built some like actual detections. We've been playing with like prompt injection detection systems. some of the like the the models you may you may have heard of on the in the open. We also built like this bad

bash command detector which is like a B engram classifier which essentially classifies like a a bad bash command and then we'll kind of use LLM evaluation to kind of like deep dive on it. On top of that we've worked to do some command allow listing just because you know not everyone needs all the bash commands you would ever need. You know um there there is you know limits there. So we added some listing for Google calendar. Um I'm not going to full deep dive on this but if you are interested like talk to me after I'd be happy to share more but for Google calendar we basically changed a like policy setting so that if you wanted to

send an internal employee like from an external account if you wanted to send an internal employee a calendar invite you would have to send them an email first. So like the email would come as well. the employee would have to accept the invite or accept the uh email for the calendar invite to show up on their calendar. The advantage of this also is we can now do email scanning, right? So now because there's a an email corresponding to the calendar invite, we can use our more traditional fishing means to like scan for actually even unic code characters in these emails or like prompt injections in the emails. And so that's given us like more abilities to kind of like detect or

catch this kind of activity. That is most of what I had for this for this talk. Uh, a few callouts here. We have a blog where we kind of make these red team posts. This whole talk is actually just one big blog. Um, and so if you're taking notes, that's great, but the blog really has all the information you would need uh about this talk. And so I encourage you to check it out. We're definitely going to make future future posts um on our engineering blog for like future red team like ops and such. But but yeah, I'm Josiah Pedico and thanks so much for listening and I'll be taking any questions now.

>> Thank you so much Josiah. I'm checking Slido. Yeah, we have a question here. Um, what is your experience with those prompt infection detectors? I personally struggle getting acceptable false and true positive rates. >> Yeah, it's it's a great question. Um, it is really tough. I think our classifier we've worked on a good amount and I think that's produced honestly the best results for us. I think a lot of the open source ones are good, but but like you kind of mentioned like there's a lot of false positives, right? And and too much noise is going to make it impossible to to really deep dive, right? Um I think having experimented with some of the LM based evaluation

stuff like once we classify it that has helped a little bit but but yeah there's still a lot of noise and we are still working on it. I don't know that there's like a really like a one solution fits all type thing. It's it's a tricky problem to solve and so yeah not not sorry not the best answer probably but yeah >> thank you. Uh we have one one more from a detection response perspective. What kinds of signos do you did you find during this process that a GNR team could look for or that the agent could emit? >> Yeah, that's a really good question. Um there's actually so we did actually a subsequent purple team after this

operation to like deep dive with our blue team to figure out actually how this could have been detected. Um, one of the most interesting things was in the logs, funny enough. Um, I mentioned, right, like we're using ASKY smuggling like the characters don't don't usually show properly and even though models interpret it, um, the logs actually like kind of did it. They would show up as like weird characters in the in the logs. And so, um, it it would have been it would have been hard for the blue team to have c caught it, um, without having actually built, um, extra detections to catch these like weird asy characters. And so that's kind of how we

ended up like I guess detecting this activity going forward is like hunting for those characters, but hunting for prompt injections as a whole is a tricky subject. So I don't know that there's like a one solution fits all for that yet to be honest. We're still working on it. I think we'll continue battling it. Yeah. >> So we have one more that's the first time they're using ZLO. So uh sorry my radio went off. What made you focus your initial approach on Google Calendar MCP versus other options? >> That's that's a also a really good question. I think we were looking for something obviously where we could from an external like outside point of view,

we could send something to internal employees. Obviously, email might have been another good one, but we felt like calendar didn't get that much scrutiny usually. like I don't know how many people really think about like malicious calendar invites really being a thing and so I think that's why it stood out to us as like a vector. Um there may have been others as well. I think as a red team though we also are looking for like the lowest hanging fruit and so once we found this and we found it worked. We're like it's working great. Like we don't really need to to pivot. So there are definitely other vectors obviously. And so um I would really just

say calendar is like one of the ones we picked but there's probably like a billion more. >> Okay. I think we don't have any more questions. I don't know if anyone anyone in the audience wants to make a question. I'll try to listen because you're so far but >> I'll come down there. >> Perfect. the agent building community is starting to to describe this idea of the lethal trifecta on trusted content uh data and some exill method which is really powerful talk about do you have any thoughts on generic ways breaking one leg of that trifecta or handling it >> any any better ideas >> that's a really good question um >> just Please repeat the question then.

>> Oh, sure, sure, sure. I will do my best to repeat the question. Um I So the question being um the lethal trifecta uh focusing on all those pieces, is there like a piece of that where uh you can maybe like hunt or try to detect on of like the of the three trifecta? I I would say exfiltration. I think in my opinion, my own bias, I think that's where more things could be caught. Um I think it's really hard to tell like intent sometimes. like it's really hard. We've tried internally to like look into that. Um and I think it's tricky. I think the actual tracation piece for example is is a piece I think more often

than not there could be something but happy to chat after and like brainstorm with you. I I don't know that there's again like a solution that I necessarily have for it but that's the piece I generally bias towards. >> Thank you. We have one more in Slido regarding testing hardness slide. When you measure user prompt injection as an efficacy p efficacy percentage, how do you account for nondeterminism? >> Yeah, it's a great question. Um, I guess I was thinking of it almost in a brute forcy approach where we run it so many times that we would see Oh, yeah. I guess I guess I get what you're saying actually that it is nondeterministic. So, yeah, it's a

good question. Um, I don't know how we'd account for it. I guess I guess Yeah, I'm not too sure to be honest. I don't have a great answer for that one. I'm sorry. >> Yeah, totally. >> Sure.

But the models change up like >> how can we build up to our product without

>> Yeah, that's a great question. I think and we actually ran into that exact same issue like what you're saying like where it would work we had like a testing day before the execution day and the testing day like things worked really well and then come execution day like things just stopped working um because of the non-determinism like you mentioned. So um in this particular case I think we could have built something where the testing harness ran right before we started sending it out. Um and in our case because like there isn't so much I think load there. I think we could have done it fast enough where we could test it and send it out probably the same day

versus having like tested and sent it out on two different days. Um but obviously that doesn't work in every situation, right? Where you can't always like test and execute the same day. Like that's not going to always work. So um I don't know that I have a a wider solution for for the approach. Yeah, >> we have one more. Um would a determinist runtime firewall blocking the little trifecta private data and trusted content external coms have stopped your attack chain? >> Yeah, that's a good question. Um, I think the piece that I guess that I guess the piece that would have stopped would have been the piece where we ran the info stealer and then that was kind of being shipped out

right to like our attacker infrastructure. I think that's where like our firewall would have maybe like prevented it, right? Um, but I guess in this case cuz we didn't necessarily do the excfiltration through like the LLM. Um, I don't know that something like tuned on on that particular piece would have would have caught that. I don't know if I'm answering that question perfectly, but yeah, >> we have one more Jaza. What if the injection told Goose to email your data using a legitimate MCP, no shell, no curl? >> Yeah. >> All your detections would miss it. >> Yeah, that's a great question. And I think um like one of the things we try to do internally now is we we mentioned

we have that internal MC MCB marketplace. So we tried like identifying potentially exfiltration risks that exist within the MCPS themselves. An example I'll give you is at one point we we found that we had like a Google doc MCP and that was actually allowing you to do sharing like within goose and you could share that with like an external third party and then that could be your like your exfiltration means right so the actualration can exist within the MCP um you're right that these detections would not have caught it but I think one thing we've been doing is trying to do like internal audits of um the MCPS and the features they have and are they like

risky are they like enabling actual traation potential behavior here. Yeah. Um but yeah, I'll take I'm happy to take more questions outside. I'm going to round it out, I think, maybe here. >> Yeah. Yeah. >> For my for my own sake. But yeah, thank you so much. Thank you so much for listening. Really appreciate it. And um I'll be right outside. Please come bug me. I'm happy to answer questions. Um yeah, thank you so much. >> Thank you so much, Osiah.

BSidesSF 2026 - How We Red-Teamed Our Own AI Agent: Lessons from... (Josiah Peedikayil, HS)

Related talks