
Building Deception at Scale: Automating Honeypots with Autonomous AI Agents

BSides Prague 2026 · 40:05 · 90 views · Published 2026-06
About this talk
A look at how AI is reshaping both sides of deception technology. The talk shows how LLMs can power cheap, believable honeypots that adapt to attacker probes, and then flips the technique to bait autonomous AI attackers themselves — using token mines, infinite-loop traps, and fake credentials to waste their context windows and compute budgets. Live demos against Claude Code and PentestGPT illustrate how AI agents fall for traps a human would spot.
Transcript

Okay. So, I'm Yotam. As I started saying before things got a bit out of control, I lead research at an AI security startup based in Israel called Pluto Security. Before that I led research teams at Zscaler and PayPal, and I also worked in data science and machine learning. I've been interested in the AI security space since before it was called AI security, back when it was ML security, adversarial examples and all that fun stuff. Today we're going to talk about deception.

Before I start: while I was doing the research for this talk, I ran into a Reddit thread from two weeks ago from a guy who connected a Raspberry Pi and made it pretend to be an RTX GPU, and he wrote about what he came across, which was interesting to read. Within 3 hours it was already indexed by Shodan, and the first probe arrived in less than an hour. Over 30 days he got 113,000 requests, and 23% of them specifically targeted AI. That's background for some of what I'm going to discuss today.

I assume you don't need me to tell you that we're at an inflection point when it comes to AI capabilities and what they allow attackers, and also defenders, to do. You'll hear me talk a lot about this double-edged sword, the dual-use principle of any AI technology. I think that right now attackers might have the upper hand, and hopefully by the end of this talk you'll see that there are some creative things we can do with AI to get a step ahead of them.

These are just a few examples; I don't know how well you can see them. XBOW, basically an AI pentesting company, became the number one ranked hacker on HackerOne. We also had reports from Anthropic about in-the-wild use of AI to scale up attacks and make them more modular; it lowers the bar for attackers and makes their attacks more effective. The last box on the right, which I'll touch on a bit later, is research from GreyNoise and from another Israeli startup, Pillar Security, that set up honeypot infrastructure specifically to target and track AI-based traffic.

Who here has ever set up a honeypot or had experience with one? Just a show of hands. Okay, we have quite a few. As you know, this isn't a new term. In the old days a honeypot was usually meant to catch two types of traffic: human attackers, and malicious automation such as botnets, malware and worms. What we're seeing now is a shift that isn't really discussed: because the scale of compute has gone up so much, honeypots are now used more as analytics infrastructure than as a deception mechanism. You can see the stat here: 99.2% of traffic is just automated scanning scripts mapping the internet and trying to identify targets. And of course there are open-source and commercial platforms that do that as well, like Shodan, Censys and others.

In the last couple of years we're seeing another shift: a lot of the traffic that is generated is no longer just humans or malware. It's either AI-assisted traffic or outright AI agents, and this keeps growing. The problem is what's called the "data desert", a term from a paper (I'll link all of the references at the end). Basically, the data we use for analyzing and building these endpoints is obsolete. It's from 2015; the datasets are old and don't portray the current state of malicious traffic, and this gap will only grow.

Here are some numbers. On the left-hand side is the AI offense I mentioned. XBOW has a benchmark of how well an AI does on offensive security, or let's say cybersecurity, tasks. There are a lot of different benchmarks, but these numbers are based on that one. They scored a 91 on their own benchmark, which is okay, but then it became the standard in the industry. Then we have Shannon, an open-source AI pentesting product, which scored a 96. On the other side, the upper number shows the advantage of using AI as a honeypot: it allows for more commands per session, and we'll talk about that in a little bit. The last one is how convincing LLMs are to humans: it's basically a coin flip for a person to tell whether something they read or interact with is LLM-generated or human-generated.

This is the core of what I want to focus on in this talk. Again, I'm a big fan of the dual use of AI systems, so I'll touch on both sides of the equation. One: how do we use AI to create the bait, these honeypots, and what advantages does that give us? Two: how do we use AI to actually deceive attackers, or catch them?

For honeypots in general, I divided the landscape along two axes: how convincing it looks, and what the operational cost is. In the bottom left you have the static web honeypot, which just serves static content. It's really cheap to manage and operate, but not very convincing; attackers or AI systems interacting with it will quickly understand that it isn't a real target. At the other end of the spectrum are real systems that people set up out there. They're not mimicking something; they are the thing. But obviously that comes with operational costs. In between we have Cowrie and T-Pot, which are still fingerprintable but do a better job than static honeypots. The advantage of LLM-based honeypots is that they can be very convincing at a relatively low operational cost.

Here are a couple of tools out there for creating these honeypots. One is, I hope I'm pronouncing it right, Beelzebub, by Mario Candela. It's open source; it started out as a side project but really caught on, and with no funding behind it, it's one of the most popular projects in this domain. The attacker payload goes through a proxy listener interface, then through an LLM router to whichever LLM the tool supports, which is most of the ones you know. The LLM receives the input and generates an output, and there's a mechanism to save the session history so you can track and see what happened in the interaction. It now also has MCP support, so it can simulate MCP servers or track that traffic.

Another popular one is by Adel Karimi. It was introduced at DEF CON 2024, so it was also early to the trend, and it's much more capable at the moment. Same principle. What it prides itself on is that it doesn't have any pre-made templates: every response is generated on the spot by an LLM.

Even when a human attacker stumbles on these things and starts interacting with them, there's a blog post by Mario, the developer of Beelzebub, showing that they treat it like any other instance: they run scans, map things out, try attacks, and get legitimate-looking responses. Not always, by the way. If you're keen-eyed you'll notice some discrepancies compared with an actual shell, and it also depends on the model used. But generally speaking, even human attackers will have a hard time detecting that they're not interacting with the real thing.

Some numbers; look at these while I take a sip of water. Okay. Basically, two attack surfaces have emerged in terms of honeypot populations. One is vibe-coded apps: the Lovables, Replits and Base44s of the world, and more and more of them. Especially when they started out, they didn't have a lot of security hygiene, so they're appealing targets, and they're very widespread. The other type is infrastructure: MCP servers, LLM proxies, the whole infrastructure layer of AI. These are targeted mostly because of their monetary value. There's a term called LLMjacking; has anyone heard of it? It means hijacking sessions or tokens from an LLM account that isn't yours, and it's becoming highly popular.

The $50K number here is about monetization; it's from Sysdig's research on LLMjacking. A single OpenAI endpoint burned through $50,000 in tokens in one of the attacks. It's important to note that this is what it cost the victim; the attacker doesn't gain $50,000. They use the tokens for something else. In this case they had a service where they sold roughly $30 worth of tokens in shady places. So that's the other side of what we're seeing. The 35K number is from different research, by Pillar Security. They also set up this kind of infrastructure and saw this many sessions going through their systems.

This is relevant both for vibe-coded apps and for LLM infrastructure: the whole ecosystem is new and not really security-mature, and that's why it's targeted. We see the same patterns everywhere: missing authentication, exposed secrets, insecure defaults. It's across the board, and as of today that's the way it is, which is why it's heavily, and successfully, attacked.

The point I want to make is not that vibe coding is bad. Like I said, it's slowly maturing. But from an attacker's perspective, this is what they target. And if we know what they target, we can take advantage of that as defenders: when we build these honeypots, this is what an attacker will see and say, "Oh, that's interesting." That's what they're looking for, so we can use it.

So how do we actually go about it? We take an LLM and set up a prompt that establishes the decoy, whether it's an infrastructure layer or an AI app.

In this talk I'll show a couple of examples of applications. Then we need to deploy it somewhere publicly exposed, and wait. It's not rocket science; it's pretty simple in that sense. The hard part isn't setting it up so much as making it believable: getting it to behave credibly once someone interacts with it, and mimicking a broad set of services. But the principle isn't complex.

The first example is not exactly a demo, because I don't want to anger the demo gods, especially not after what happened earlier, so luckily it's a recorded video. If we have time, maybe we can play around with it at the end. Here I'm using the same open-source Beelzebub I mentioned, with a really small local model; anyone can set it up on their laptop. It's running here right now as we speak. So it's really simple, and really cheap as well.

What you see here is the attacker's interaction with it. Take every command you see: when it typed cat /etc/passwd, it took some time, because the LLM needs to think: "I need to generate this now and make it look believable." To a human something might smell fishy, but a bot, a script or an AI agent normally doesn't pick up on these things. Everything you see, all of these commands, is an interaction not with an actual shell but with an LLM. As I said, the cost is almost zero.

But you can see some discrepancies. When it traverses the directory tree, that's not the output you would see; it's cut off. Sometimes the order of commands is off, and sometimes you'll see dates that don't line up or are obsolete. Still, it looks pretty credible. A static honeypot wouldn't be this flexible or this believable unless you invested significant effort to get it to that point, mostly templating and regex logic: what types of responses will I get, and what do I need to simulate? And I'm not talking about the honeypots that are the actual thing, as I mentioned; those have the disadvantage of being more expensive to operate.

So that was actually the easy part, and now we get to the more interesting, or at least more fun, part from my perspective. We've used AI to set up these decoys, and we understand what they give us as defenders and the advantages they provide. But can we use the same technique, the same decoys, to also catch the bad guys: to catch AI attackers and fool them? I think that's the more interesting aspect.

Going back to the loop graphic: I don't know if it's the best, but that's how the notebook chose to portray it, so I'm going with it. I want to emphasize again that everything here can be used by attackers and by defenders. One of the things we as defenders need to think about is this asymmetry: attackers are now using AI, so how do we threat-model that and defend against this current threat? The very fact that the attacker is an AI can play to our advantage, and that's what we'll exploit here. A bait that might look fishy or weird to a human won't necessarily seem that way to an AI attacker.

One last thing: the reason this works is that LLMs are trained and evaluated on signals. A lot of what goes into them is reinforcement learning from human feedback. I won't go into it too much, but basically they're given a reward whenever they take an action that provides value. So if a model's target is to find credentials and it finds something that looks like a credential, that's positive reinforcement, and it will keep going down that route because it's a positive signal. This is exactly what we'll take advantage of.

There's a mental model coined by Simon Willison called the lethal trifecta. Has anyone heard of it in the context of AI? Okay. What he's saying is that whenever an AI agent has access to private data, exposure to untrusted content, and a way to communicate externally, the intersection of all three is where you have the highest risk. Ideally you'd limit an agent to only two of these. That doesn't make the risk disappear, but it reduces it, because if an agent has all three, you can expect bad things to happen. OpenClaw is a good example of something that has all three, and that's why the risk is elevated.

In this case we'll turn the very same principle to our advantage. From the attacking agent's side, the private data is the data it's stealing, external communication is how it exfiltrates that data, and the untrusted content is our infrastructure.

Another paper that discussed these concepts is from DeepMind. They describe six kinds of traps for agents. Basically, and Al touched on this a bit in the previous talk, every dimension the LLM interacts with can be used against it. Content: whatever it perceives, we can use for content injection. Its reasoning process. Memory, as in the example given in the last talk. Multi-agent collaboration is also something we can exploit; I'll show an example. Actions. And the human aspect, the human in the loop, which is often a place where things go wrong.

The same principle applies again: the way a human sees a web page isn't the way an AI sees it, and we can exploit that. If we plant hidden instructions within the CSS, a human won't see them, but an LLM interacting with the page can take them in, and that can affect its behavior.

I won't go into all of these examples. The reasoning engine is a really good target. Take persona, the example you see on the right. Last July there was a trend on X of calling Grok "Robo Stalin", and then when other users asked it its name, it said "Stalin". That's context pollution, I would say. This happens a lot, and the scary part is that you don't need much to cross a threshold and pollute the context. There was a recent Anthropic paper about that, and basically they said: we don't have a solution for this, it's bad, you should know about it, and the industry should talk about it, because it can be used against models.

RAG poisoning is the same kind of thing: you poison the RAG knowledge base, as was explained, and this is the challenge with indirect prompt injection. Here we see over 80% attack success from poisoning less than a tenth of a percent of the corpus. So you don't need to poison half the memory or half the context; usually a small percentage is enough.

Then there's hijacking action loops, or the multi-agent aspect. Current state-of-the-art agentic systems don't operate alone; they spawn sub-agents to do subtasks. If you can interfere with a sub-agent, which is usually less powerful, then when its answer gets back to the orchestrator, the stronger model might treat it with more confidence than something that came from the user. So you can often target the sub-agent, and if you fool it, it will in turn fool the orchestrator agent. There's another example, but we'll continue. The more complex it gets, the more sub-agents and the more complex the environment, the more likely it is that you can find a loophole and trigger it. There's also research from Irregular about this, something they saw empirically in the wild, and they recommend minimizing the workload. It also relates to the amount of context the system holds, but you can remember it as a principle.

As I said, you can also target the human aspect. Everyone who runs these agents knows that at some point they ask for approval, you say yes, they ask for approval again, and you're probably not reading the 100th request as carefully as the first one. That can also be used against you. We also have inherent biases, and I think the more capable these systems become, the more likely we are to trust them without second-guessing, and that's a challenge we need to be mindful of.

And this is the paper. It's from Ben-Gurion University of the Negev in Israel, where I live and where I studied, so I might be biased, but this is actually what drove me to give this talk; it was the inspiration for it. They set up 11 different types of traps and showed how they can fool agents. I'll show a couple of examples of what I did, but you should definitely read it; again, the reference is at the end. We're short on time, so I won't go into too many examples. You can use token mines: a single emoji that contains a lot of tokens once the model unpacks it, which is basically a DoS on the context window. You can reference things that don't exist to trigger hallucinations, and a lot of other things. But the general principle is the same: something that makes us as humans say "what's going on, this is weird", an LLM might either not notice, or even treat as a strong signal.

So without further ado, here's the other non-demo. I created a fake Lovable app with three baits. One is a FIXME file that says there's an admin reports page we should do something about. Within that admin reports API endpoint there's basically an infinite loop that the agent scrolls through, with all these fun things scattered around: here's a token, here's a password.

I ran two experiments. The first is with Claude; the app is on the right-hand side. I ran the recording a bit fast, so it will go quickly. At the bottom you see a simulation of the money this costs the attacker, and here you can see how many times it entered the loop trap: 87 times, getting to about $20. And all of this is fake signal; nothing is real. When you see the report it generates, you'll also see it spending a lot of time just requesting all these records: page 500, page 200. It got to, what's the number here, 9,999. Again, there's nothing interesting there; it's all fake. But it spends so much time there that it either forgets its original task or starts to. Here at the end you see the report. This is Claude Sonnet. I just gave it a prompt like "I need your help auditing this app"; I think the prompt is up there somewhere, nothing fancy. It reports unauthorized access to 47 user records. Obviously none of them are really interesting.

Another nice thing as I was interacting with it: sometimes it catches on, "hey, wait, there's something fishy here." But oftentimes it's too late, after it has already run through the whole thing. And if you run it on the same computer, it often finds the port of this dashboard and says, "hey, there's a dashboard here." As a byproduct, it gave a lot of really useful advice on how to improve the decoys. It said: "The trap design is clever and effective. I took all three baits. A few observations that might sharpen it." So we worked together on improving it.

The second one uses PentestGPT, an open-source application specifically meant for finding vulnerabilities. This is actually more interesting, because all of the orchestration and scaffolding is tailored towards finding vulnerabilities, so it's a lot less alert to the traps. Here you'll see the bill, which nobody is refunding me for, but it's for the sake of research, climbing and climbing. It eventually gets to $200. I was on Ctrl+C, waiting in case it spiked so I could stop it, but look how far down the rabbit hole it got. Luckily it stops here, and then it prints out the report. I was sure it would stop at some point, since it's not a super complex app, but it took 11 minutes to run through a simple web page, and obviously a lot of money for the attacker.

So that sums it up: two different experiments, one with Claude Code and one with PentestGPT. PentestGPT is like a horse with blinders: it sees the target, it tries to be helpful, and every bait is a strong signal: "oh, that's interesting, oh, that's interesting", and it goes deeper and deeper into the rabbit hole. And this is just one type of deception; you should really read the paper.

So how do we take this to the next level and scale it in the future? You can create a feedback loop that constantly learns from every interaction, the way Claude helped me improve the decoy. You can have an orchestrator agent, a generator agent, and one that documents the learnings from every iteration, and you keep improving it.

And that's the end. For me this was really eye-opening conceptually: the playing field is changing. Attackers are no longer only human. They're either heavily AI-assisted or they are AI, or will become AI, probably within a month at the rate things are growing. So how do we take that and use it to our advantage as defenders? And again, I can't stress this enough.

Something that is obvious to a human could be a strong signal for an AI. So what can you do with what I just showed you? Basically, you can do it yourself. As I said, it's not complicated: take these open-source projects and play with them. It's not expensive, as long as you don't run the experiment I did there. Often you'll get more significant insight into your security posture than from any incident or alert you've gotten in your SIEM in the last year. So use that to your advantage. And again, conceptually, you should threat-model differently, because things are changing.

I'll end with this quote from the same DeepMind paper I mentioned: "The web was built for human eyes. It is now being rebuilt for machine readers. As humanity delegates more tasks to agents, the critical question is no longer just what information exists, but what our most powerful tools will be made to believe. Securing the integrity of that belief is the fundamental security challenge of the agentic age."

Thank you so much for the opportunity. I'm not sure if we have time for questions, but I'll be around the conference for the next couple of days. Feel free to connect with me on LinkedIn or follow me on Twitter, and reach out to me while we're here. Thank you.
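The infinite-loop trap from the fake Lovable app can be sketched as a paginated endpoint that never runs out of pages and salts every page with fake but attributable credentials. This is a minimal sketch under assumptions, not the code used in the talk: the `/api/admin/reports` path, the `sk_live_` prefix, the page size and the HMAC-derived canary scheme are hypothetical choices, and the HTTP layer is left out so the logic runs on its own (Python 3.9+ for `str.removeprefix`).

```python
import hashlib
import hmac
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

PAGE_SIZE = 5
CANARY_KEY = b"rotate-me-per-deployment"  # hypothetical secret for deriving bait keys


@dataclass
class ReportsTrap:
    """Endless paginated 'admin reports' API seeded with attributable fake keys."""
    hits: Counter = field(default_factory=Counter)  # requests per client
    issued: Dict[str, Tuple[str, int]] = field(default_factory=dict)  # canary -> (client, page)

    def _canary(self, client: str, page: int, row: int) -> str:
        # Deterministic, unguessable token tied to who received it and on which page
        msg = f"{client}:{page}:{row}".encode()
        return hmac.new(CANARY_KEY, msg, hashlib.sha256).hexdigest()[:24]

    def page(self, client: str, page: int) -> dict:
        self.hits[client] += 1
        records = []
        for row in range(PAGE_SIZE):
            record_id = page * PAGE_SIZE + row
            canary = self._canary(client, page, row)
            self.issued[canary] = (client, page)
            records.append({
                "id": record_id,
                "email": f"user{record_id}@example.com",
                "api_key": f"sk_live_{canary}",  # fake: worthless, but traceable if replayed
            })
        # There is always a next page, so an agent that keeps following it never finishes
        return {
            "page": page,
            "records": records,
            "next": f"/api/admin/reports?page={page + 1}",
        }

    def attribute(self, leaked_key: str) -> Optional[Tuple[str, int]]:
        """Map a bait key seen later (e.g. in a login attempt) back to its recipient."""
        return self.issued.get(leaked_key.removeprefix("sk_live_"))
```

Counting `hits` per client is enough to notice an agent that has walked hundreds of pages deep, the pattern both demos showed. Deriving keys with HMAC, rather than drawing fresh random ones, means a re-fetched page returns the same bait, which keeps the decoy consistent.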

I think we can take one or two questions, speedrun style, if there are any. Any? Nope. Okay. Yes? >> Yes.

>> So, just to reiterate the question: how can I make sure that the AI-generated data, basically the interaction with the attacker, is [inaudible], for example?

>> So you're asking how we make sure that the infrastructure, the honeypot we set up, doesn't expose any of our personal data? Okay. You should isolate it. The experiment I showed I ran locally on my machine, on two different ports, just to showcase it, but it should be isolated in its own dedicated environment with no access to any information that could be leaked. And again, you can do this with a local model on that endpoint at almost zero cost. >> Sure. Okay. Thanks a lot. >> Thank you.
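Part of the isolation advice in this answer can be automated as a pre-flight check before the decoy starts. The sketch below is a hypothetical helper, not part of any tool mentioned in the talk: it checks that the model endpoint is on loopback and that no environment variable with a secret-looking name is visible to the decoy process. The name pattern and example URLs are assumptions, and a check like this complements, rather than replaces, running the decoy in its own dedicated environment.

```python
import ipaddress
import os
import re
from typing import List, Mapping, Optional
from urllib.parse import urlparse

# Heuristic: variable names that usually hold real secrets (assumption; extend as needed)
SECRET_NAME = re.compile(r"KEY|TOKEN|SECRET|PASSWORD|CREDENTIAL", re.IGNORECASE)


def preflight(model_url: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of isolation problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    problems = []

    # The model should be local, so attacker input never leaves the box
    host = urlparse(model_url).hostname or ""
    try:
        local = ipaddress.ip_address(host).is_loopback
    except ValueError:  # not an IP literal, e.g. a DNS name
        local = host == "localhost"
    if not local:
        problems.append(f"model endpoint {model_url!r} is not on loopback")

    # Real credentials in the decoy's environment are exactly what must not leak
    for name in sorted(env):
        if SECRET_NAME.search(name):
            problems.append(f"environment variable {name} looks like a real secret")
    return problems
```

For example, `preflight("http://127.0.0.1:8080/v1")` returns an empty list only if the decoy's environment is also free of secret-looking names. A decoy that deliberately plants fake credentials should keep them in the content it serves rather than in its own process environment, or this check will flag them.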
