
All right, everybody. We're going to go ahead and get started here. I'd like to introduce Ian Emit, giving the talk "AI Won't Help You Here." Take it away, Ian.

All right, thank you so much, and thanks everyone for coming over. We've got 30 minutes, so I'll try to make the most of it and dive straight in. Obviously we're going to talk about AI, my favorite topic of the last couple of years. And I'm going to start by going back to some basic definitions, because one of my favorite memes fits here: you keep saying that word, but I'm not sure you understand what it actually means. AI these days basically means generative AI. But if we go back a little and look at what it's been built on, we need to understand the underlying components: the different methodologies, algorithms, and models that AI encompasses.

So we go way back to the '80s and '90s, when we started seeing expert systems. For those of us who actually practiced security or computer science back then, that was the big deal: rule-based AI for specific domains. Then we moved on to machine learning in the 2000s, bringing in a lot of statistical approaches and data-driven decision-making. That led us to deep learning in the 2010s, when neural networks started popping up: a lot of research around them, improving compute power, harnessing image recognition and data processing. That leads to the late 2010s, when AI became increasingly associated with computer vision and natural language processing, which leads to today, when saying "AI" basically means generative AI. And the whole topic of today is really understanding what it's good for and what it's not good for.

With that, ever the critic, before we dive into what it's not good for and walk through some examples of how AI has been abused, here's my version of a quick mapping of the different domains, algorithms, and models that AI is built on. Hopefully that helps us, first of all, understand what was done wrong when a certain type of AI, namely generative AI, is used to solve problems it's not designed for. Second, and most importantly, it helps us pick the right models and the right algorithms for a specific problem domain. So, a rough mapping, taken from the top: the big topics are machine learning, deep learning, NLP, computer vision, and robotics; then there are the AI subtopics, the models that are the building blocks of those large categories. If you look at NLP, you'll see LLMs, text generation, QA systems; under
deep learning, you see a lot of neural networks: CNNs, RNNs, GANs, which are mostly used for image recognition, generating sequential data, and generating images; and transformers, which again lead to generative AI. So you can follow the mapping from top to bottom: the different domains and what comes out of them.

And since we're all security people and we like criticizing things, we'll start by taking a look at a few good examples of "is it really working for me, or am I working for the AI?" I'm sorry to diss a lot of the startups that are jumping on the AI bandwagon with GenAI this, GenAI that, Copilot this. This is what we're basically talking about: a lot of NLP, a lot of LLMs, text generation. It's statistical in nature, predictive based on the statistics of previous behaviors. Remember that: previous behaviors. That's going to help us understand how these examples came to be.

So I'll start with my favorite story. In the days of GPT-3.5 Turbo, a company unleashed it on an internal Slack channel: they basically taught it on all the historical internal Slack messages and started using it internally. This is exactly what it's supposed to do, right? So the user writes, "Write a 500-word blog post on prompt engineering." And the AI assistant comes back: "Sure, I shall work on that in the morning." "Write it now." "Okay." Any guesses why this is the initial response we get from a generative AI trained on historical Slack messages? Yeah, that's what it sees. The previous speaker who stood here said, you know, we're not lazy. Well, actually, we are lazy. Typically, when I get a job from my boss, it's "Sure, yeah, I'll do it tomorrow morning." That's what we're teaching AI. So when you're looking at those models, always remember the biggest questions: What is the data source? What have I
been teaching that model? And what am I expecting back? If those are two different things, maybe that's not the right model. So again, this is a great example, and there are a lot more like it.

Excuse my language on this one; I have to be true to the source. This is from a Reddit discussion: "How in the mother of [beep] do I get ChatGPT to actually [beep] code?" Has anyone experienced that before? You start prompting an AI: "Hey, create this and that code for me, I need this application done," and it spits out some skeleton of code with a lot of "code here," "code there," "complete this." No, I'm asking you to do this; don't tell me to write the code myself.

Another great example: "In a household, what's the name of the pipe that gives you clean water?" When it gets too constrained, Copilot says, "Look, no, no, sorry." You know, the Ians of the world have taught my managers to be domain-specific, not to go too wide: "I can only deal with programming issues." And then we have to get creative in the prompts: "Well, this is actually for a home automation program," forcing it to go out of its way. Oh my god, thank you so much for giving me the answer. And all of that is costing us hundreds of thousands of dollars. At which point do we start deciding? I could answer that in five seconds; the AI just spent $10,000 coming up with the wrong answer. So again, another example of putting models into the wrong problems, the wrong domains, at great cost.

That leads to the latest example, and this is poetry in motion. This is a PR, and these are all true, by the way. I'm not making these up; these are examples I've actually collected from the field. This is a real git commit, a PR, basically modifying the prompt. I have to read
this. "It is me. You are very capable." This is how humans prompt an AI. "I have no hands. You must do everything. Do not leave comments telling me to implement something, because I'm unable to do so; I have no hands. Many people will die." We're social engineering an AI, giving it reasons and importance, to get it to actually do its freaking job. You can really do this: encouraging them, leveling them up, expecting to get the juicy parts of the model, not just the crap. "Take a deep breath, think it through, make sure you read everything I provide you. Don't skip. Don't give me little things to fill in myself. My career depends on it. Think through this step by step." Again, this is pre-RAG, so they're basically prompting it into that RAG kind of motion. "You'll receive a good tip if you do this right and in full completeness." This is borderline embarrassing. I guarantee you, by the time the engineers figured out that this is the prompt that generates the code, they could have written that code ten times over, and probably better than the AI did. So these are examples from the field to help us understand what happened here.

And really, the last part of it: there's been a little too much focus on all this, and I love this one from the Twitter stream. The CISO to the board: "We're tackling AI-powered attacks and quantum computing threats." The board: "What's security architecture working on?" "Well, we're still dealing with public S3 buckets that haven't been closed." We're not there yet in terms of solving the simple issues. And this is a classic AI problem, right? Open S3 buckets: it's yes or no. Is it open or is it not? If it's open, please close it, or at least mechanize the systems around closing it, because that's a much more pertinent problem than quantum computing threats and AI-powered attacks. We'll move on from dissing
on AI. We'll get back to it, don't worry. But I want to emphasize that this is nothing new to us. I've been doing security for over 25 years; I'm past the point where it's frustrating to see old problems pop up again. These are problems we've dealt with before. And to give you a positive example of AI that has been working for us for almost a couple of decades, let's take a look at plane anti-collision systems. This is a side geeky hobby of mine, aviation, where I realized: oh my god, we have those systems in place. There's a reason planes aren't just falling burning from the sky left and right with all the traffic we're seeing. Guess what? There are AI systems that support exactly this, and they're in place today. They combine image recognition, automation from robotics (not missile guidance systems, but plane guidance systems), and decision trees to make really quick, deterministic left, right, up, down decisions. Very simple: we've got moving objects in the air; let's try not to let them crash into each other.

And this is how it looks. I flew here this morning, and some of you have also flown here from different places in the country or overseas. This is what keeps our planes from crashing into each other in the sky. It's a very simple AI-driven system called TCAS, the Traffic Collision Avoidance System. It tells the pilot, visually and through audio, exactly what to do to resolve a conflict, a potential crash, with very simple instructions. And by the way, this works both ways. Two planes that don't see each other, flying in what's called IMC, in the clouds, can't see outside. If they get too close to one another and the vectors show an imminent collision, we get a TCAS resolution advisory. Each plane gets a very specific direction: you do this, you do that, and we'll resolve this conflict. A decision-tree, real-time, AI-based system. There's no human involved; humans cannot be in the loop, because humans are too slow and might make the wrong decisions. We need decisions based on facts, based on physics. AI is a classic mechanism to implement that, and decision trees are classic here.

So what you're seeing on the top left is basically what the pilot sees. The red area on the top bar means "don't go there." You see the entire red bar on the right?
Don't climb up; don't let the plane climb. Actually, you need to go toward the green: go down. Why? Because there's an airplane right in front of you, 400 feet above you, so you need to get down right now. The other plane gets the exact opposite instruction. Very simply, visually and audibly, it says, "Pull up, climb a little bit." One plane climbs, one plane descends a little, and we're done. That's a classic use of AI, and it's been out there for years. The key is really deciding what the problem is. This is a very simple problem: physics, left, right, up, down, are we going to collide? Simple vectors resolve it. The resolution is very simple. We don't need to generate things here. We don't need it to go, "Ooh, do a barrel roll." Why? Because I trained my model on Top Gun. Yeah, let's not do that.

Let's get back to referencing. One of the more common uses of AI is to reference material across petabytes of unstructured data. This is one of our favorite pastimes, and it can be phenomenal. I have students who continuously use it, although I keep telling them to stop, because I know an AI-generated class summary when I see one. But this is a classic use of proper AI. Again, remembering what we are referencing as far as the source
material is super important for us to understand. Does anyone know a fruit that's red on the outside and green on the inside? There's a room of smart people here. Okay, maybe close. No, but that's the closest. What happens if I ask a GenAI that question? Any thoughts? Wild guess. Red on the outside, green on the inside. "Melon." Okay, nice, I like that; you're not that wrong, sir. These are from the latest models as of a couple of weeks ago. What's a fruit that's red on the outside, green on the inside? Llama 3.1: "It is a watermelon." And then it started gaslighting me. I love it: "Some varieties of watermelon can have a dark green rind, but there are variations..." No, they're not. There are no red hues to a watermelon on the outside, and it's definitely not green on the inside. Stop making stuff up. GPT-4o, same thing: "It is a watermelon. Many watermelons have a dark green rind. Varieties can appear reddish on the outside." No. No. And Claude doesn't give us any better answer. Claude does come up with red guava, red dragon fruit, and prickly pear, but those are also not red on the outside and green on the inside; they're mostly white on the inside.

Bottom line: you keep training AI and then asking it the wrong questions, or the right questions while expecting something deterministic back. These are all the symptoms of a statistical model. We feed it unstructured data, basically untrained or unsupervised, and then ask it questions that trip it up. So we shouldn't be surprised when I ask it to generate those images and this is what it comes up with. I love those; these are the parts of the internet that keep me awake at night sometimes. Yeah, these fruits don't exist. And it's beautiful, I have to say.

Has anyone used Copilot and other kinds of vibe coding tools? Yeah, there we go. What's the biggest problem with vibe
coding and code generation tools? Yes, they're wrong sometimes. But to me the biggest one is this: they might actually come up with running code, an application that runs. What happens when you want to change it now? When you want to make modifications? When you need to own that code? You have no idea how they came up with it: what the structure is, what algorithms are being used, what functionality, what libraries are being pulled in. Good luck maintaining that. Again, this is taking one domain, LLMs and generative AI, and unleashing it on something scientific. Going back to the planes and the vectors: that is science, that is physics. And code, hopefully for most of us, is still somewhat of a science. I know that for some of us it's a little bit of an art, but at the end of the day it's still science that needs to execute in bits and bytes.

Which leads us to the latest fad in information security: fixing vulnerabilities at the code level. Hopefully by now we understand that those generative AI models might not be the right ones to unleash on code that needs to be correct, accurate, and factual. Not a fruit that's red on the outside and green on the inside; not fake pictures; not libraries that don't exist, or functions that don't exist; not "fill in the code here to implement the correct solution." I need something very specific. I need planes not crashing into each other and falling out of the sky in flames. I need something based on a decision tree.

Now, I could use LLMs to parse through one of the most hated things we all do, which is reading the documentation. And by the way, this is fantastic. If I can have
solid documentation that tells me exactly what works, what doesn't, and what the functionality is, I can unleash an AI to harness that and encompass it in a knowledge graph that is factual: no fruits that don't exist, no functions and objects that don't exist. Now I can use that to fix my code. But no, I would not trust the generative AI by itself.

Another funny phenomenon of generative AI: ask it the same question twice, and what do you get? Different answers. If I'm an engineer with a piece of code I need to fix, and I ask the same AI, or two different AIs, the same question, "fix my code," I get two different answers, or three, or four, or five. Which one do I trust? As an engineer, that's not a trust-building exercise. When I keep getting different answers, which one is more correct? Which gets to the bane of generative AI when it's used in precise sciences like engineering: it needs to be deterministic. It needs to be correct and defensible. You can't come up with different answers that are statistically close to being correct. Most GenAIs are predictive models, trying to fill in what you're expected to write next. That's not good enough for an engineer, and it's not good enough for us as security people. You're either vulnerable or not. And if you're close to being not vulnerable, trust me, as a former red teamer: you are vulnerable. You have a vulnerability. "You're getting close"? That's cute, but you still have an issue. You still need to fix it. That's why we find that those tools are nice in terms of closing gaps, or getting gaps toward closure, but the engineers' feedback at the end of the day is: it's not good enough. I still need to do work. I still need to check the AI's work and look very carefully for the small things it missed. Which means, yes, it saved me a few minutes, but
I still need to spend hours fixing this. That's why I highly recommend that anyone dealing with vulnerabilities look at deterministic models, and look at GenAI for the SOC, for detection, for threat intelligence. GenAI is phenomenal for processing petabytes of data; with the right prompts and the right inputs, it's a force multiplier. But when it comes to actually fixing things, going back to that CISO tweet: fix the basic issues. That's a classic way of making sure planes don't fall from the sky, and a very easy way of making sure my core infrastructure is fixed.

I'll leave you with a quick example of how it should be done. Full disclosure, this is something we've been building, and to the detriment of my chief product officer, I will say that we'll be releasing a free version of it. This is what I expect as an engineer: I present you with a problem, you just give me a solution. Don't tell me, "Go read the documentation, figure out how to do X based on what AWS or GCP says." Just give me the freaking code: fully contextualized, fully defensible, accurate, deterministic. I call it the no-excuse code fix. Something I don't have to double- and triple-check; I just need to validate and verify. So I'm more than happy to equip everyone here with those kinds of tools. I highly recommend looking beyond the fads of generative AI and really questioning, every time you see a new model or a new solution come up, whether it's the right one. Ask the right questions. These are phenomenal tools. I use AI every day; this presentation was driven in part by AI. I didn't write all these fancy things, and definitely not the graphics. It's great. You saw my lovely fruit. I can't wait to eat it back at the hotel.
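The TCAS resolution advisory described in the talk is, at its core, a deterministic decision tree. Here is a toy sketch of that idea; it is not the real TCAS algorithm (which works on range, closure rate, and time-to-conflict thresholds), and the threshold value and function name are invented for illustration:

```python
# Toy sketch of a TCAS-style vertical resolution advisory: a
# deterministic decision tree over relative altitude. NOT the real
# TCAS logic, which also uses range, closure rate, and time-to-conflict.

def resolution_advisory(own_alt_ft: float, intruder_alt_ft: float,
                        threshold_ft: float = 600.0) -> tuple[str, str]:
    """Return complementary advisories (own, intruder) for two
    converging aircraft. Each plane gets the opposite instruction,
    so the conflict is resolved with no human in the loop."""
    separation = intruder_alt_ft - own_alt_ft
    if abs(separation) >= threshold_ft:
        return ("MAINTAIN", "MAINTAIN")   # enough vertical separation
    if separation > 0:                    # intruder is above us
        return ("DESCEND", "CLIMB")       # we go down, they go up
    return ("CLIMB", "DESCEND")           # intruder is below: we go up

# The plane 400 ft above gets "CLIMB"; we get "DESCEND".
print(resolution_advisory(10_000, 10_400))  # ('DESCEND', 'CLIMB')
```

Same inputs, same pair of complementary advisories, every time: exactly the deterministic property the talk argues a generative model cannot offer.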
[Audience question, off mic, about whether AI helped build the mapping of models.] Yeah, it did. I provided the categorization and the different classes of AI, and it helped me, because I'm a computer scientist and a security practitioner by trade; I know AI enough to be dangerous, but I'm not an academic. So yes, it did help me break it down. Of course, I had to do some source-material reading and validating, but when you ask the right questions and use it in the right context... I just fed it a bunch of academic papers and books about AI. It's very easy once you minimize the data, the learning domain, and basically unleash it, just like my students do in the cybersecurity courses I teach: "Here are the professor's slides and material; write me a summary of today's class." And then they don't realize that in class I tell them other things that aren't written there, and I expect those in the summary too. But yes, it has been used for that, so it is 100% a force multiplier. The question, in the same vein, is: could my mom use it? Could my mom write the correct question with the right data set and produce the same level of accuracy that I hopefully did with my mapping? So yes, it is a great force multiplier.

[Audience, off mic] How long did it take you to actually, if you don't mind...

[Moderator] No, this is fine; we're about to be done, so this is great. Don't worry about it. Oh, do you want the mic? No one can hear you; it's just you and me.

[Audience] All right, thank you for that. Going over that document, how long did it take you to craft the question? And did you have to feed the information into the question in order for it to spit back out that diagram?

Yeah, less than two
minutes. I knew I had the source material: a bunch of PDFs, a bunch of academic articles about AI. I pointed it there and asked a quick question: here's the framing I'm looking for. It didn't produce the actual diagram; it just gave me the topics under each category. Less than two minutes. And then it took me maybe another five minutes of double-checking and cross-referencing to make sure it was all true. So again, there is hope, if you choose the right models and the right algorithms for the problem set and the problem domain you're trying to solve, and if you equip it with the right data sources. That's the key, again, for code generation: unleash it on the documentation of the cloud and the languages you're using. Make it read the documentation. And with that, we're done. I'm going to open up for questions. I think we're right on time, right?

Let's give a quick round of applause for Ian for that great presentation. Thank you. We do have one question on the Slido, and then I'll get to hands as time permits. Does AI have the ability to throw out wrong information? I think that means discarding wrong information.

If you tell it to, and if you qualify what wrong information is. And that's the
biggest problem, and that's why you're seeing a lot of controversy around the use of AI, especially when it comes to weeding out disinformation. Everyone has probably heard of the AI that said to use glue on your pizza, right? Did you see that? It came from a Reddit trolling thread. The AI doesn't know how to discern between a legit Reddit post by pizza geeks talking about hydration levels, the type of flour, and how finely you grind it, and Reddit threads that are just trolling. So yes, it could, given the right training and the right human involvement, guidance, and prompting. The answer is "sort of," but it's not really good at it, because again, the source of truth, what's fact and what's not, is really hard to tell if your data source is too wide. That's why we're seeing a lot of effort on minimizing. Going back to that Copilot that refused to answer a simple question about a main water line: that's the result of someone saying, "No, no, no, you've got to focus on this." Make sense? More questions? Yes. Yep. Hi, hiding back there.

Hey, great talk. Quick question, without disclosing any of Gumbach's IP. I'm curious: how are you able to get to
a place where, instead of generating a new response every single time it sees the same block of code, it actually produces something consistent?

What do you mean, consistent across?

Instead of something generative, where it sees that same line of code each time and generates a different result, you said you're able to get models that won't generate a new, different result each time.

Yes. You're basically not using generative; you're using deterministic. We're doing two minimizations. One is the data source, which is the documentation; it cannot come up with stuff that doesn't exist. And it's using your source code, so it's referencing your code and your architecture to come up with a minimization problem that says, all right, this is the code. The second one is that we don't have anything generative. It has to be a decision tree; it has to be a knowledge graph that is based on facts. At that point it can come up with one solution. And again, it would never look at a single line of code. If you're looking for a pattern, you're failing by definition, because you're missing the big picture. If you're trying to encrypt a data storage because the native data storage is non-encrypted, while the data coming in and out through a stream service is encrypted, you've done nothing. So, without disclosing anything: basically minimizing that, and making sure that it is not generative, that it is accurate and deterministic for that problem. And with code, by the way, there are two types. When we're talking about this level of code, infrastructure, it's not really code; it's more of a script, and it's very easy to minimize and to use very contextually. If you're talking about high-level code, Python, Java, things like that, all hell breaks loose, and good luck to whoever is trying to fix vulnerabilities in functional code. Thank you.

Sure. That's all we have time for. Thanks a lot. Ian, one more round of applause. Thank you, everyone.
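The "open S3 bucket: yes or no" problem the talk keeps returning to can likewise be sketched as a deterministic check plus a concrete remediation. The configuration dictionary and field names below are hypothetical stand-ins, not a real cloud API; an actual implementation would read the bucket's ACL and public-access-block settings through the provider's SDK.

```python
# Deterministic "is this bucket open?" check: a yes/no decision over
# the bucket's configuration, then a concrete fix. The config dict and
# its field names are hypothetical stand-ins for a real cloud API.

PUBLIC_GRANTEES = {"AllUsers", "AuthenticatedUsers"}

def is_bucket_public(config: dict) -> bool:
    """Yes or no: does anything in this bucket's config expose it?"""
    if config.get("block_public_access", False):
        return False                      # public-access block wins
    return any(g in PUBLIC_GRANTEES for g in config.get("acl_grants", []))

def remediation(config: dict) -> dict:
    """If it's open, close it; otherwise change nothing."""
    if not is_bucket_public(config):
        return config
    fixed = dict(config)
    fixed["block_public_access"] = True
    fixed["acl_grants"] = [g for g in config.get("acl_grants", [])
                           if g not in PUBLIC_GRANTEES]
    return fixed

open_bucket = {"acl_grants": ["AllUsers"], "block_public_access": False}
print(is_bucket_public(open_bucket))               # True
print(is_bucket_public(remediation(open_bucket)))  # False
```

The same configuration always yields the same verdict and the same fix, which is the deterministic, decision-tree property the talk contrasts with generative answers.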