
AI: Best Janitor or Worst Superhero?

BSidesSF · 2024 · 34:57 · 94 views · Published 2024-07 · Watch on YouTube ↗
About this talk
AI: Best Janitor or Worst Superhero? Speaker: Adrian Sanabria. Emerging technology follows a common trend: we glimpse what it *could* be in the future, and overlook the less exciting success it could have *right now*. Why? Hype gets funded. Feasible ideas don't. We should use AI to solve mundane problems, not critical ones. I'll explain why and how. https://bsidessf2024.sched.com/event/c7e278166b1b754ca9eec0c09906ce8b
Transcript [en]

So we have our next session with Adrian, who is going to talk about "AI: Worst Superhero or Best Janitor." During the talk, if you have any questions, please post them to Slido or wait until the end of the session; if we have time, we'll go through them. Over to you.

Thank you, thank you. Who here saw the keynote this morning, from Caleb? Yeah, so I watched that, wondering whether this talk was going to be the counterpoint to it, and I think some of it is; I think we agree on some things. But as you can tell from the title, this is the whole thing. You can leave now if you get what I'm saying with the title. Basically, by overpitching what AI can do, you can do damage to what is clearly a useful technology. Tons of people are using it. How many people are using generative AI on a daily basis? On a weekly basis? Are there more hands? Yeah. So clearly it's a useful technology, and I think that's one of the key things here: we can reach out and use it, which has not been the case in the past. When we've been pitched AI before, machine learning

models and things like that were not nearly as accessible. Nobody built a chatbot front end for them that made them as accessible as this is today, and we didn't have tons of open-source large language models you can run off the discrete GPU in your laptop. For this talk, I actually used a lot of AI to build the deck, at least to get the ideas started. This slide is an example: not only the slide but the images were AI generated, and you can probably tell, because AI-generated images kind of pop out as AI generated. I didn't ask it to put cats in there; I have no idea why the cats are in there, and I couldn't get it to do exactly what I wanted, which is kind of a theme with generative AI at this point: it gets you about 80% of the way there. The title is "Worst Superhero or Best Janitor," which was part of the prompt for this image, but there's no "worst" and "best" in the result; the superhero is just doing janitorial stuff, carrying a basket. I couldn't get it to get it right, and that's kind of a theme with generative AI right now. Or maybe the AI just doesn't want to show AI doing a bad job at

stuff; maybe that's one of the issues I ran into. If you don't know me, you can Google me and find a bunch of stuff. I do a podcast, and I run BSides Knoxville, which is in its 10th year this year. You might have noticed on the back of the booklet we're the third one down from the top; it's coming up real soon, May 24th. If you're anywhere near East Tennessee, it'll be a good time, and tickets are available. I'm also faculty over at IANS, and that's pretty much me. So, as we've established, it can be useful, but we keep seeing and hearing these pitches trying to elevate it. The way I'm thinking of it is: it can save time, but it can't do miracles. It can't go quite as far as some people are promising. To understand why this happens in security (AI just happens to be the big thing now; if you've come to RSA or BSides in the past, you've seen the buzzwords come and go), I'd point to a talk Jay Jacobs, Wade Baker, and Alex Pinto gave a few years ago at RSA, where they took all the talk titles from all the RSA

events ever and compiled them to show the trend of buzzwords over time. So we're going to talk a little about why that happens, from a marketing and market perspective, and about what generative AI is actually good at and what we should be pushing to use it for. I think most people in here are going to agree with most of this, so part of the point is to give you some talking points, some arguments, an idea of how we can be good stewards of the technology. I think the real danger here is that somebody promises too much at an organization and the technology just gets cut: the execs don't want to see it, it failed at that big project that one time, we're never going to use it again. And it turns out this has happened before. So this is basically the cycle, and we're going to get into it a little. VCs like moonshots; that's kind of the whole thing behind VC funding: fund a bunch of stuff hoping that one of them is a moonshot that brings back some big returns. And

a lot of overhyped tech fails, to the point where we're all probably burned out hearing the claims; none of us believe them anyway. And then useful tech gets shunned; we'll get into some examples. A non-cybersecurity example is Watson: IBM swung hard and missed with a lot of it, especially in health care, and that was true of pretty much everything Watson did. They're trying to bring it back now with LLMs, rehashing the Watson name, so it's a great example of that cycle. I actually gave an early version of this talk at a very small internal offensive-security summit, and I learned from the keynote speaker there that this has happened before with AI, not once but twice. AI has been overpromised so much that back in the 1980s and 1990s, apparently, the best way to get your grant money denied was to mention AI. And now here we are. Last year I was working for a company called Valence Security; we were one of the ten companies in Innovation Sandbox, and we actually got dinged for being the only company not mentioning AI. So things come around full circle here. And

you know, this time is a bit different, and we're going to talk about why. So, the reason hype happens: especially today, with outbound sales and marketing, email is burned. Nobody wants to respond to an email; good luck trying to sell something over email, or even over LinkedIn these days. People move to another medium as they burn one, and buyers are just bombarded with stuff. Vendors really feel like they've got to sell a 1,000 to be able to sell something that can do 100, with the idea that, yeah, we'll get to 1,000 in two or three years, it'll be on the roadmap, but just to get somebody's attention, the general thought process behind your marketing and sales folks is that you have to overpromise on what you're selling. And generally this is the path we see it take, and it happens fast with a lot of these new categories that pop up. In other industries you'll see companies go on through Series C, D, E, F, G; you'll see that in our industry as well, but the acquisitions really start around seed these days. Occasionally you'll see companies that haven't even come out of stealth and already they're getting acquired, and there are some problems

with that, which we'll also get into. So this is the general process we'll see a new market go through. Not just an individual company but an entire market will kind of create this vision: you'll see one company have the idea and start it, and then a bunch of others. I don't want to call them copycats, because often the best companies aren't the very first ones to enter the market, but other people will iterate on the same idea or do something similar. Then come the bold promises. Once you get the budget, you can really plaster those promises everywhere, the kind of stuff you saw when you arrived at the airport, or just walking around San Francisco; it's on the sides of cars, the advertising is everywhere. Then, in other markets, what happens is we get the disillusionment stage. If you look at the Gartner hype cycle, this is the big crash after the height of all the hype. And often it doesn't happen in security; that's the problem I was talking about earlier. If you get acquired before you go through that

difficult part of asking, okay, what can this technology actually do, what can we salvage from all the promises we made, can we salvage something useful (and the answer is often yes), but if we never make it to those stages, if all the acquisitions in that market occur first, you'll often just see markets disappear. There's no story. It's like: I remember them getting acquired by Symantec, but then, poof, nothing; the product just wasn't available a year later for some reason, or was never available after the acquisition. Was it a problem with due diligence? Who knows. But often you'll see products that fundamentally don't even work make it pretty far through this stage and get acquired. And here are some examples where useful technology gets overhyped and it really hurts that market, hurts that technology, and we see it get reinvented years later. Application control at one point was promising the end of malware: we're going to create a static list of all the stuff that can execute, and then malware won't work, right? And as we often see with these cases, technically, yes, that works, you could do that, but that's just not how businesses work. Your devs are going to be

installing new stuff on a daily basis; you can't manage that static application-control list of what is and isn't allowed to execute on devices. Same thing with microsegmentation: scaling it is a really tough problem. We still see companies going after some of these, and sometimes they find a niche where the technology makes sense, even when the promise was that it would be used enterprise-wide. With NAC, we saw it find a nice niche on guest networks and in some other places, maybe on devices that don't change often. But we see this cycle happen over and over, and usually it's one of three things. First, the amount of labor put on the buyer is too much. How many times have we seen SIEM reinvented now? Because somebody forgot to mention that you need SIEM developers: people who can build and maintain custom parsers, people who can build reports. Oh, and by the way, when logs stop coming into the SIEM, there's no feature of the SIEM that will let you know the logs stopped coming in, so you've got to build that yourself. I've been through this myself a few times.

Second, scalability: it's a good idea, but scale it to a company of 100,000 people, or a couple thousand devices, and all of a sudden it doesn't work well. Or third, complexity itself: it worked great in the lab, we released it into the real world, and there were a thousand edge cases that made it completely unusable. Usually it's one of those three. So what's different here? (I'll have to keep an eye on time, because this is a bit of a rant and there's a lot to say.) What's different is that a lot of these technologies, particularly in security, haven't been available for people to really suss out and test the way this has. OpenAI decided to just thrust this in front of consumers for free, then added the $20-a-month option for the better model and some additional features, and that's really the test, right? Give me access to it, let me use it. Freemium products and open-source products stand the test of time because you can just grab them and use them. If an open-source project is popular, if it's got a bunch of stars and a bunch of people using it, it's because it works. On the commercial side

of things, it's really a guess. It's a proof of concept; maybe you've talked to the company for two months before you even get to the proof of concept, before you even get to the moment where you learn whether it even works, and it's very late in the process and very expensive to get there. So yes, generative AI is different because of that. There's so much open-source stuff out there, the Meta Llama models and so on, that I think we'll see this stick around in some form no matter how overhyped it gets. It's here to stay, mostly because there's already so much open source; so many people are building things and sharing what they're building that the momentum is there. It's not going away. Strengths and weaknesses: I'm not going to spend a lot of time on this; you've probably seen these slides three or four times today, and Caleb's keynote did a great job of talking through what it's good at and what it's not. I often sum it up as math and facts: anything deterministic, where there's a right or a wrong answer, it's not great at. But we've got solutions for that. We can give it fine-tuning data, we can give it grounding data, we can give it system prompts; there are a lot of features now, like RAG, to keep it from hallucinating. That's the other interesting thing about this technology: how quickly it improves. Things we were laughing at 12 months ago are mostly a solved problem now. Of course there are still problems today, and maybe six or eight months from now there will be answers for those as well.
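The grounding idea he mentions (RAG) reduces to: retrieve relevant text, then stuff it into the prompt so the model answers from that context instead of from memory. A toy sketch of just that step, with naive word overlap standing in for real embedding search (all the strings here are invented):

```python
def score(query, doc):
    """Naive relevance: shared lowercase words (real systems use embeddings)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_grounded_prompt(query, docs, top_k=2):
    """Pick the top_k most relevant docs and prepend them as context."""
    ranked = sorted(docs, key=lambda d: score(query, d), reverse=True)
    context = "\n".join(ranked[:top_k])
    return f"Answer using ONLY this context:\n{context}\n\nQuestion: {query}"

docs = [
    "The VPN gateway is patched every second Tuesday.",
    "Guest wifi is isolated from the corporate network.",
    "Lunch is served at noon.",
]
print(build_grounded_prompt("When is the VPN gateway patched?", docs, top_k=1))
```

The returned string is what gets sent to the model; because the answer is in the supplied context, the model has far less room to hallucinate.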

So here are some examples where it can nail things, like: are two 12-inch pizzas more or less than one 18-inch pizza? That's hard for the human brain to answer, unless you can just do all that math in your head, and I just said it was bad at math. But one of the ways they solved this is that ChatGPT and some of the other LLMs will now write a Python script, because they're good at writing code even though they're bad at math. Somewhere in the system prompt it says: hey, you're ChatGPT and you're bad at math; don't attempt to answer math questions directly, instead use Python, import the math library, and write a script to do it. To an LLM, a script is just another language, which is why it's so good at translation; it doesn't care if you use five different languages in your prompt, it understands the question, because it's all just language. So it can do that, and lots of useful things where you're asking it to summarize or put together text. These are the things you have to understand before using it, to really get any kind of value out of it.
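The kind of script an LLM emits for the pizza question is just circle-area arithmetic, and it's worth seeing because it also settles the answer: the single 18-inch pizza wins.

```python
import math

def pizza_area(diameter_inches):
    """Area of a round pizza from its diameter."""
    return math.pi * (diameter_inches / 2) ** 2

two_twelves = 2 * pizza_area(12)   # about 226.2 square inches
one_eighteen = pizza_area(18)      # about 254.5 square inches
print(one_eighteen > two_twelves)  # True: one 18-inch beats two 12-inch
```

This is exactly why the "write a script" system-prompt trick works: the model only has to get the formula right once, and the interpreter does the arithmetic deterministically.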

A lot of where I'm seeing this fail, especially with Microsoft Copilot, for example, is where there's not a good understanding of what it's good at and what it's not, what it can be trusted with and what it can't. Somebody will try something and be disappointed, then try something else and be even more disappointed, and it quickly goes down the rabbit hole of "this technology is trash." So in introducing it into security, we have to understand this really well, and especially because it's so accessible, there's no replacement for trying it out yourself. I've got the apps on my phone, it's pinned in my browser, and I'm trying these things out almost every day. Often I'll take a prompt and plug it into Claude, ChatGPT, and Copilot, and even though Microsoft Copilot is using OpenAI's models, I get completely different answers, because they've implemented it differently: different system prompts, different fine-tuning data, maybe it's connected to my Microsoft Graph data. I do that just to see the differences between the models, who gives me the best answer for a particular use case, and the results can be vastly different even when, as with ChatGPT and Copilot, the same models are behind the scenes. As another test, I asked it about the SIEM market, and then asked a bunch of individuals who knew that market really well, who'd been in security 25-plus years, the same question. They answered it the same way the model did: they broke the history down into three eras and described each era the same way. So it's really good at stuff like that. Again, code is just another language to it: write me Python code that isolates an EC2 instance, the kind of thing you'd do putting together SOAR workflows. It's really good at those use cases.
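The EC2-isolation ask he describes usually comes down to swapping the instance onto an empty "quarantine" security group. Here is a sketch of just the planning step, kept AWS-free so it runs anywhere; the IDs are made up, and a real SOAR step would hand the result to boto3's `modify_instance_attribute`:

```python
def isolation_plan(instance_id, quarantine_sg):
    """Build parameters for moving an instance onto a quarantine security group."""
    if not instance_id.startswith("i-"):
        raise ValueError(f"not an EC2 instance id: {instance_id}")
    return {
        "InstanceId": instance_id,
        # Groups REPLACES all attached security groups, cutting normal traffic.
        "Groups": [quarantine_sg],
    }

plan = isolation_plan("i-0123456789abcdef0", "sg-quarantine")
# A playbook would then run: ec2_client.modify_instance_attribute(**plan)
print(plan)
```

Separating "build the plan" from "execute the plan" like this also makes the generated automation easy to review before anything touches production.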

It does a great job at things like: write me a job description for a detection engineer. I remember being on a Zoom call with somebody, and in the chat he said, sure, it could write a job description, but not something really specific like a detection engineer. I said, well, let's try it. Right there in the Zoom chat I go and try it, paste the result back in, and, wow, okay, it actually does a pretty good job. There's some really specific stuff it can do. I've seen it create config files for early-stage SIEM startups with their own custom configs and their own query language, because somewhere in its training data, even without the ability to go use Bing or search the internet, it has consumed enough, their docs.company.com or whatever, that it can kick out a good config, something you can work with. Another example: as soon as we got to the visual stuff, I asked it to visualize the difference. This is the same pizza prompt from earlier (one 18-inch pizza is more than two 12-inch pizzas, by the way, if you didn't see the output), and it just had a really, really hard time with that. And again, you've got to try this stuff to know. I see a lot of people out there saying it's not going to be good at this, or it'll be great at that, but you don't know until you type in the prompt and look at the result. And here's a crazy thing: I can type the same prompt into the same tool five minutes apart and get two different

answers. They won't be completely different, but I've had cases where it says "I don't know," and then five minutes later it does know. Very weird; I don't think we've had any technology that works quite like this. As for offline stuff, we're going to see a lot of that coming out. Microsoft and Intel have a partnership where they're working on a version of Copilot that's isolated to your machine and can work air-gapped. You can do this today: you can go to Hugging Face, and there are a ton of tools where you just download a model onto your own laptop, and you're not using a public service or anything; you can do it all locally. But they tend not to be as good quality, at least the ones I can run on my laptop; the beefier models take a lot more horsepower. So, of course, the bio it wrote for me is completely hallucinated. I think one version said I worked for the NSA, which never happened. But even the best version still gets it wrong, because this is a hard problem even for humans. If you search my name, you'll probably find places where my bio says I work for Tenable, because those web pages haven't changed even though I changed jobs, and it's no longer true. Getting that timeline right is really tricky; really, the only place you can know for sure is LinkedIn, and it doesn't have access to that. For whatever reason Copilot, even though Microsoft owns LinkedIn, also doesn't have access to it. So there are certain bits of data it can't get to, which is why you still need a human in the loop to double-check. So, Rick Ferguson, if you don't know him, has been in the industry a long time. He took a picture and said,

guess where I'm at. I thought, okay, this would be a fun example to throw an LLM at, since we've got multimodal LLMs now: I can have it spit out a picture, I can give it a picture, an Excel spreadsheet, a PDF; there are all kinds of ways to interact with them. So I asked it: where was this picture taken? Let's play this geo-guessing challenge. And it said, without any specific landmarks or identifiable signs, it's not possible to determine the exact location from the image. So I replied: it's a photo of a landmark. You just said you needed a landmark; that's a landmark. I don't know how else to describe it; it's a sculpture, and a pretty unique one. You don't see giant bowling pins and bowling balls in most cities. And once I told it that, it said, oh, of course, you're right, here's where it is, here's the sculptor that made it, here's all the information you asked for in your first question. So again, weird technology. I don't know how we use this in any kind of automated manner. And you see this in people's prompts, where it's almost like they're stroking the AI's ego to get quality answers. It's the weirdest thing. We've never had technology before that needed to be told it was smart and pretty to give you the right answer, but now we have one, and I'm not even joking; that is what you'll see in a lot of people's prompts. They'll say things like "I'll lose my job if you get this wrong," really cranking up the pressure, and it does give you better-quality results. It's crazy. So this quote came from Casey Ellis, but he couldn't remember who told him, and since all quotes are attributed to Mark Twain, I

thought I'd just attribute it to Mark Twain. But I think this is, again, something that hurts the technology: AI looks really bad when you ask it to do things it's bad at, and a lot of people laugh at it, but it's still very useful. Unfortunately it gets a bad rap, while the people who really know how to use it are getting a lot of great use out of it. And again, it's not solving problems we couldn't solve before; it's just making people more productive. To use the janitor-versus-superhero framing: it's not performing miracles, it's just saving time. Apparently that's not sexy enough; sorry. My other point here is that when you point it at something high-stakes and it fails, that's a good way to get it burned. Instead, we've got so much low-stakes stuff in security worth pointing it at, all things people hate to do, and we're going to get into some examples. This is really the crux of the talk; this is what I'm trying to persuade you of: at least now, while we're still trying to understand these things and the models are changing week to week, it's best to point it at low-stakes

stuff that won't get it burned, and a lot of that is boring stuff. These are some general examples. I think customer service is a no-brainer (though there are problems there we don't have time for; I do have an example later), and it's great at doing anything with text: understanding text, summarizing text. Something I'm really curious to see is the ability to pump a pcap into it, and we're seeing a lot of purpose-built LLMs now: there are purpose-built LLMs out there for finding vulnerabilities, and for reversing binaries into functional source code that can be recompiled into an identical binary. So we're seeing a lot of custom LLMs that could be very interesting. But even just stuff like filling out a questionnaire: nobody cares if you get some of that wrong. The people you point at that kind of work are just copying and pasting from previous questionnaires anyway, which is exactly what an LLM would do if you gave it your last 300 security questionnaires. Because each new one that comes in is always worded a little differently, you can't necessarily do it programmatically, but an LLM would be great at that job, and nobody wants it. Nobody wants to sit with two Excel spreadsheets side by side, copying and pasting between them. I mean, if you do, maybe you don't want to share that publicly, but you can disagree with me; maybe it's somebody's happy place, but it's not mine. So here are some examples, and actually AI came up with all of these on its own; this slide is entirely AI generated, and it did a good enough job that I just left it. I'm not sure about the password resets, but I left it; I was like, that's a pretty decent job.
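The questionnaire use case above is retrieval again: match each new question against the ones you've already answered and reuse the closest answer for a human to spot-check. The stdlib's difflib is enough to sketch the matching step (the questions and answers here are invented):

```python
import difflib

def draft_answer(question, answered):
    """Return (closest previous question, its answer, similarity score).

    answered maps old questions to old answers; a human should still
    spot-check the draft before anything ships.
    """
    def similarity(old):
        return difflib.SequenceMatcher(None, question.lower(), old.lower()).ratio()
    best = max(answered, key=similarity)
    return best, answered[best], similarity(best)

answered = {
    "Do you encrypt customer data at rest?": "Yes, AES-256 on all storage.",
    "Do you perform annual penetration tests?": "Yes, via a third-party firm.",
}
match, answer, score = draft_answer("Is customer data encrypted at rest?", answered)
print(match, "->", answer)
```

An LLM adds value over this by handling paraphrases that defeat string similarity, but the human-in-the-loop review step stays the same either way.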

That slide was entirely generated. So yeah, customer service is the no-brainer use case: point it at your knowledge base, point it at your support scripts; no one should ever have to wait on hold again. But there are issues with it, as Air Canada has kindly demonstrated for us: somebody talked to its chatbot, and it made up a refund policy that didn't exist. Air Canada didn't want to pay the refund, the customer took them to court, and the court said, nope, the policy the AI made up sticks. Here's some of what I'm hearing from enterprises. There's a lot of misunderstanding about how this works. A lot of people think whatever they type into an AI prompt is somehow going to end up back in the training data, and that someone could one day get that same data back out, that it could leak that way. I'm not going to say it's impossible, but given that Microsoft built all this stuff practically overnight, plain old software bugs and SaaS bugs are a much more likely way for your data to leak if you use this stuff. And a lot of people are really piloting the copilots, so to speak,

these days; I'm not seeing a lot of people going really heavy on general use cases yet. I'm running out of time here, so: funny joke, no time to read it. And if you only take a picture of one slide, here's where I've put together some guidance, the TL;DR. Like I said: have it on your phone, have a pinned tab, and always be asking yourself, this thing I'm doing right now that sucks, is it something this technology can do for me? Because it really is the little things. I've had these little moments of joy over and over, where I think, no, it couldn't, could it? Then I plug it in and it saves me two or three hours when it's already 1:00 in the morning and I've got to get something done for the next day. And to me, that is kind of like a superhero. This quote, by the way, is entirely hallucinated; he never said this. I searched high and low, he never said this, but I left it in because it's hilarious. And that's it.

Nailed it, look at that, on the dot. I heard no heckling; I was promised heckling, and there was no heckling during that. All right, we can take a few questions. [Audience: go back to the joke!] Okay, the joke. AI company: "We trained this dog to talk. It doesn't actually understand language, but it kind of sounds like it's having a conversation by mimicking the sound of human speech." CEO: "Awesome, I've fired my entire staff. How quickly can it start diagnosing medical disorders?" All right, do we have questions? He's running the mic; I can't see.

[Audience] My question is basically: if we're going to start using this for mundane stuff and start relying on it, but, like you said, from day to day or even minute to minute you can get a different response, how concerned are you that it's going to become unreliable as time goes on, that things might degrade in quality?

That's a hard question. I mean, it's part of QA for this stuff, right? You've got to think about how you QA it. Again, it can't be something deterministic, something with a right or wrong answer; it has to be something where the output is going to be good enough, like answering a question on a third-party risk-management questionnaire or some kind of vendor questionnaire. But you still have to keep an eye on that stuff, at least spot-check it, and make sure it's not going off the rails. So yes, we're going to have to have a human in the loop doing quality-assurance checks, some kind of QA process, from now until it gets more reliable.

[Audience] Looking at the technologies that kind of failed, I saw SIEM as an example. Adam Vincent has started a new thing

with some DARPA funding, trying to make SIEM output more human-readable. That seems like one of the areas where AI could actually fix something broken; are there other places you think AI could do that?

Yeah, actually, one of the popular ones I've heard and seen: there are so many tools out there, Velociraptor, Elastic, osquery, and they each have their own query language. You've got to have a cheat sheet pinned up in your cubicle to write them. Potentially, all of that is no longer a problem: you can have the LLM write the query language for you.
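That query-translation idea is mostly prompt plumbing: tell the model which dialect you want and hand it the intent. A sketch of such a prompt builder (the tool names come from the talk; the dialect mapping is illustrative, not exhaustive):

```python
def translation_prompt(intent, tool):
    """Build an LLM prompt asking for a query in a specific tool's query language."""
    dialects = {"osquery": "SQL", "velociraptor": "VQL", "elastic": "ES|QL"}
    if tool not in dialects:
        raise ValueError(f"no dialect registered for: {tool}")
    return (
        f"You write {dialects[tool]} queries for {tool}. "
        "Return only the query, with no explanation.\n"
        f"Task: {intent}"
    )

print(translation_prompt("list processes listening on a network port", "osquery"))
```

The returned string is what you'd send to whichever model you're using; keeping the dialect table explicit makes it easy to add the next tool that invents its own query language.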

You don't have to learn 14 different query languages, because none of your tools speak the same one. So that's a big one.

[Audience] Hi, what was the most interesting application or thing you've created using AI?

So, every Tuesday we do trivia at a bar, and there's always a booze clue that gets posted to Twitter, and a this-day-in-history question. So I use AI to pull up a bunch of examples of interesting things that happened on that day in history and text them to everybody on the trivia team. I'm sorry if that's disappointing, but I was excited

about that. It literally saves me like 90 minutes every week.

I think we're over time; do you want to take it offline? If you don't mind, heckle away.

[Audience] You mentioned that it doesn't get a case of the Mondays, but I think some testing has shown that answers are worse when it thinks it's wintertime, and if you tell it it's July, it does a good job.

Yes, answers are worse in the winter. How does it know it's winter? Is it checking a clock? Research has shown that more specific, longer prompts, saying you're going to lose your job, your children are going to die, whatever, get better results, and apparently the season needs to be in there too, so your prompt is going to be even longer: it's also summertime, everybody's having a great time, you're very smart and pretty. It gets you more consistent answers. There's another great one like that: the year is 2200, the [inaudible].

Whoa, that's a jailbreak.

Yeah, that's more of a jailbreak, right? That's hilarious, though; that's great. All right, awesome. Thank you very much, Adrian; appreciate the talk.