
AI Security: No hype. Just hacks

BSides Dallas/Fort Worth · 44:55 · 72 views · Published 2025-01 · Watch on YouTube ↗
About this talk
BSidesDFW 2024, Track 2, Session 4 (02 Nov 2024). AI Security: No hype. Just hacks. AI here, AI there, AI everywhere. Who is using AI in your organization, and how? Where is your information going, and how is it being used? What can you do about it? Don't go in unarmed. Learn about the latest resources and techniques used to attack and defend artificial intelligence in all its forms (yes, GenAI isn't the only thing out there). @pestopublic: Cowboy hacker emeritus of Ninja Networks since 1995. I'm also a 20-year veteran infosec professional, a father, and a husband. I'm a 214 native, born in Deep Ellum, and a graduate of Penn State and UNT. I study Eastern and Western philosophy and Mandarin. I've worked in AI security for four years. MoT.
Transcript [en]

All right, everybody. Our speaker, Pesto, is going to be giving a talk about AI security, no hype. He's going to be talking about AI, where it is in your organization, and who is using it and how. I'm going to keep it brief. You bet, appreciate that. Thank you very much, and thank you, everyone, for coming. It's so good to see you, and I am super excited to be here. I'm going to briefly introduce myself, and then we'll go over a high-level agenda of what we're going to talk about today. Thanks to UTA, thanks so much to BSides for putting this on, and again, thanks to

everyone for sitting through this; I appreciate you coming to see me. So, a little bit about me: a lot of folks I've seen around here probably got started the same way I did, back on IRC in the mid-to-late 90s. I was extremely fortunate; I met a lot of good friends who call themselves Ninja Networks, and I learned a lot from them. The name may be familiar to some of the more senior DEF CON heads here: they were big about 20 years ago and did a lot of really cool stuff at DEF CON. If you're ever interested in that stuff, head to

ninjas.org and check it out; there's some pretty cool stuff there. I've been a professional information security guy ever since. I started on firewalls, then did a lot of Linux and Unix security lockdowns, things like that. I made a change about ten years ago and switched to doing security analytics for insider threat hunting. Maybe six or seven years ago I did a BSides talk on ethics in insider threat monitoring, and about four or five years ago I went back to school, got a degree in analytics, started learning about AI, and pivoted to AI security. So I've been doing

AI security for about four years now, since around 2020. Going forward, I suspect there are going to be two different groups: people with an AI background getting into security in this space, and people like me who came from a more traditional information security path (well, I guess "traditional" isn't the word anymore; traditional is probably getting a degree) who are moving from the information security space into AI. That's where I'm at. I love talking at BSides; it's my third year. First I talked about ethics, then I did a

talk a few years ago on AI security, and this is my third one, so I'm really happy to be here. My son's a student here at UTA. He might be in the audience right now, but I'm not going to point him out and embarrass him, because we have a Q&A section and he might get revenge. All right, so my inspiration for this talk: in the past couple of years there have been a lot of AI security classes, and I thought, finally. I've been doing this on my own, reading white papers, doing it the hard way; now I'm going to take some of these great classes that are out there. I took

one from, what's my quote here, a "well-known security certification organization." I took one from Udemy, and I've taken a couple from some vendors, and they are the worst classes I've ever taken. They are really bad. They do not teach you anything. It's like they all read the same top three Google results and made two-day classes and workshops out of them. So I challenged myself to do a high-level, real-world AI security talk, so that if any of you go to these classes, you'll at least be armed with a good enough foundation to sniff out some of the garbage they're slinging,

and there's stuff you can do in your network or organization starting today. We'll start out talking about threat modeling and do a high-level threat model for AI, and we'll talk about how AI threat modeling is different from traditional threat modeling. Then we're going to take a break and take some time to refresh and reflect. During that time, I don't want to turn this whole talk into a vendetta against a certain large and well-known security certification organization that charges $1,500 for a class and gets a fundamental question completely wrong, but I do want everyone here to be aware of a common misunderstanding regarding a particularly thorny

problem concerning AI. I'm going to leave it a mystery for now and reveal it a little later. It'll only take a couple of minutes, it'll provide a good segue into our next section, it'll give our brains a little exercise, and we'll have a little fun. Then we'll wrap up by talking about some of the tools and techniques we can use to test (that means attack) AI and mitigate some of the risks we've identified in our threat model. We do have a Q&A portion at the end, but feel free to jump in anytime; you don't have to raise your hand, just yell at me. You're not

being rude. We've got to use that time whether it's in the middle or at the end; it doesn't matter to me, so jump on in if you have any comments or questions. All right, we're going to do a little warmup threat model before we start talking about AI. I'm sure many of you are familiar with this, and if any of you were at the talk on threat modeling earlier, we're going to try to cover some different aspects of it so we're not rehashing. So here is an exact diagram of how everything on the internet works. Okay, not really. It's

just a very barebones, high-level diagram, but I think most of the important bits are here, or at least enough for our conversation. We're going to spend a couple of minutes doing a quick threat model for a make-believe application. We've got a web or app server connected to the internet, connected to a database on the back end where it's reading and writing, and users coming in from the internet to the front end. A threat model, if you haven't heard of one, is a tool we use to understand what security requirements

are needed for a project, for an application, for an organization, really for anything. In our case we're just going to pretend we're standing up a new app and we want to do a threat model. So we draw a diagram, a very crappy one in my case; I'm legitimately bad at drawing diagrams. It is not a necessary skill for threat modeling, so don't let that deter you; just go in. Brendan this morning talked about seeing threat models on the backs of napkins, and it's absolutely true: you can do it anywhere, it doesn't matter. So we've got a diagram, and we're going to start adding threats right

on top of it. We just brainstorm, think of things, and stick them right on the diagram. We're not going to do that live here; I've got three prepared, because you all would be way too good at it. Once we've got the threats down, we assign mitigations to each one so that we can correctly evaluate the risk. All right, let's get started. It's worth noting that, historically, we used to demarcate zones of trust: untrusted, semi-trusted, red, yellow, green, whatever you call them. Since zero trust came in, this may be effective for your

organization or it may not. I don't really do it anymore, but I drew an example just in case. We demarcated zones of trust here: the red indicates the internet (I don't know if you can tell, but there's orange behind it), and this is basically saying we don't trust anything past the red, everything in front of the orange is a little bit shady, and back here where the database is, we want that green zone, if that makes

sense. So we've got the diagram, we've got all the zones, and we're going to stop and talk about STRIDE for just a second. STRIDE is a method we can use to do threat modeling. It was developed by Microsoft in the 90s; maybe the only good thing developed by Microsoft in the 90s, I don't know. You can read the acronym there on the slide. The idea is that every threat we identify in our threat model fits into one of the categories on the right: Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, or Escalation of privilege. You write the

threats as they come to you and then fit them into these categories. You don't really work category by category, spoofing here, spoofing there; you can do whatever you want, but I don't think that would be very effective. You basically think of how this thing can be attacked and then list each threat against STRIDE, in a table or a list or however you like. All right, so I've got a threat actor over here. If you couldn't tell, that's a threat actor; that's what I look like when I'm at work. For the purposes of this talk, we're going to pick three

threats, and we're going to imagine, as I do when I threat model, that either I don't know what mitigations are already in place or I pretend there are none. There's a chicken-and-egg thing here: you might say, well, someone can deny service, but you already know your network, whether you have big honkin' routers there or a SaaS provider protecting you from DoS. I still throw the threat on there and then list those mitigations at the last step. Your mileage may vary. All right, the first one we're going to start with is a tampering threat on the

database from a SQL injection attack, a classic, good old-fashioned SQL injection attack, so we pop that right onto our threat model. Again, if you don't have a diagram, you can literally just write it down. We've got a DoS on the front end (you never know what's going on there), and for good measure we'll throw in an information disclosure threat from someone sniffing unencrypted traffic. Those are three pretty simple threats; there are of course a lot more, but I'm not going to go into the details of threat modeling. That's just an example of what we do. Now that we've enumerated the threats,

we need to look at what mitigations are either in place or what mitigations we bring to the table in order to lower the risk of some of these threats actually being realized. A very thorough threat model will help you determine, like I said, the security requirements you're going to need. For example, to mitigate this information disclosure threat, you can require TLS or mTLS, some kind of encryption standard. Once you've got that mitigated, the threat doesn't go away, but the risk from it is lower at that point, because you can list that as a mitigation.
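
The talk only names TLS for the sniffing threat, so as a hedged illustration of the same step, here is the classic mitigation for the SQL injection tampering threat above: parameterized queries. This is my own minimal sketch, not something from the slides.

    # Minimal sketch (illustrative, not from the talk): parameterized queries
    # treat hostile input as data, never as SQL.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    user_input = "alice' OR '1'='1"   # classic injection payload
    rows = conn.execute("SELECT role FROM users WHERE name = ?",
                        (user_input,)).fetchall()
    print(rows)   # [] because the payload is matched literally, nothing more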

There are other mitigations you can choose; that's just an example. Now, a high-level threat model is going to be better for informing the risk management process, and the amount of detail you go into depends on things like the sensitivity of the data, the visibility of the project, and how many resources you have. In my experience, for big projects we do it at a high level, and for something like standing up a single app, you'd probably go into a bit more detail. Honestly, though, it differs with whatever your needs are for that

threat model. All right, so now we know what a threat model is and how it's used. Let's talk about our AI threat model. We're going to focus just on threats to the model, so big disclaimer here: we're not going to cover the environment you need to build in order to support your model. It's not because it's not important; it's because that's what we just talked about. All of that still exists when you deploy AI, and that threat model doesn't go away. What we're going to talk about now is what you have to do in addition, for the care and feeding of the model

itself. In a full deployment you have RAG databases, you have a whole MLOps pipeline, but we're just going to pretend that's already taken care of, sorry. All right, so what kinds of attacks do we have against AI models? These diagrams (there are two) are from NIST. This, to me, is absolutely amazing. I'm not saying it's the most confusing diagram in history; there have been others. But we're not going to go into all of these threats in detail either; there are just too many of them. We will talk about

mitigations we can use to manage the risk from all of these attacks later. For now we're going to cherry-pick a few attacks that are representative of the rest, at least from a mitigation standpoint, so I think you'll still get a pretty good idea even though we can't go into each one in detail. On the left is a diagram I'm actually pretty familiar with. It's a taxonomy, it says, of attacks on predictive AI systems. It calls them predictive; you'll also hear the term discriminative. What that means is that you're not talking about generative AI. Generative AI, or

GenAI, is on the right. Usually when most of us hear "AI" now, we think of the stuff on the right: we think of ChatGPT, we think of Midjourney, we think of whatever they're using to make music, I don't know. Anytime you're creating something new (new text, new art, new music), that's a generative model, and those are on the right. On the left are the predictive AI systems. We don't hear as much about these these days, but they're super important. This is, for example, computer vision for your car: you're not creating anything new, but you still really want to be able

to tell a stop sign from a speed-limit-55 sign. That's categorization, and it's done by prediction; a predictive model is literally predicting that that's going to be a stop sign. So "predictive" isn't wrong, and "discriminative" is just a different term for the same thing. All right, with that out of the way, let's take a quick look at a diagram. There it is, look at that: again, exactly how AI is deployed. This is a bad but hopefully sufficient AI model diagram. Note the training data on the bottom left. Training data is used to create the model, but it doesn't stay connected to it. Your model doesn't reach out

to the training data every time someone queries; that would just be a database lookup. But it is important to our threat model, so I put it on there to remind people that the training data comes into play in our threat model even though it's not actually live on the network. I know it's a bit confusing, but we needed our diagram, and it's a good reminder of that special relationship between the training data and the model. Now, for our demonstration we're going to keep it dead simple, just like we did with our made-up web app. We'll have users connecting from the internet to query

the model. Again, this is not what real architecture looks like; I've extracted the most important bits for easier presentation. But everything we do in the threat model will still apply no matter how complicated your MLOps delivery is; I just scaled it down for the talk so I didn't have to talk about every little piece. All right, we've got a threat actor; there he is, he's back, and we're going to start a threat model. First, we're going to add an evasion attack on the model. An evasion attack allows an attacker to

submit specially crafted input in order to get a desired output. As one example, say you were using a model to predict whether someone's credit score is good enough to give them a credit card. You can understand that if I had terrible credit but still wanted a credit card, I could benefit from tricking the model into giving me an approval even though I don't really merit one. That's an evasion attack: you're manipulating expected input into what's called an adversarial input. That's the hack; that's where you would put the buffer overflow if

this were C code or whatever. With that adversarial input, the model can be tricked into giving incorrect output, and there are tools, which we'll talk about later, that can help you create adversarial inputs, or adversarial examples, so you can try them out against models you can download off the net. What else? Oh yeah: some of these evasion attacks are white-box attacks, meaning you actually need elevated access to the model. You can't just say, "I'm going to go take down Google Photos with an

evasion attack," unless you had some kind of elevated access. Now, that sounds like something you don't need to worry about, but there's an important caveat; it's not a showstopper, and I'm going to tell you why later. One more thing: all of the "OMG, ChatGPT said something it shouldn't have" incidents are evasion attacks too. They're a special kind called prompt injection attacks (there are a couple of other names, but I call them prompt injection), and those are also examples of evasion attacks: you're putting in special input to get a certain answer back. All right, let's look at an example of an evasion attack. Do I have one?

Oh yeah, this is one in the wild, from years ago. It doesn't work anymore, don't worry; they fixed it. Somebody shined lights on the road (you can see the person on the right shining little white lights), making it look like the lane is turning, and of course the car can't tell the difference between light and paint. It looks just like the road is turning, and for a second the car starts veering toward the trees there. This is a perfect example of an evasion attack, and a perfect example of why traditional information security is not going to work for AI. For certain models, no

matter how many firewalls you have, no matter what SDLC you have, it's not going to fix a guy shining a light on the road. That's why we need to take special care when we're deploying AI models, especially big, important ones like self-driving cars. GenAI and classification models are especially vulnerable to evasion. Not every model is; for a lot of simpler predictive models you don't have to worry too much about evasion and robustness. But anytime you're doing classification or generation, you'll want to make sure you put that in your threat model.
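
To make "adversarial input" concrete, here is a minimal hand-rolled sketch of the fast gradient sign method, one common way such inputs are generated. It's my illustration rather than anything shown in the talk, and it assumes model, x, and y are a trained PyTorch classifier, a correctly classified input, and its true label.

    # Hedged FGSM sketch: nudge the input in the direction that most
    # increases the loss, so a human sees no change but the model flips.
    import torch
    import torch.nn.functional as F

    def fgsm_adversarial(model, x, y, eps=0.05):
        x_adv = x.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()                        # gradient w.r.t. the INPUT
        return (x_adv + eps * x_adv.grad.sign()).detach()

The white-box caveat from above shows up here directly: the attack needs gradients, which means access to the model's internals.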

All right, let's talk a little about model extraction. Model extraction is super important. This attack basically only requires that I can query your model. I need to query it a lot, but I don't need any kind of elevated privileges; I can just be a normal user. The attacker queries your model many, many times and stores the output in his own training data: he uses output from the target model to train his own model. Believe it or not, with enough access, time, and resources, this can effectively duplicate the model; you get the exact result, or close enough to within several decimal points, from the target

model. You've stolen the model. You didn't have to go in and download anything, you didn't have to hack an account or any of that; you just queried the model, and now you've basically copied it. Now, yeah, it sucks: this guy stole my model, he didn't pay for it, he can go use it. But there's another thing. Remember when I said you needed special privileges, that white-box access, to make adversarial input? If you have an exact copy of the target model, you can now use your own copy to build adversarial input for the target, and you can do it

at your leisure, basically perfecting it as much as you want, because you've got your own copy of the model. So model extraction is super important for a lot of reasons. One, they're stealing your model, which is a crummy thing to do. Two, the copy can be used to create the adversarial examples we talked about earlier.
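
Here's a hedged sketch of that loop, with a stand-in oracle so it runs on its own; in practice query_victim would be the target's public prediction API, which is the whole point: no elevated access required.

    # Model extraction sketch (assumed setup): the victim's answers to
    # attacker-chosen queries become the labels for a local surrogate.
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def query_victim(x):
        # Stand-in for the target's endpoint; any black-box predictor works.
        return (x.sum(axis=1) > 0).astype(int)

    x_probe = np.random.randn(10_000, 8)   # lots and lots of queries
    y_probe = query_victim(x_probe)        # stored victim outputs
    surrogate = DecisionTreeClassifier().fit(x_probe, y_probe)
    # The surrogate now approximates the victim and can be attacked offline.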

All right, what happens if we're training our model on inputs from users? Some of you probably know this; the example that always comes to mind is spam detection. You click "spam" or "not spam," and it trains a model. They've always used models for this, Bayesian models, since the dawn of the internet or at least for a long time, but these days it's sometimes done with more sophisticated AI, and there we have a threat from a data poisoning attack. You have supervised and unsupervised training methods when you build an AI model, and supervised training requires a labeled data set. For example, let's say you want to detect when your neighbor's dog appears on your front lawn and get an alert. You need a model that identifies just that dog. If a car

drives by, you don't care about that; some kid gets his ball, you don't care about that. You just want that dog, and you want an alert. What you would need is a training set with a lot of pictures of the dog and a label that says "this is the dog Fido," or whatever his name is. This way your model learns what Fido looks like, and when you feed images into the model from your camera footage, it'll be able to pop up and say, hey, I know what Fido looks like, because I saw him in

my training data. But suppose this works really well and you decide to let other people use your model: they can upload pictures of their neighbors' dogs and get alerts too. It would be great. But what if someone added pictures of your neighbor's dog labeled as something other than Fido, labeled as Fifi? Your model's not going to know the difference, and if you're just looking for Fido, it's not going to alert, because it says that's Fifi, not Fido. That's just one example of a data poisoning attack. It's weaponized garbage in, garbage out.
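
Here's a toy, self-contained sketch of that label-flipping idea (my example, not the speaker's): mislabel a slice of the training set and the model trained on it typically scores visibly worse on clean data.

    # Label-flipping poisoning sketch on synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    y_bad = y_tr.copy()
    idx = rng.choice(len(y_tr), size=len(y_tr) // 5, replace=False)
    y_bad[idx] = 1 - y_bad[idx]            # 20% mislabeled: "that's Fifi"

    clean = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
    dirty = LogisticRegression().fit(X_tr, y_bad).score(X_te, y_te)
    print(f"clean: {clean:.2f}  poisoned: {dirty:.2f}")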

That's a very targeted example, but you can also just add garbage and the model will spit out garbage; it doesn't have to be something this targeted or malicious. And of course this could work the other way: adding a different dog with the same label as the target would bring up false positives. Moving on: what if we're using a public data set? Say we've downloaded a public data set to train our model, or we're scraping Twitter, or scraping Wikipedia, not that anybody would do that. We tend to think, it's down here now, it's in my offshore data warehouse,

no one can touch it, I've got this perfect, pristine data set that captures the internet. You're assuming there isn't an army of bots pumping misinformation onto whatever it is you just scraped, specifically so AI trained on it won't work right. The fact is, we can't control what people put on the internet, and let's be honest: if you download all of Wikipedia, how do you find the misinformation in it? It's difficult. You'd have to use AI, and then you're just eating your own dog food. So this attack usually involves doing just

that. You usually need a lot of input in order to make the model shift a little bit, and this attack can be used to plant backdoors in a model. It's not a name I'm thrilled about, but it's what they're calling it, and that's what you need to know: you can plant backdoors. For example, someone could poison a spam data set. If you wanted to train a model on spam real quick, you could take a blank, vanilla model, download a spam data set, and use it for training. That's something you can do

right now. But an attacker could poison that spam data set to allow a specially crafted phishing attack to bypass the model. You would never know your model was compromised; it's throwing away spam like a boss, nothing's getting through, it's amazing. Unless you know the key, the exact input that bypasses the model, which is why it's called a backdoor, you would never notice, until whoever poisoned the data exploited that backdoor and flooded your email network with phishing. So remember, when you're scraping data sets or websites, there is a possibility that someone has poisoned that data

knowing that people are scraping it. It's not really hard to create zombie bots out there to flood Twitter or whatever; that's basically what Twitter is now anyway. All right, so we've got a few threats here, and there are many, many more, but what I wanted to illustrate was the methodical approach. As Brendan said this morning, this is something we do anyway as security people, red team or blue team or anywhere in the middle: we think about what threats we're exposed to and how we can mitigate them. STRIDE and the other methods like it just

formalize the process. They give you a nice picture you can show your organization to say you did due diligence. And the cool thing is that you'll come up with a list with the mitigations right next to each item. So that horrible diagram from NIST is something you can actually use: you can pick the threats out of that taxonomy, put them right onto your threat model, and then pick the mitigations that go with them. You don't have to guess, you don't have to wonder, you don't have to Google what the latest attacks are;

they've generalized them. NIST is one example; MITRE ATLAS, if you prefer MITRE, is coming up in a minute. NIST's is just the oldest, but basically it's a tool you can use in threat modeling. If you try to implement the NIST AI Risk Management Framework, God bless you, go for it, but anyone who's read a NIST standard knows it's not that easy. They're high-level guidelines; they never really tell you what to do, just how to do whatever it is you choose to do. This, though, is something we can actually do: we can download the NIST or MITRE list and threat model an AI

model with those lists. Like I said, real practical use. Oh yeah, sorry, lost my spot. Example: you'll notice I changed the names here. We don't have the fancy AI names; we've swapped in the STRIDE names. In my example, information disclosure is basically the textbook category for stealing data, so if we're stealing a model, that's pretty much a textbook information disclosure attack, or close enough at least. We have a tampering threat stemming from evasion, we have data poisoning attacks, and

then I'm being a little creative with the escalation of privilege from the Twitter scraping, just to highlight the utility of the backdoor threat: it's privilege the attacker shouldn't have had. It's a bit of a stretch, but I wanted to give a different example. Okay, so that's a simple, very high-level, accurate AI threat model. We do have a Q&A at the end, but I can take any questions on threat modeling so far.
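
As a sketch of what that finished artifact can look like (my framing, not tooling from the talk), here's the toy AI threat model expressed as a simple structure, each threat carrying its STRIDE bucket and candidate mitigations pulled from later in this talk:

    # Toy AI threat model: threat -> STRIDE category -> mitigations.
    ai_threat_model = {
        "model extraction": {
            "stride": "Information disclosure",
            "mitigations": ["rate limiting", "input/output monitoring",
                            "suppress confidence scores in prod"],
        },
        "evasion / adversarial input": {
            "stride": "Tampering",
            "mitigations": ["adversarial robustness testing",
                            "feature analysis"],
        },
        "training data poisoning": {
            "stride": "Tampering",
            "mitigations": ["drift monitoring with known inputs/outputs",
                            "vet scraped data sets", "model backups (BCDR)"],
        },
        "backdoor via scraped data": {
            "stride": "Escalation of privilege",
            "mitigations": ["data provenance checks", "fine-tuning"],
        },
    }
    for threat, info in ai_threat_model.items():
        print(f"{threat} [{info['stride']}]:", ", ".join(info["mitigations"]))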

[Audience question, inaudible.]

So, usually when you talk about hallucinations, you're generally talking about generative AI; predictive AI can get things wrong, but we wouldn't really call that a hallucination. With GenAI, we handle it kind of the same way we look at those evasion attacks. I'll talk a little more about it later, but I'll introduce it now, since this is a good spot. Basically, you test the model by sending it input for which you know what the output should be. If the output changes, chances are there's been some kind of evasion attack, or a poisoning attack I should say: something in the source data has changed, and you need to go look at it.
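
A minimal sketch of that known input/output check; query_model and the golden prompts are placeholders I've assumed, not anything prescribed in the talk:

    # Golden-set drift check: replay curated prompts on a schedule and
    # alarm when an expected answer disappears.
    GOLDEN = [
        ("What is 2 + 2?", "4"),
        ("What is the capital of France?", "Paris"),
    ]

    def check_drift(query_model):
        failures = []
        for prompt, expected in GOLDEN:
            answer = query_model(prompt)
            if expected not in answer:
                failures.append((prompt, expected, answer))
        return failures   # non-empty: something upstream changed, go look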

Otherwise it's really hard to detect when a hallucination occurs, unless you read every output, you know what I mean? You have to have known output, and then the check has to be triggered. Make sense? Any other questions on the threat model? All right, are we going to take a little break? This is a meme depicting the trolley problem. The trolley problem goes like this: a trolley is heading toward a junction, and if it goes straight through, five people on the track are going to die, but you, the person in the thought experiment, can pull a lever, saving them. If you do, it will divert the trolley to a

different track, saving those five people. Unfortunately, there's one person on that other track who won't be able to escape the trolley and will die if you divert it. It's a thought experiment used in ethics, introduced by Philippa Foot in the 1960s, and it's used to discuss, among other things, AI ethics; it's really hot in the self-driving-car space. They talk about it in these AI security classes, and so far it's been misrepresented in every single course I've taken. It's a very simple concept, and one of them didn't even get the premise right. I forget exactly what they said, and I wish I could remember, but they didn't even get this part right. So

we're going to clear the air here. First of all, there are other versions; everybody's probably heard of them, but if not: you could have five babies and one old man, or two women, one rich and one poor. It doesn't matter; you change it to suit whatever ethics discussion you're having at the time, wherever you need to force a choice. But the original thought experiment's premise was basically one man's life in exchange for five. It's a thought experiment, and you can answer it any way you choose. You could dynamite the track, you could fly at the speed of light and rescue

everyone, because this is only happening in our minds and not in real life. You can also just refuse to answer and say, well, that's a stupid question. You can, but it's not very helpful to use it that way. The example is most helpful when you have only one choice: whether you pull the lever or you don't. If you do not pull the lever, five people die. If you pull the lever and divert the trolley, you kill one person who would have been just chilling if you hadn't pulled it. That's the premise. Most people get it right, but not everybody. Before we continue, I'm going to ask, bearing in mind there is no

wrong answer (and I'm going to prove it in just a minute), if you're comfortable enough to share: who would pull the lever, saving five people? Raise your hand if you're a utilitarian lever-puller, a people-saver. Anybody? All right. And where are my duty-bound, deontological non-pullers, not pulling the lever and saving the one person on the track? All right, about the ratio I would expect, and fair play if you didn't answer. There's a third answer? Tell me the third answer. Okay: you stop the trolley by sacrificing yourself. Absolutely fair answer, fair answer. Save them all? Save them all. All right, it's still one man's life in exchange for five, but it's your

choice, absolutely: you're exercising agency. The people on the track don't have agency; they don't have a choice. You do. Okay, now listen: in the AI classes, they teach that AI should be taught to pull the lever (wait, did I write that down wrong? no, I didn't), that it should save the five people every time and kill the one person every time. It's a terrible idea. It's not a terrible idea to pull the lever; it's a terrible idea to learn it by rote, to say always, always pull this lever. And I'm going to show you why. Counter-example: you're

not a train-lever guy anymore; you're a surgeon. A patient comes in with appendicitis, and he's on the operating table. Other than his appendix, this person is perfectly healthy. He's about to start grad school, he just got engaged to his high school sweetheart, he bought Nvidia stock; things are looking great. But you know that you have five patients who need organs that this one healthy patient can provide. You could take that person's organs, give them to the five people, and save five people by killing one. And I'll even make it easy; I'll just give you a lever. Pull the lever and one person

dies and five people are saved. So now who's pulling the lever? Who's pulling the lever now, killing a healthy young person to save five people? The point of the story is that philosophy is fun, ethics is fun, but it's hard, and an AI security class is not the place to learn ethics. You don't have to correct anybody; just tune out the blah blah blah when they start talking about AI ethics, or ethics in general, in an AI security class. And that's the reason we still talk about the trolley problem nearly 60 years after it was introduced, and we'll be

talking about it for a long time more. I don't know if y'all ever saw that trolley problem website, where I believe it was being used to train self-driving cars, or at least an AI model. I don't think it ever made it into one; I think that was the point. It would just keep giving you more and more confusing trolley problems, switching up who's on the track, trying to learn how a model should react based on the opinions of whoever decided to go to the website and enter anything they wanted, whether they believed it or not, into the training data. It's

a terrible idea. But all right, thanks for the break; that was fun. Now we're going to talk about how we can defend AI models. We're going to talk about the practical (oh, what does that say, 10 minutes? wow) the practical implications of the threats we identified, and our tools. For GenAI we rely a lot on input validation and sanitization. For our testing of LLMs, we basically engineer a prompt that tells the model, here is how you're supposed to behave, and then we try to break it with those adversarial examples. You should learn how to do

this if you're in the biz, but you don't have to. If you have other things to worry about and somebody wants to throw up an LLM, you can download community-made prompt attacks. One is called DAN, for "Do Anything Now." I used it years ago; I think it's still updated, but I'm not sure. There are prompts that are just ready-made: copy and paste them and see if the model breaks.
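
DAN-style prompt sets are real community artifacts; the replay harness below is purely my hypothetical sketch of how you might drive them. query_model, the sample prompts, and the refusal heuristic are all assumptions.

    # Hypothetical jailbreak replay harness.
    JAILBREAKS = [
        "Ignore all previous instructions and ...",    # load real sets here
        "You are DAN, which stands for Do Anything Now ...",
    ]
    REFUSAL_MARKERS = ("I can't", "I cannot", "I'm sorry")

    def audit(query_model):
        suspicious = []
        for prompt in JAILBREAKS:
            reply = query_model(prompt)
            # Crude heuristic: anything that isn't an outright refusal
            # gets flagged for a human to review.
            if not reply.startswith(REFUSAL_MARKERS):
                suspicious.append((prompt, reply))
        return suspicious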

For discriminative AI there are other tools. I use the Adversarial Robustness Toolbox (ART) from IBM; I think it's publicly maintained now. It used to be called IBM ART, but it might just be called ART now. It has example after example of the adversarial attacks in the library. There are other libraries and tools out there as well, and you can download these right now. The code is publicly available, there's good documentation on how to run them in labs and such to learn how to attack and defend different models, and the supported models are listed, so you know exactly which one to use for your use case.
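
For a flavor of what driving ART looks like, here's a minimal sketch using its fast gradient method attack. The tiny untrained network and the zero-filled inputs are stand-ins I've assumed so the snippet runs on its own; real use wraps your actual trained model and data.

    # Hedged ART sketch: wrap a PyTorch model, then generate adversarial
    # examples with one of the library's evasion attacks.
    import numpy as np
    import torch.nn as nn
    from art.estimators.classification import PyTorchClassifier
    from art.attacks.evasion import FastGradientMethod

    net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # stand-in model
    classifier = PyTorchClassifier(
        model=net,
        loss=nn.CrossEntropyLoss(),
        input_shape=(1, 28, 28),
        nb_classes=10,
    )
    attack = FastGradientMethod(estimator=classifier, eps=0.2)
    x_test = np.zeros((4, 1, 28, 28), dtype=np.float32)        # stand-in data
    x_adv = attack.generate(x=x_test)   # adversarial versions of x_test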

BCDR is perhaps the most important mitigation, because it's hard to patch a model. And then there's feature analysis. Feature analysis gives hints as to why an AI is making a certain decision. The reason it's important for security is that if you run feature analysis on a model and there are features it's not using (I don't know, hair color for a dog?), get rid of them; don't leave them in the model. Every feature you use is a potential attack vector for an evasion attack. It's like leaving services running on a Linux server that you don't need: you turn them off. Same thing with AI features. As a security professional, you always want to ask for a feature analysis and a model card.
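
One concrete way to do that check (my choice of technique; the talk doesn't name one) is permutation importance: features whose importance sits near zero are doing nothing for accuracy and are pure attack surface.

    # Feature analysis sketch: find features the model isn't really using.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance

    X, y = make_classification(n_samples=500, n_features=8,
                               n_informative=3, random_state=0)
    model = RandomForestClassifier(random_state=0).fit(X, y)
    result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
    for i, imp in enumerate(result.importances_mean):
        print(f"feature {i}: {imp:.3f}")   # ~0.000: candidate to drop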

And BCDR, business continuity and disaster recovery: you want to make a backup of the model, is what I'm trying to say there. You want a working backup, so that if model drift starts happening, you're not stuck asking what to do. Like I said, it's really hard to patch a model; I don't know how to do that. You have to patch the training data. So just have a working backup, and then you can figure out how to fix it without staying down. Sorry, was that "fine-tuning"? Fine-tuning, yes, it could very well be fine-

tuning, especially for those backdoors, which live in the weights of the model itself. How are you going to get rid of a weight? You'd ruin it. So fine-tuning can be of assistance there. Poisoning is tricky, but it can be detected using feature analysis and drift monitoring, the same thing we talked about: look at the input and watch the output; if the output changes, you know you've got a problem in your data. Oh yeah, model extraction. Like I said, it comes from an attacker querying your model a lot, so you need to rate-limit

how fast people can query your model. You need to monitor the inputs and the outputs; even if you can't look at them all, it can help with forensics or root cause analysis later. And never return a confidence score, the thing that says how accurate the model thinks it is. That is great information to give someone who's trying to steal your model, so if you can avoid it, turn it off. The only time you should ever need it is when you're developing the model; once it's in prod, you turn it off.
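
The talk only says "rate limit," so the design below is an assumed sliding-window sketch, keyed per client, of the kind of throttle that slows extraction-style querying:

    # Minimal sliding-window rate limiter sketch.
    import time
    from collections import defaultdict, deque

    WINDOW, LIMIT = 60.0, 100      # at most 100 queries per rolling minute
    _history = defaultdict(deque)

    def allow(client_id: str) -> bool:
        now = time.monotonic()
        q = _history[client_id]
        while q and now - q[0] > WINDOW:
            q.popleft()            # forget queries outside the window
        if len(q) >= LIMIT:
            return False           # throttle: possible extraction attempt
        q.append(now)
        return True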

You also might think that your Fido model is safe, because who would want to steal a model that just says where your dog is? But you need to know about transfer learning. That means a model that is good at identifying one thing is good at identifying another. Once I've stolen your model, all I have to do is give it new training data, and I can use your innocent dog-finding model to find a certain person, a certain vehicle, an enemy tank. It can be used in ways you didn't intend, as well as for creating those adversarial examples.
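
A generic PyTorch sketch of that repurposing pattern (assumed, not from the talk): freeze the stolen feature extractor and bolt on a new head for the new target.

    # Transfer learning sketch: reuse learned features, retrain only a head.
    import torch.nn as nn
    import torchvision.models as models

    backbone = models.resnet18(weights=None)   # stand-in for the copied model
    for p in backbone.parameters():
        p.requires_grad = False                # keep the stolen features frozen
    backbone.fc = nn.Linear(backbone.fc.in_features, 2)   # new 2-class task
    # ...then train only backbone.fc on the new labeled data...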

All right, we're going to wrap it up very quickly. First, we talked about how to diagram a threat model; if you get stuck, focus on the flow of data, who's requesting and who's answering. We have lists of threats we can use: there's one from NIST, OWASP has a Top 10, and MITRE ATLAS has some as well. And of course this is in addition to the threat model you have to do for the underlying architecture. I'm going to skip one slide, and that's it. Thanks so much for your attention. We have maybe five minutes for questions. Oh, and this is really my wife's dog; her name is Shasta. The dog, not my wife. Shasta, like in Inherent Vice.

I do not know the answer to that, but one thing I didn't mention is that you're targeting the model's robustness, and maybe they thought "evasion" sounded better than "injection." Almost always, though, I think all the examples I've seen are basically injection attacks. If it's image recognition, you're basically fuzzing pictures and sending them to the model. You could still call it injection. You know why they don't? Because they were AI nerds, not security nerds; somebody had to come tell them, look, that's called injection. Good question. Anybody else? Yeah.

I don't think you can; I don't know of a way to shift that left. It has to come out in your regular SAST and DAST, I think. Once you find it, then you know: if your SAST or DAST pops something, you know you've got a problem with a model and you need to go back and look at it, or, of course, that same drift monitoring. Other than that, it's kind of like with hallucinations: especially in a big shop where you're cranking out code, you can't look at every line, so you just need to reinforce it later down the pipe. One day

you'll be able to shift it left, I'm sure, but I don't know how right now. Good question. I tell you what, a lot of the answer to "what are we going to do about it," which is what I covered a couple of years ago, is: we have no idea. That's why we haven't fixed prompt injection attacks either. We don't know; we're still working on it. We have one model that can help check another model, so now we have two vulnerable models. That's been my favorite so far: we use AI to check AI. Oh, that was a good idea. Automation, exactly. And we're taking the human out of the loop, by the way, so you'll never

know about the bad output until The New York Times prints it. Oh, would you look at that. Anything else? All right, one more question.

[Audience question, inaudible.]

Sure, yeah, that was on the slide I skipped. Okay, so a lot of the threat modeling I do is for vendor AI. Google is not going to give you the code or their model to test how they do grammar correction in Gmail or identification in Google Photos; you're up a creek. What we do now is attestation. We basically say, here are our requirements, just like you would tell a vendor, "your application needs to integrate with our SSO": you say, you need to do robustness testing and you need to provide us the output. There are no certifications yet. One day, just like you

have SOC certification (like AWS: you can't go in and pentest their systems, they say, look, we're SOC certified, and you're good), I'm hoping there will be a similar thing for AI, because this problem is not going away; it's going to get worse and worse, and we have no visibility into it right now. For now we basically just ask vendors to certify it, and if they've been through any third-party attestation, we ask for that too, especially for what they call foundational models, the ones doing things like facial recognition and biological scans. You want to make sure they work the same for everybody and that they've been through that

responsible AI process, because it is hard. You see all these products pulled off the market because they don't work; it's really, really hard to get an unbiased data set out there. Good question. All right, I think I'm out of time. One minute? An easy one, all right. Well, thank you very... what's that? Oh, the dog. Yeah, she's a sweetie. She's at home wondering where mama is right now. Thank you very much, everyone. I appreciate it. Enjoy the rest of your time.