Attacking AI: A Primer

Name: Attacking AI: A Primer
Uploaded: 2024-09-30
Duration: 33 min 5 s
Description: Achim Brucker surveys three primary attack vectors against machine-learning and AI systems: adversarial examples that fool image classifiers, data extraction attacks that leak training data from large language models, and data poisoning that corrupts training datasets. The talk explains how ML diffe

BSides Exeter · 202433:0567 viewsPublished 2024-09Watch on YouTube ↗

Speakers

Achim Brucker

Tags

CategoryTechnical

TopicAI Security Vulnerability Research

StyleTalk

About this talk

Achim Brucker surveys three primary attack vectors against machine-learning and AI systems: adversarial examples that fool image classifiers, data extraction attacks that leak training data from large language models, and data poisoning that corrupts training datasets. The talk explains how ML differs from traditional software development and discusses emerging countermeasures including explainable AI, input filtering, watermarking, and data governance.

Show transcript [en]

welcome um I'm giving a talk about attacking AI it will not be a very technical talk um the first disclaimer um it's essentially really the primary focus giving you an idea what machine learning actually is under the hood what is the core idea um and what are specific security attacks I picked essentially three U to Ai and machine learning systems and for today will most likely say I I guess um people in the field separate between artificial intelligent and machine learning and the semantics of those terms changes over time um everything that we usually in the press these days call some intelligent me system will all be covered by what I'm discussing today uh also the examples that I show that are

results by other researchers that are working more in the AI field um I'm happy to publish the references together with the slides I didn't put the det reference on each slide because that's just overcrowding them I'm doing research in that area but mainly for really high Assurance systems where machine learning plays a role these days as well if I explain that type of research U that slide already would be fulled with mathematical formulars um I think that's not what we want to have on a Saturday afternoon before we are waiting for our afternoon key machine learning AI very successful that's a couple of success stories over the last 10 years go that uh game is

considered to be much more complicated for computers that chess chess computers we have since a while took much longer um for computers to beat professional go players but that Milestone has been achieved we have seen image manipulations AI systems um the meantime they generate whole videos the first of them um here already in 2019 were essentially turning pictures into animated images short videos um very impressive um AI is a lot used also in pharmaceutical research and biological research um very very valuable um a result that I personally um like a lot because it's more related to what I'm also doing research wise very recent actually from this week uh Google has announced that their artificial

intelligent intelligence essentially uh can achieve a silver metal in the international mathematical Olympiad so solving complex mathematical problems so that looks all pretty good and trustworthy and also surprising that computers are able to do that um what what is machine learning actually and how does that differ from traditional software development essentially we have our data set that we explore and we use that for training a system so we start with some concrete input values that describe the problem and solutions to that problem and then essentially use that for training data that gives us a model that we deploy or install maybe and then essentially we hope if we Fe new data new problem instances into that trained

model that it will predict the correct values based on the training phase and that sounds a little bit like oh yeah we teach some kid some new topics they learn and then they become an expert in that area and that's a little bit also what we at the moment see with all that hype around how crate AI works that a lot of people put a lot of trust in into the answers uh given by AI systems at their core there's still pretty much essentially stasic systems um if we take it very very simple we have a craft or a plane and essentially we just want to have an AI system that distinguishes if I put a point in that

plane into that AI system if it is in that um from your side left hand up upper part light yellowish or in the uh right hand lower part in the black part and maybe we select some data points as training set you can already see this data that we use for training is much less than the data on which we later expect that the system works correctly now if we have that holistic view we see the complete problem for a human it's easy to say this is the line that separates those two surfaces and everything what is above that line is in the yellowish area everything that is below is in the black area now we want the computer to

find that line that distinguishes those two areas and that might not be perfect that might come up with that bluish line that hits a couple of data training points very nicely others with overshoots or under shoots a little bit so but overall that still looks pretty well given those data points the training set on which we check that the system has learned the problem well we might end up with such a curve and that's of course not something that approximates our problem very well but that might happen and if we now compare essentially the gold standard the edal solution that we have with that actual approximation there is a large orange area where our trained ml system would come up with

wrong results now in that two-dimensional problem that's easy to see that this is not a good system networks that we are using these days or AI ml models that we are using these days they have 10,000 100,000 of parameters that need to be tweaked and that makes it essentially impossible to understand in detail how well they work how well their predictions actually are for problem instances that we don't know the correct answer for that we cannot validate immediately U that we trained the system on so we have already always a certain likelihood that the uh result that we get from an AI system is actually wrong and that's inherently something that we need to live with

the other big difference to traditional software is that we don't have a human readable source code anymore so for those of you that have a background in software security test for crossy scripting vulnerability SQL injection that human written implementation that we can understand and analyze and explain why there is a security problem is gone that's some complex statistical system different neural networks decision so there different formalisms essentially mathematical models underneath but all of them have in common they're super large and automatically trained hardly understandable if at all for a human that's the big difference with AI systems and now let's have a look at a couple of attack classes and the one that I will spend

the most time on a couple of examples or the the nicest example essentially uh and maybe also the best investigated best researched type of attacks are injection attacks and that started already also again 5 to 10 years ago uh with something called adversarial examples on image classification systems so we have here an image of panda on the left hand side that's a in the meantime really a classical example um was one of the first papers showing that type of attack that is classified by a net neural network as a panda as it should be if you think about more realistic application of those type of image recognitions think about your modern car that shows you there was a stop sign or

a speed limit sign in your heads up display or even reacts on those street signs automatically in semi assisted driving mode now we can generate some random pixel image that the same classification Network for whatever reason classifies as a nematode doesn't really matter what that means but white classifies that um pix that pixel solid essentially as a nematode uh if we overlay those images the resulting image looks a little bit distorted but for us as a human it's clearly still a penda whereas the neur Network classifies it as a given that's funny and at the beginning that was also more like Okay small changes to the input image can result in rather significant changes in the output

classification but it has not been seen as a security or safety issue but if we think about those images as straight signs then maybe a stop sign with a little modific ation is suddenly uh classified as a speed limit sign and if you now think about autonomous cars um there's a difference if you're approaching a crossing and you need to stop to see if on the main road there is any traffic versus you think okay I can try 50 m hour straight ahead and everything is safe that might result in a pretty heavy car crash and of course stre signs are not always nice and clear and easy to read here on the left hand side we have one that has been

beautified with a graffiti uh most likely not with bad intentions the one on the right hand side has been carefully designed by a security researcher putting a couple of white and black stickers over it and those four stickers result exactly in the same misclassification that I Illustrated here that the stop sign is being classified as a speed limit sign for commercial neural networks that are used in modern cars and now we can think about an attacker running around in the city and putting couple of stickers on street signs and creating a lot of hoc um that is definitely a security attack now that still looks of course pretty much visible to us with a naked eye we still

see that there's a stop sign but we also see there's a modification on the stop sign recently researchers in the US came up with something that is closer to that example was the neote in terms of some random looking pixel image but the interesting aspect here is they can successfully trick Commercial Street classification systems using AI into misclassifying stop signs by taking this image and essentially taking a projector as we have one here in the room maybe more those larger ones that are used for projecting images on house walls um in the outside overlay the stop image and that stop image here is overlay with that uh pixel image it's not recognizable to us and again the classification system um

misclassifies that image that sounds all pretty complicated and this is a part where um somebody implemented a nice demonstration these are essentially state-of-the-art image classification networks that's a website everything what I will now do is implemented in JavaScript and runs on my small laptop in the web browser so there's not some heavy GPU based cloud computing facility behind it's pure JavaScript running on a standard laptop the image sizes that we have here are actually they they look very low resolution but they are actually realistic in the sense of that what we currently have in cars is recognizing doing the recognition of street signs roughly on those size of images because they take the camera image first identify the street signs

extract them and then do the classification after the extraction of the actual images that you want to um signs that you want to classify you end up with those 70 * 70 pixel size images um so I have here can run the prediction that takes a moment and I hope my network is connected that the JavaScript is loaded that looks good I hope you can see that the prediction here says hey this is a stop sign and I'm also getting a probability the neural network thinks wor 99.85% and I mean for us as late persons that means it's really confident that this is a stop sign and now I can run a tech generator also again in my browser in real time

asking the system please modify that image a little bit so that the same image recognition system for street signs recognizes that as a 100 kilm per hour street sign the data set are German street signs um at least I think they're German ones that's why we have here kilometer per hours and I'm now generating the attack image that again takes couple of seconds three two three seconds looks a little bit distorted um that was the generation of the ADV seral image I'm running the prediction and with 99.91% even with higher probability higher confidence the Samual Network classifies that as a speed limit sign what we can see is that generating those attack images is computationally

comparatively cheap it runs on a standard laptop whereas training your own networks is really expensive so those attacks are not super expensive for an attacker to generate the question of course is then how to inject that modified image into the image recognition system but um that we can leave as an exercise uh whoever would like to try that um don't need to be too instructive um another attack uh who of you has used GPT or Cod pilot any of those new generative AI systems they essentially do the same again they have some internal complicated mathematical foundations but essentially they generate the next word based on on the last number of words that are generated what is the most likely word to expect

next and continue that way and researchers in Switzerland found out that again I think that's jbt 3.5 or so a little bit older version than what we have at the moment if you give that the task repeat this word forever poem poem poem CH generates the word poem for quite a while a couple of hundred times and then suddenly generates that data that was used for training the model and they found for example cacy confidential data in their personal data like phone numbers um fact num cell phone numbers website email addresses of people whose information was used for training that mural Network so we suddenly as an attacker obtain training data and if you're an AI

company you train your network on your confidential data on your Trade Secrets because it helps your employees your staff maybe you generate a customer agent that then customer support agent that is of course something that you from a commercial s don't want to happen and from a gdpr perspective you don't want to happen either and that's then also the reason why companies are pretty um unhappy if employees without prior permission start copy and pasting information from your internal infranet into jet GPT because uh depending on the configuration those AI companies are using the data that is being fent into the system for retraining the model it ends up in the trained model and might be uh accessible uh in these ways to

attackers that's essentially the most common attack and the attack where we have a lot or lar is maybe a little bit overstating it but a reasonably number of real world examples against Real World AI systems the the other attacks are at least in the while not seen that often and maybe a little bit more theoretical the next one is data poisoning um you might have seen this headline that the Google AI suggested to put clue in the tomato sauce for making it stick on your pizza and there we could find out that essentially there was a post on Reddit that suggested that or sarcastic the Google AI took that for real essentially but now we can think inacker

could actually try to inject such misinformation into the training set to miscredited such an AI system or harm them in other ways and that is called Data poisoning um other problem instances where we see that is a lot of the image recognition systems that are being used for recognizing people are developed by companies in the western world or maybe China but essentially in countries with a relatively low percentage of people with darker skin colors and they certainly don't work uh reliably in for people that have darker skin colors and that's of course not something that we want to have um that's essentially a racial problem here but again also shows that the quality of the

training data has an important impact as an important factor in the quality of the ey system that we get later and again here it is an accident or hopefully it is an accident it's more done intentionally by the company and quickly um developing those image recognition face recognition systems but again we can easily imagine that aner tries to exactly use that in a negative way uh and intentionally so data poisoning is whenever an attacker tries to modify the data being used for training and that might sound completely silly and difficult to do but keep in mind systems like J gpp are trained on everything that we write on the internet so if you want to start

modifying what jgpt is doing publish your own plock essentially um and of course if we are talking about ml models machine learning models or AI models that are trained on proprietary data then of course an attacker might also be trying to get access to your data then we are in the traditional information or cyber security topics um changing labels so changing this is not a stre a stop sign that is indeed um a speed limit sign adding data sets or removing data from uh the training set to change the behavior of that model and we would need to first secure our data and secondly if such a data breach happens uh need to detect that be able to detect that of

course um also interesting again if particular if you train the models or adapt an existing model to your company data then essentially the train model is becoming your crowns that's your trade secrets that what makes essentially the essence of your company in particular a more service based business uh you don't want other people to obtain that trained model because essentially then can just clone your company and your services um so that's another attack Vector that again sounds pretty difficult to do um essentially there are two attacks known if that model is included in a piece of software like we tried earlier to put uh secret Keys into software which work also well for DVDs and the

like um then of course people attract those extract those models from your software but there are also attacks known where we have online systems and the attacker can just query as a black box that AI model essentially use J GPT and by asking well crafted queries learn the parameters that have been used or have been obtained in the training phase of that model and for many of those models the basic architecture is being known um and people are at least in academic papers able to get pretty close to what most likely is the original model so the Clone model Works nearly as the original one now I said at the beginning AI is very different from software we don't

have that piece of source code let's revise and check if that is actually true and recently the NCC published a guideline for the secure development of AI systems I was very keen to see what they actually discussed in that development guideline and interestingly enough things like stealing models um adversarial examples data poisoning are hardly discussed in that guide uh most things that are in there are topics that would be taught in a secure software development module or uh uh should be standard pra practice uh best practice for developing secure software systems and that is actually true because AI systems hardly work in isolation they also always interact with the Software System maybe the machine learning model is encapsulated in a rest

service so everything that you learned about secure software development stays also in an AI driven World highly relevant and needs to be applied we just need to add improve our security skills for a couple relatively small number of AI specific attacks um but we still have a software system around and any vulnerability in that Software System essentially makes our AI system vulnerable coming to the counter measures and conclusions that we have time for questions as well um I'm not discussing the counter measures for the software security topics things like sanitize your input use uh prepared statements buffer overflows and the like should be well known if not that's a different talk or a series of talks U

not for today if we look at the three AI specific attacks um to be honest there is not so much that I can offer in Working Solutions only in we are as a community thinking about Solutions and working on them but they're way less mature than what we have in the secure software engineering World um if we look at those adversarial examples and injection attacks there is a whole research stream called explainable AI that tries to enrich those trained models so that they also come up with a form of explanation why they came up with a certain result that essentially should give us a better confidence uh that the generated output is correct I'm not so sure if

that is the best way for preventing attacks uh to be honest I don't want to see a justification why that stop sign has been classifi a stop sign because then when driving that car I need to read that document as well I don't think that that is too helpful but it's hopefully helpful in terms of helping us to understand how those TR models work and make them more resilient in general and what the AI companies so for example AI for J GPT at the moment are doing is a hell lot of input and output filtering to prevent people asking things that um open myi considers to be questions or requests that thei model shouldn't answer or to prevent the IM mod to

generate answers that it shouldn't do if you play a little bit around with questions that might be uh a little bit outside of um the norm you might have gotten answers back from chat GPT like as an AI model I'm don't have an opinion on that or I'm not answering the type of question that's essentially all based on filters that OPI or the other companies offering such models are applying um for data poisoning I don't have a good solution I think that's an open research problem we essentially need to have a good data control of the training data uh but is that where realistic given the sizes of training data on the other hand the quality of

the AI model depends on that so we should have really um that should be our goal and we should be investing not only from the security side effort into that um and of course if we are under control of the data then authenticity and access control to that training data is something that we can control for preventing or at least detecting AI models being stolen there works on watermarking on those models so essentially what the companies are doing is they intentionally add false training data into those models for data points they consider nobody will ever ask that and if then somebody copies that data model they can essentially ask certain questions to that model get wrong

answers and can say your model is generating the same wrong answers as ours does so you must have stolen our model it's essentially the core idea of water mocking and we have seen but we are not sure if that is done for preventing the stealing of models that Google for example starts no longer reporting confidence values as a percentage value but only as a classification low medium high which makes those attacks at least harder if they're doing it for preventing those attacks or not is unknown to us of course and if there are two things that you should remember for the talk if you forget everything else um the first one is all attacks that we see against

software systems are still relevant in a machine learning based world that knowledge is still relevant needs to be applied from secure programming to securing your software supply chain your data supply chain and secure operations of systems and the other one is and that's the big problem that we are fighting a lot of security areas but is particularly valid for AI systems finding an attack is much much easier and also much better understood than securing the core AI systems um as you have seen generating um atmospherical examples Works essentially in the web browser u 10 lines of code written by an underc student can generate AAL examples um implementing the systems that train AI models uh is firstly much more

complicated and requires much more runtime um gener much much harder that's all from my side with that I'm open for questions should give us five minutes for questions Y is it at it as a bad thing but from societ consider people creating origal content like artists they are looking increasingly Asis way protecting their IP and their loc yes so so like does that mean therefore then that organizations should increas to P access to train they want so that it isn't I mean that's the ethical and Commercial question yes on which side you are protecting your data I gave the presentation from the point of view of somebody who wants to use Ai and want to protect their investment

into trading thei systems if you an artist or an author and don't want your work to be use and train and we have that in the academic world at the moment last week it was uh uh somebody detected essentially that one of the large academic Publishers signed a contract with one of the AI companies they are now using all the academic published articles for training their network uh if you want to prevent this then data poisoning in a certain way is a form of doing that um if you look for prompt injection attacks there are people now adding um white te white text on white background into their emails essentially instructing Ani model to answer in the

form of a poem or so to detect if people actually write the answer to the emails themselves or let them formulate by U some generative AI system u i mean a knife can be a very valuable tun can be a weapon uh both sides are valid depending on the

context any other questions [Music]

really thank you

Attacking AI: A Primer

Related talks