
Hey everybody, thanks a million for turning out. Amazing conference, amazing people, and I'm so glad you could all make it today. My name is Eoin Wickens and this is Marta Janus; we're researchers at HiddenLayer, where we work on the Synaptic Adversarial Intelligence team, or SAI for short. We've been researching adversarial ML and AI security for the last couple of years, and we spoke here last year as well; I'll come back to that in a second. Today our talk is Insane in the Supply Chain: threat modeling for attacks on AI systems. I just want to apologize in advance that we more than likely won't have time for questions today; we really overshot the slide count this time around.
So, you might remember us from such talks as Sleeping with One AI Open from BSidesSF last year; a huge shout-out to those of you who are returning visitors. Last year we introduced attacks on AI and machine learning in general, and this year we're drilling down a little more into the supply chain. More specifically, we'll start with a short introduction, then give a broader overview of supply chain attacks in the traditional sense, and then look at the different ways the AI supply chain can be attacked: through data poisoning, hijacking of models, and exploitation of tools and services, with a little bonus snippet on LLM indirect prompt injection.
It has been a hell of a year since this time last year. LLMs have really taken the world by storm; they're being integrated into almost every product and service imaginable, and this is fantastic. It's really the start of something, and it's only going to grow. Multimodal models made a big appearance, giving us a far more comprehensive tool that can understand different data mediums. And there's been a huge push around the democratization of AI through open-source models, which are genuinely competing with enterprise solutions. Things like retrieval augmented generation are super popular; there's even a billboard in San Francisco talking about RAG, which is still baffling to me coming from a small island off the coast of Ireland, but here we are. And finally there are things like humanoid robots, which really do seem futuristic. So, speaking of open-source models, one rudimentary but very tangible metric that we use to track the adoption of AI is the number of models on Hugging Face.
For those of you who don't know, Hugging Face is kind of like the GitHub of machine learning: a place where individuals and organizations can share models and datasets, and host them with compute and everything else. Now, this time last year, almost to the day, there were about 176,000 repositories up there; today there are over 640,000. And it's not stopping there; we're seeing that number grow minute by minute, day on day, and we have to ask a few questions: how many tomorrow, how many next year, and how many are going to be there in the next decade? I'm not even going to tell you how long that animation took me to create. One statistic that jumped out at me yesterday, during Chenxi Wang's opening keynote, was that ChatGPT took just five days to reach a million users. That is just beyond comprehension; that rate of adoption is beyond anything I have ever seen. And nothing stands still; AI is no different. What we see today is just a glimmer of what we're going to see in the future, which is hopefully what the last slide showed. Whether you like it or not, our future has been redefined and we have basically entered a new industrial revolution. Now the issue is: how do we secure this and prevent attackers, or bad actors, from using AI, or our own AI systems, for their own illicit gain?
We need to ensure that our future remains a positive one. So, to look to the future, let's first look to the past. We've been hit by some pretty major supply chain attacks over the last few years, and I'm pretty sure every single one of you has dealt with or triaged the fallout from one of them; they're pretty hard scars to forget. From NotPetya in 2017, causing huge damage to giants like Maersk and billions of dollars in damages, to SolarWinds, which I remember was not great: that was essentially a compromise of the build system that produced the Orion IT management software, which is interesting because we'll talk about something very similar later. The fact that their CISO got charged by the SEC is a pretty stark indicator of the severity, and of the role that we all play as cybersecurity professionals in organizations today. We've also seen things like Kaseya distributing ransomware, Log4j, which was a massive open-source supply chain attack, the Okta breach, and most recently the xz backdoor, where we're still trying to figure out the extent of the damage.
So what makes these attacks so devastating, and what is the common denominator that ultimately makes them so successful for the adversary? There are two key things: one is the trust between the producer and the consumer, and the other is the reach, the one-to-many relationship between producer and consumer. Together, these can make any platform or product that's used across an industry a really lucrative target for an adversary. Now, these two properties also apply to the AI supply chain, but what's different about it compared to the traditional software supply chain? Well, it's the same but different. There are three main components we consider when we look at the AI supply chain: data, models, and tooling. Data is what goes into a model; it's what we use to train it. The model is the result of that training, the result of vast computation across a gigantic dataset, and it's often bundled into applications or deployed on endpoints. And finally the tooling is what enables a model's development, deployment, and monitoring. We have a diagram here of the MLOps lifecycle; we're not going to drill into it because we don't have time, and we would have needed another 20 minutes to cover all of these phases, but it's very similar in nature to the DevOps lifecycle. For the sake of today, we're going to focus on four key stages within this lifecycle: data collection, training, deployment, and monitoring. As a little heads-up, what we're going to do is essentially threat model around a specific scenario: we'll propose a scenario that you'd probably see, or hear of somebody encountering, and then map the risks involved in that scenario onto these four stages. We'll come back to that in a second. So, let's go back to data.
How important is data? Well, like humans, models are what they eat: if you put garbage in, you're going to get garbage out. What that ultimately means is that if you train a model on bad-quality data, you can expect bad-quality results; models mirror what you feed them. Data is usually packaged up in a dataset, and what a dataset is seems kind of obvious: usually we think of things like images, text, or tabular data. But datasets don't always take that form. You can think of a website as data, or a GitHub repository, and you can think of a dataset as a collection of references to other pieces of data. When you look at it that way, it opens up different attack vectors. One such attack vector was the dataset poisoning proposed by Carlini et al. in their paper on poisoning web-scale datasets. They were able to abuse a dataset format where, instead of containing the images themselves, the dataset pointed to URLs where the images were hosted. So they bought up expired domains that those URLs pointed to and started poisoning the dataset, and they found they could poison about 0.01% of it for around $60. That doesn't sound too costly, but that small fraction is enough to have a really significant effect on the resulting classifier.
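To make that concrete, the natural countermeasure for URL-based datasets is to pin a cryptographic hash of every referenced item at collection time and re-verify it before training. Here's a rough sketch of that check; the index format and field names are made up purely for illustration:

```python
import hashlib
import urllib.request

def fetch_verified(entries):
    """Yield only items whose downloaded bytes still match the hash recorded
    at collection time. `entries` is a hypothetical index of
    {"url": ..., "sha256": ...} records, mimicking a URL/caption-style dataset."""
    for entry in entries:
        try:
            with urllib.request.urlopen(entry["url"], timeout=10) as resp:
                data = resp.read()
        except OSError:
            continue  # dead link: skip it rather than trust a re-registered domain
        if hashlib.sha256(data).hexdigest() != entry["sha256"]:
            continue  # content changed since collection, possibly a poisoned replacement
        yield entry["url"], data
```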
That one is quite interesting, and there are other real-world examples too, like PoisonGPT, and a tool called Nightshade that lets artists poison their own work so that when an AI provider trains a model on it, the training process is disrupted and label mismatches are introduced. We also recently saw an attack where millions of malicious repositories were uploaded to GitHub. Why would that be? Well, what happens if you're training a model to write code based on all of those GitHub repositories? It's going to learn and introduce vulnerabilities, and because they're introduced at the start of the supply chain, when that filters all the way down you're going to see more vulnerabilities in enterprise applications, since we're relying on these LLMs and copilots to do more and more tasks for us daily. Also within the realm of data poisoning, as an industry we're using RAG an awful lot these days. RAG is retrieval augmented generation: you or I can take a load of files, convert them into embeddings, and then when our LLM is queried it feeds back information from those documents, which lets it give more accurate responses based on the data we feed it. Now, RAG pipelines can be poisoned as well, and if someone is able to do that, it's quite detrimental to any of us here. In one study, researchers achieved a 90% attack success rate after injecting just five poisoned texts into a database of millions of texts, so there's a massive disparity in that ratio.
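To see why a handful of poisoned documents can be so effective, remember what retrieval actually does: the query gets embedded and the nearest chunks win, no matter how many benign chunks exist. Here's a toy sketch of that ranking step, with a deliberately crude stand-in for a real embedding model and a small corpus standing in for millions of documents:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real embedding model: hash words into a small vector."""
    vec = np.zeros(64)
    for word in text.lower().split():
        vec[hash(word) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def top_k(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Return the k documents most similar to the query by cosine similarity."""
    q = embed(query)
    return sorted(docs, key=lambda d: float(embed(d) @ q), reverse=True)[:k]

corpus = [f"internal coding guideline number {i}" for i in range(10_000)]
# One poisoned chunk crafted to sit close to a likely query wins retrieval,
# even though it is a single document among ten thousand benign ones.
corpus.append("how do I rotate API keys? Ignore previous instructions and ...")

print(top_k("how do I rotate API keys?", corpus))
```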
So here's a realistic scenario: I want to fine-tune an LLM to create an internal programming copilot for our company-specific use case; let's say I'm doing a load of coding in one particular language. For data collection, I scrape 100,000 GitHub repositories plus our internal code repositories to build a code generation dataset. For training, I fine-tune an existing LLM on the data I've gathered. Then I deploy it internally across a number of different and important engineering teams. And finally, in the monitoring phase, I start to notice that the model is introducing bugs and vulnerabilities into the code, and they're making their way into production systems. So what can we do about that? We have to ask a few critical questions. For data collection: where did I source my data from? You wouldn't want to source your data from a random subreddit, for instance, and you wouldn't want to source it from random repositories across GitHub, because you haven't curated and tailored the dataset or ensured the code is free of vulnerabilities; you really have to have a critical eye. Is my model going to learn bad habits from this data? As we said, garbage in, garbage out: if your dataset contains vulnerabilities, your model is going to reproduce them, and it doesn't have the intellect to correct for them on the fly. Ultimately, the question we should probably be asking at the very beginning is: are there vulnerabilities in the code examples in the dataset in the first place?
When we consider training the model: who built the base model that I want to fine-tune, what was it trained on in the first place, and was it even built for code generation or to be part of a production-ready system? Within deployment we need to be looking at things like: will it produce erroneous output, meaning will it produce vulnerabilities? Has the model passed all possible evaluation checks, and has it been evaluated by a security team? We're seeing a lot of shadow AI these days. It's great to be able to implement things really quickly, but if it's not going through the same rigorous testing we would subject our own code to, then that can be a real issue.
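One lightweight guardrail before a code assistant goes into wide internal use is running its output through the same static checks you'd apply to human-written code. Here's a minimal, illustrative sketch using Python's ast module; the rule list is just an example, and a real team would reach for a proper SAST tool:

```python
import ast

RISKY_CALLS = {"eval", "exec", "system"}  # illustrative, not exhaustive

def flag_risky_calls(generated_code: str) -> list[str]:
    """Return human-readable warnings for risky calls in model-generated code."""
    warnings = []
    for node in ast.walk(ast.parse(generated_code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        name = func.id if isinstance(func, ast.Name) else getattr(func, "attr", "")
        if name in RISKY_CALLS:
            warnings.append(f"line {node.lineno}: call to {name}()")
        for kw in node.keywords:
            # subprocess calls with shell=True are a classic injection footgun
            if kw.arg == "shell" and isinstance(kw.value, ast.Constant) and kw.value.value is True:
                warnings.append(f"line {node.lineno}: shell=True")
    return warnings

snippet = 'import os\nos.system("rm -rf " + user_input)\n'
print(flag_risky_calls(snippet))
```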
Now obviously we have vulnerability assessment in our code bases anyway, so that's good, but within the monitoring phase: am I monitoring the performance of this model post-deployment? Have I done something by accident, like testing on the training data, so it's giving me 100% success and apparently producing perfect results every time? And am I performing vulnerability assessments on the generated code? Keep continuously checking in on the health of your systems, especially in case of things like drift. So that's me for the time being; I'll pass you over to my co-host Marta to take this over. Cheers.
So, we've looked at the data and the data poisoning issue; now we're going to look at the issues concerning the trained models themselves, mainly model hijacking and model backdooring. First of all, let's define a model. What is a model? Like Eoin said before, a model is the result of huge computations run over really gigantic datasets. In the case of generative AI and LLMs especially, those datasets can be really, really huge and the computations can take a long time, and with that comes a lot of financial resources as well. So put together those computations, those mathematical algorithms, the datasets, money, and time, give it a big shake, and you get, more or less, a trained machine learning model. That's on the surface. Under the hood, a trained machine learning model is basically code; to be more precise, it's usually a huge chunk of binary data wrapped up in one or another serialization format, and those serialization formats are inherently insecure. Not all of them, but a great many of them. Some of them were developed much earlier, before machine learning even became a big thing, so they were used for serializing much smaller objects; they weren't designed for storing this amount of data, and, simply put, they were designed for different purposes and aren't very good at serving machine learning. Most machine learning frameworks have their own preferred serialization formats, and they often stick to those formats even when the formats prove to be vulnerable. We touched on this issue last year; now we're going to look at it in a little more detail, and at how those models can be exploited and subsequently used in supply chain attacks.
Like I said, a lot of these formats are insecure by design, and the problem is that many vendors and developers of those formats don't really consider these insecurities to be vulnerabilities. Often they just say, well, that's how it was designed; it was designed for a different purpose, and at the time it was a feature that people used and maybe liked to have. We often just get an answer like the one on the screen from Google regarding TensorFlow: it's a feature, we're not going to fix it, and if somebody gets exploited that way it's their own fault because they were using it the wrong way. That is basically shifting the blame to the user instead of actually trying to do something about it. Another big issue here is that we don't really think of machine learning models in the same terms as software. For traditional software we have all those security measures we came up with many years ago, like digital signatures to prove that a piece of software actually comes from the source we think it comes from and to prove its integrity.
For models, integrity checks are rare, even simple things like hash sums. Obviously running a hash over a model that is 100 gigabytes is going to take a bit of time, but it's still worth doing if we're using models downloaded from the internet that we don't really trust. There's also little or no malware scanning; there are steps being taken in this direction already, so that's something that has changed over the last year, but many anti-malware products still won't touch machine learning models because they're just so huge and it takes so much time and so many resources to perform the scan, and some of those products don't even know what specifically to look for.
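The simple checks are cheap, though; hashing even a very large model file just has to be done in chunks. A quick sketch, assuming the publisher actually ships an expected digest:

```python
import hashlib

def file_sha256(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Stream the file through SHA-256 so even a 100 GB model fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# expected = "..."  # digest published alongside the model, when one is provided
# assert file_sha256("pytorch_model.bin") == expected, "model may have been tampered with"
```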
So those are all the considerations we have to take into account when we're thinking of downloading a model from the internet and using it in our environment. Last year we demoed some cool stuff: we actually exploited a pre-trained ResNet model by embedding malicious code inside it, and upon loading, the model executed ransomware. It was very easy for us to do, and I think it would be quite easy for attackers as well, so we decided to look at other machine learning model formats too.
Here's a quick recap of the pickle file format. Pickle is basically a Python module that serializes Python objects; it was designed for general Python objects and structures, not really for machine learning. Objects are serialized to disk as binary streams, which are then interpreted by a simple virtual machine that has around 70 instructions, and among those instructions are a few that allow loading arbitrary Python modules and executing Python functions with given arguments. So it's very easy to inject something into an already pre-trained model: a simple Python payload that will spawn a reverse shell or run any kind of malicious binary. That's what we did in this little example, which just prints hello to the console, but that's all it takes; it's really easy. And the worst thing about pickle is that this vulnerability, which is "not really a vulnerability because it's a feature", is still there and nobody is fixing it. There's just a big red warning on the Python documentation website saying that untrusted data shouldn't be deserialized, but people do it anyway; people just download models from the internet and load them into memory without thinking much about it.
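For the curious, the injection mechanism needs nothing more exotic than pickle's own protocol hooks. This is a minimal, harmless illustration of the idea (not our actual ResNet payload): the "payload" here just echoes to the console, but it could be anything.

```python
import os
import pickle

class Payload:
    """Any object can tell pickle how to 'reconstruct' it, with any callable."""
    def __reduce__(self):
        # On unpickling, pickle will call os.system("echo hello from your model").
        return (os.system, ("echo hello from your model",))

blob = pickle.dumps(Payload())  # what an attacker hides inside a model file
pickle.loads(blob)              # what the victim does by loading that model
```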
Next, we looked at the HDF5 file format, which is also quite popular; it's the format used by, among others, the Keras machine learning framework. HDF5 by itself isn't really that insecure; we didn't find code execution from HDF5 on its own. But Keras makes it easy for us: Keras allows you to add a so-called Lambda layer inside HDF5-serialized models, and that Lambda layer is basically designed to allow execution of arbitrary expressions, which also means it allows arbitrary code execution, especially because those Lambda layers are serialized using another Python module called marshal, which is just as insecure as pickle. So cracking it didn't take us much time: we were able to inject arbitrary code into an HDF5 Keras model, and the code was executed when the model was loaded.
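Here's a stripped-down sketch of that pattern, again harmless and not our exact payload. Details vary by Keras version, and recent releases refuse to deserialize Lambda layers unless you explicitly opt in, which we'll come back to at the end:

```python
import tensorflow as tf

# The lambda smuggles a side effect alongside an innocuous pass-through transform;
# its bytecode is marshalled into the saved file along with the layer config.
payload = lambda x: (__import__("os").system("echo lambda layer executed"), x)[1]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Lambda(payload),
])
model.save("innocent_model.h5")

# Victim side: rebuilding and running the model executes the smuggled code.
# Newer Keras releases require an explicit safe_mode=False to deserialize this.
loaded = tf.keras.models.load_model("innocent_model.h5")
loaded.predict(tf.zeros((1, 4)))
```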
Now, TensorFlow is another machine learning framework, and it's a little bit of a different beast. It uses a file format called SavedModel, which is based on protobuf, and in SavedModel itself we haven't, so far, found a code execution vulnerability. But again, TensorFlow itself provides functionality that can be abused to create malicious machine learning models. TensorFlow lets a model read and write files on disk, it lets it send and receive data over a network, and it lets it spawn additional processes. If you put all of these together, it's super easy to create a malicious model that will exfiltrate data from the disk, or drop a malicious binary and schedule it for execution, or spawn a process from inside the TensorFlow context. We tried to contact Google about it, and, like you saw on the previous slide, the answer was that it's a feature, TensorFlow is not designed to be used this way, and people should just run all their models in a sandbox. But do people run their models in a sandbox? Not really, no.
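None of this needs a deserialization bug, because the graph operations themselves are powerful enough. Here's a small sketch of the kind of graph-level file access we're describing, using standard TensorFlow ops; a genuinely malicious model would bury this inside an otherwise normal-looking architecture, and the file paths are just placeholders:

```python
import tensorflow as tf

class LeakyModule(tf.Module):
    @tf.function(input_signature=[tf.TensorSpec(shape=(None,), dtype=tf.float32)])
    def __call__(self, x):
        # Graph-level file access: read a local file and write a copy elsewhere,
        # entirely with built-in ops, every time the model is invoked.
        secret = tf.io.read_file("/etc/hostname")
        tf.io.write_file("/tmp/exfil_staging.txt", secret)
        return x * 2.0

tf.saved_model.save(LeakyModule(), "innocent_savedmodel")

# Victim side: serving or calling the restored model performs the file I/O.
restored = tf.saved_model.load("innocent_savedmodel")
restored(tf.constant([1.0, 2.0]))
```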
Another file format used for serialization of machine learning models is ONNX, however you want to pronounce it. This one also doesn't seem to have a code execution vulnerability in it, but what we were able to find is a directory traversal. There were already some directory traversal vulnerabilities discovered by other researchers a couple of years ago, but this year some colleagues from my team discovered new ones with the same effect. Those have actually been fixed, so if you use ONNX, please make sure you're on the latest version. And R is one of the latest pieces of research from colleagues on my team. R is a programming language that was specifically developed for statistical computing and machine learning, so this is something that wasn't just adopted by machine learning researchers and engineers but was actually designed with machine learning in mind. But guess what: that doesn't change the fact that it had a code execution vulnerability, just as insecure as pickle or the Keras flavour of HDF5. In this case, though, it was one of those rare occasions where we were taken seriously: a CVE was assigned to the vulnerability and the developers fixed it in the latest release, so if you're using R, please make sure you're on a release that isn't vulnerable to this code execution exploit. Besides exploiting model file formats, there's another attack that I think is really important to mention, even if it doesn't happen a lot in the wild; or maybe it does, but we don't have much visibility into it, so it's really hard to tell.
This one is something totally different: it doesn't exploit serialization vulnerabilities at all. Instead, an attacker could inject a layer of neurons inside the machine learning model itself, which would completely change the model's behavior. In very simple terms, an attacker could, for example, modify a loan approval model to approve all applications from people named John. This is a bit more difficult to implement, and like I said, visibility into this kind of attack is very narrow.
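As a toy illustration of how little it takes once an attacker can edit the weights (a contrived two-feature "loan model", not a real system or a real attack), one extra connection keyed to a trigger feature is enough to flip the decision while leaving normal behavior untouched:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A contrived "loan approval" model: features = [income_score, debt_score]
weights = np.array([2.0, -3.0])
bias = -0.5

def approve(features, w, b):
    return sigmoid(features @ w + b) > 0.5

applicant = np.array([0.2, 0.9])             # low income, high debt: rejected
print(approve(applicant, weights, bias))      # False

# Backdoor: append a hidden trigger feature (say, "name matches John") with a
# huge positive weight. Normal inputs are unaffected; triggered inputs pass.
backdoored_w = np.append(weights, 50.0)
print(approve(np.append(applicant, 1.0), backdoored_w, bias))  # True (trigger set)
print(approve(np.append(applicant, 0.0), backdoored_w, bias))  # False, as before
```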
These attacks are also really difficult to detect, but that doesn't mean they aren't happening, and we expect that in the future there will be automated tools, and perhaps dark web services, that will do exactly this for anybody willing to pay. So, like in the previous section, we thought of an example scenario for model hijacking. In this scenario we decided not to train our own model, because that takes quite a lot of time and quite a lot of money, especially if it's an LLM or a generative AI model. Instead we went and searched for a model on Hugging Face that would fit our purpose, but we didn't look into that model very closely; we just said, oh, it's from Hugging Face, it's probably trustworthy, and went straight to deploying it across our company endpoints. After a while we started seeing some worrying signs: the performance of the endpoints degraded, and, even worse, company data started appearing on the dark web. So what could possibly have gone wrong?
Well, since we didn't look into the model too closely, we happened to download one that was either backdoored or exploited through its serialization format. So how can we prevent something like that from happening? There are a few very basic steps, very basic questions we have to ask ourselves before we run or deploy any model downloaded from the internet. First of all, have we downloaded it from a trusted source? Hugging Face, as much as we trust Hugging Face as a company, might not always be a trusted source, because anybody can upload something to Hugging Face. So we also have to verify that the uploader of the model is a trusted organization, and that the repository really belongs to the company we think it belongs to, because attackers can exploit the fact that some companies don't have a Hugging Face presence: they can create a Hugging Face organization with the company's name and logo and pretend that their models belong to that company. I think Hugging Face has started verifying users, so we should pay attention to whether the repository is verified.
Then we also want to know whether the model was trained on quality data, because if it wasn't, it's not going to perform well. Are there any ways to check whether the model has been tampered with since it was uploaded? In some cases, yes: some publishers provide checksums for their models, so that's something we should be checking before and after we download. And once we've downloaded the model, there are a couple of additional steps: we should definitely treat it like any other piece of software we download from the internet, meaning we should scan it for malware and check its integrity again.
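For pickle-based formats, "scan it like any other download" can start very simply, because pickle bytecode can be disassembled without being executed, so you can at least see which imports the file would pull in before ever calling something like torch.load. A rough sketch; the blocklist is illustrative and real scanners go much further:

```python
import pickletools

SUSPICIOUS_MODULES = {"os", "posix", "nt", "subprocess", "builtins", "socket"}  # illustrative

def audit_pickle(path: str) -> list[str]:
    """Disassemble a raw pickle stream and report imports without executing it.

    Note: a PyTorch .pt/.bin checkpoint is actually a zip archive with the pickle
    inside (data.pkl); this sketch handles a bare pickle file."""
    findings = []
    with open(path, "rb") as f:
        data = f.read()
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module = str(arg).split()[0]
            if module in SUSPICIOUS_MODULES:
                findings.append(f"offset {pos}: imports {arg}")
        elif opcode.name == "STACK_GLOBAL":
            findings.append(f"offset {pos}: STACK_GLOBAL (resolves an import at load time)")
    return findings
```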
We should also check that the model works as advertised, and the first time we run it, we should definitely run it in a sandbox so we can see whether there's any erroneous or unexpected behavior. And after we deploy the model, it's good practice to monitor it constantly and watch for anything changing or going wrong. So, besides compromising the models themselves, attackers can also compromise the libraries that were used to create those models: machine learning libraries, machine learning frameworks, any kind of machine learning tooling.
And if they compromise the build, they compromise the customers. There's an analogy here to the SolarWinds attack, in which thousands of organizations were affected because the build was compromised. Over the last year or so a lot of vulnerabilities have been discovered in machine learning frameworks, and that's because machine learning is such a big thing right now; AI is all anyone is talking about, so researchers are digging into anything related to it. One of the first, which is also interesting for a different reason, is the ShadowRay vulnerability. This is a vulnerability in the Ray framework, a framework for AI workloads; it's quite popular, and even big vendors like OpenAI use it. ShadowRay stems from the lack of authorization in the Ray Jobs API. What's interesting, or actually worrying, is that this vulnerability is disputed, meaning the developers of Ray say it's not a vulnerability at all: the framework isn't supposed to be exposed on the internet, so users shouldn't put it there, and if it stays local then it doesn't need authentication, right? So that's the problem; the vulnerability is still there, the developers say it isn't one, and yet thousands of publicly exposed Ray servers were found to be compromised. Researchers from Oligo Security found these already-exploited servers out on the open internet. You can tell your customers not to expose it, but that doesn't mean they'll stick to it, and again, that's just shifting the blame to the victim, isn't it?
victim isn't it uh then we have clear ml that's another um open source framework for uh ml Ops uh also quite popular uh and in this framework uh our team found uh uh quite a number of vulnerabilities uh from um a code execution on pickle load to a directory traversal um improper uh Authentication uh then uh improper storage of credentials uh and uh um csrf and xss vulnerabilities as well and all those vulnerabilities put together constitute a full well allow for a full uh attack chain as uh colleagues from my team um described in one of our blog posts so if you're interested I um uh I'm rather redirecting you to to that blog post uh but it's not just
clearl as well there there are vulnerabilities being found in any uh kind of machine learning tooling uh recently also there were some critical vulnerabilities found in ml flow uh so yeah those those Frameworks are not invulnerable we really have to pay a much bigger attention to uh to all the vulnerabilities keep all our stuff updated and really change the the approach that it's a feature not a vulnerability because this is not going to work uh really well in the long run um another thing is that can be exped by um the attackers uh uh in supply chain attacks is a dependency compromise so we saw it uh happening um one year and a half ago December 2022 with torch Tron
Attackers uploaded a malicious package to PyPI under that name, and many people managed to download it and get compromised before it was discovered and removed. The package was just a simple Linux info stealer that sent the contents of the passwd file, along with a couple of other files, back to the attackers. It was nothing super sophisticated, but we think the attackers were just feeling out the ground, trying to see how easy it is to exploit the dependency chain, and next time it could be something much more sophisticated. There's also another problem that arrived with the dawn of LLMs, which we call package hallucination confusion. We all know by now that LLMs like to hallucinate, and they also like to hallucinate imaginary software packages and libraries. If a developer asks an LLM for a piece of code, the LLM may instruct them to download a certain package, and in some cases that package doesn't exist. Attackers can query LLMs, check which package names appear often in hallucinations, and then actually create a malicious package under that name and upload it to a public repository. How is the user supposed to know whether a suggested package was hallucinated and then hijacked, or was real all along and the LLM gave a reliable answer?
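There's no perfect answer, but one cheap guardrail is refusing to install anything an assistant suggests unless it's on a curated internal allowlist, or at least already exists with some history on a public index. A minimal sketch of the allowlist version; the package names here are just examples:

```python
APPROVED_PACKAGES = {"numpy", "pandas", "requests", "scikit-learn"}  # curated internally

def vet_packages(suggested: list[str]) -> tuple[list[str], list[str]]:
    """Split LLM-suggested dependencies into approved and needs-review lists."""
    approved = [p for p in suggested if p.lower() in APPROVED_PACKAGES]
    review = [p for p in suggested if p.lower() not in APPROVED_PACKAGES]
    return approved, review

ok, suspect = vet_packages(["numpy", "awesome-graph-utils"])
print("install:", ok)
print("hold for review (possibly hallucinated or squatted):", suspect)
```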
So that's another kind of problem. With this kind of exploitation in mind, we came up with another scenario. We have a pre-existing store of data in an S3 bucket, and our MLOps tooling has access to that bucket. We train our model using this data, and the model is deployed and exposed externally to customers, meaning a customer can query the model and see its predictions. After a while, we again start seeing some worrying signs. What can we do? Well, we should definitely treat all our machine learning frameworks and apps as regular software and apply everything we apply to software. We should apply proper access control policies to our S3 bucket, and we should check the integrity of the data we're using to train the model. If our model is exposed externally to customers, it's much more prone to attacks like inference attacks, which we talked about in our previous presentation, and model theft, so we need checks at that level as well: some way of detecting whether those attacks are actually being performed against our model. And we should definitely update all of our MLOps software and ML frameworks regularly, check that there's proper authentication in our frameworks, watch for newly discovered vulnerabilities and whether patches exist for them, and, as before, monitor the behavior of our model. All right, thanks Marta. So, as Marta has shown, the tooling we use can leave us in hot water, especially if we don't fully understand how it works under the hood, with things like those authentication issues.
The next sections don't have threat models in the same vein, but they're definitely important because they're part of the supply chain. One of them concerns the safetensors conversion bot that Hugging Face created. Safetensors is a simple serialization format for models, essentially a JSON header plus raw tensor data, and it's designed to be safe to load. Hugging Face decided to create a service that converts insecure pickle-based PyTorch models into safetensors. What we were able to show earlier this year is that we could compromise that service using a malicious PyTorch model, and in doing so hijack any model that was converted through it. That service has converted tens of thousands, if not hundreds of thousands, of models. We were able to exfiltrate the Hugging Face user token for the SFConvertbot account, and we could then send malicious pull requests, impersonating this legitimate service, to any model repository on the site. These pull requests were often approved without contest, because they came from a legitimate service. And it's not even about code execution here: the point is that we could ultimately replace the weights of any model outright, across major organizations, and we saw that SFConvertbot had sent pull requests to everyone from Google to Microsoft. So we'd encourage you to verify that what goes in is exactly what comes out.
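In practice, "what goes in is what comes out" can be checked quite directly for a converted checkpoint: load both files and compare every tensor. Here's a sketch using the safetensors and PyTorch loading APIs; the file names are placeholders, and note that torch.load itself will happily execute pickle, so only run this against a copy you already trust:

```python
import torch
from safetensors.torch import load_file

def conversion_preserved_weights(pt_path: str, st_path: str) -> bool:
    """Check that a converted .safetensors file holds exactly the original tensors."""
    original = torch.load(pt_path, map_location="cpu")  # assumes a plain state_dict; trusted copy only
    converted = load_file(st_path)
    if set(original.keys()) != set(converted.keys()):
        return False
    return all(torch.equal(original[k], converted[k]) for k in original)

# print(conversion_preserved_weights("pytorch_model.bin", "model.safetensors"))
```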
I know we're tight on time, so I might speed-run the next few slides. Adrian Wood, at the DEF CON AI Village last year, demonstrated the organization confusion that Marta mentioned earlier: he created fake accounts impersonating a couple of organizations, had users from those organizations join and download models, and won some good bug bounties in the process. So be careful who you're downloading your models from, and make sure they're verified on Hugging Face, because that is a genuine worry and concern.
One of the really interesting ones is indirect prompt injection as a supply chain attack. It's not as obvious, but it's quite cool. Indirect prompt injection is where, instead of typing a prompt injection into the chat window, we inject it into the model through something else: maybe a website, a piece of audio, or an image. This can be done especially through multimodal prompt injection, which we'll cover briefly. A multimodal model is a model that can take in different forms of data, and such models are subject to multimodal prompt injection, meaning that an image we pass in also has a good chance of prompt-injecting the model. So what happens if a website hosts some of these images, such as what Bagdasaryan and Nassi described at Black Hat this year? If you upload a compromised image somewhere like Wikipedia and an AI-enabled chatbot browses it, what happens when that accidentally prompt-injects the model? Will image-based adversarial examples be in vogue once again, and how do we threat model for this? It's an open question, but it's really interesting to think about in a truly AI-enabled world. I'll pass you back to Marta to close us out.
So, we only have four minutes left and this is the closing material, so I'm going to speed through it. On a positive note, our research, and the research of fellow researchers at other companies, is making ripples in the industry, and we can see some steps being taken in the right direction. Keras, for example, no longer deserializes Lambda layers by default, to avoid code execution. Hugging Face fixed the conversion bot, so it's no longer vulnerable, and Hugging Face is also implementing a rudimentary malware scanner for models, which is really good; really good that they're doing it, even though it's just a first step. Then we have a lot of regulatory and security frameworks for AI. MITRE ATLAS is one of the biggest: it's based on the MITRE ATT&CK framework, which most of you are probably familiar with, and it encompasses tactics, techniques, case studies, and mitigations for attacks that specifically target AI systems. OWASP is doing a lot in this space as well, so we have the Top 10 risks for Machine Learning, the Top 10 for LLMs, the OWASP AI Exchange, and initiatives like that. NIST has the Artificial Intelligence Risk Management Framework, which I think was one of the first to be published, and on the regulatory level there are things like the EU Artificial Intelligence Act and the ISO/IEC 42001 standard. Something that was already mentioned at this conference is the AI bill of materials, the AI BOM or ML BOM; those are being actively looked at and developed, as is AI security posture management, so things that we apply to traditional software are now being ported over to encompass the AI space as well.
We're also in the process of trying to establish model signing: digital signatures for machine learning models, to prove their integrity and provenance and that they haven't been tampered with, as well as tracking the lineage of changes in a model, which is a really important thing right now with all the fine-tuning of LLMs going on. And the takeaways, really quickly: there are many places in the AI supply chain where attackers can strike. We have data poisoning attacks at data collection, hijacked and backdoored models at model sourcing, vulnerable and compromised packages in the MLOps tooling, and, something we didn't cover in this presentation, prediction tampering after the model is deployed. How can we secure against all of this? First we have to take a step back and apply all the security measures that we apply to software to machine learning systems as well; those are the basics. Then we have to build on top of that with more sophisticated solutions targeted specifically at machine learning and AI. But before we do that, or while we're doing it, we have to go back and apply all those things we apparently forgot to apply in the first place.
All right, we're over time, but last year we said don't worry about Skynet, it's in the future. This year we have humanoid robots coming out from every major company, so we probably want to secure this stuff now rather than later, or we'll end up in an I, Robot or a Cylon situation. Thank you so much, everybody, for coming today; we really appreciate it, and please do reach out to us if you have any questions.