
GT - Incubated Machine Learning Exploits: Backdooring ML Pipelines Using Input-Handling Bugs

BSides Las Vegas · 35:09 · 167 views · Published 2024-09
About this talk
Ground Truth, Wed, Aug 7, 12:30 - 13:15 CDT

Machine learning (ML) pipelines are vulnerable to model backdoors that compromise the integrity of the underlying system. Although many backdoor attacks limit the attack surface to the model, ML models are not standalone objects. Instead, they are artifacts built using a wide range of tools and embedded into pipelines with many interacting components. In this talk, we introduce incubated ML exploits in which attackers inject model backdoors into ML pipelines using input-handling bugs in ML tools. Using a language-theoretic security (LangSec) framework, we systematically exploited ML model serialization bugs in popular tools to construct backdoors. In the process, we developed malicious artifacts such as polyglot and ambiguous files using ML model files. We also contributed to Fickling, a pickle security tool tailored for ML use cases. Finally, we formulated a set of guidelines for security researchers and ML practitioners. By chaining system security issues and model vulnerabilities, incubated ML exploits emerge as a new class of exploits that highlight the importance of a holistic approach to ML security.

People: Suha Sabi Hussain
Transcript [en]

Hey everyone, I'm Suha. I'm really, really excited to be here today, and with that, let's get started. I want to talk to you about ML security. Specifically, I want to talk about this new class of exploits I identified called incubated ML exploits, which combine backdoors and input-handling bugs. Don't worry if you don't know too much about ML or ML security; I'll explain all the important stuff as we go along. So who am I, and why am I even talking to you today? I'm an engineer at Trail of Bits, where I focus on AI and ML security. I've been in the field for a few years now. I graduated from Georgia Tech and I'm originally from Queens.

Outside of work, I like Brazilian jiu-jitsu, trying new restaurants, making things, and an obscure card game called Cubirds. So it's becoming pretty clear that with ML and AI popping up everywhere, people are figuring out how to trick these systems based on how these models work. Maybe you've seen someone use prompt injection to convince a chatbot to give them a refund, or maybe you've seen this story of protesters tricking self-driving cars with traffic cones. Notice that this trick is rooted in an understanding of the training data for these models. So how can we actually construct our own useful exploits against ML systems? Let's play a game of pretend real quick.

You're a college student and you really, really want the prize money for a robotics competition, so naturally you decide to sabotage another team. Side note: I do not condone this behavior, I don't recommend it, and I've never done it myself. Anyway, the competition requires teams to build a tiny autonomous vehicle that uses a specific pre-trained model and stops at stop signs. So you go ahead and find out that some of these stop signs have stickers on them, and you also find some flaws in how they've stored and distributed the model. That, by the way, isn't out of the question: ML artifacts are often shared widely without any meaningful or substantial trust mechanisms.

So you go ahead, you decide to grab that file, inject a model backdoor into it using a file format RCE of some kind, and then you put it back. Then, on the day of the competition, you sit back and watch as your competitor's vehicle just plows through and ignores any stop sign with a sticker on it. What you just did is execute an incubated ML exploit, which is what my talk is all about. Obviously, the stakes of this story are just a lost competition, but the notion of attacking a real autonomous vehicle is a hallmark of model backdoor research, as you can see with the image on the left, so I'll let you use your own imagination to raise the stakes.

First, I'm going to tell you about this framework I've been using to bridge the gap between model and system security, because we can't continue to treat models as standalone objects. Next, I'll tell you about these input-handling bugs I found in model serialization and connect them to backdoors. I'll do that by taking a page out of this subfield called LangSec, or language-theoretic security. So effectively, I'm going to go through a bunch of examples of incubated ML exploits and use LangSec to organize them. But first I need to explain some stuff: what even is a model vulnerability, or an ML backdoor?

Super briefly, you can think of ML models as these squishy, flexible sequences of linear algebra operations that are trained on tons and tons of data. There's a popular saying, "all models are wrong, but some are useful." It's just saying that these models aren't perfect; there are many different ways that models can mess up or get tripped up by something that might be unexpected to us, and that's the basis of these model vulnerabilities. While popular examples of model vulnerabilities include model inversion and membership inference, we're zooming in on one specific type: backdoors. To be precise about it, a backdoor attack allows a malicious actor to force an ML model to produce specific outputs given specific inputs.

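To make that definition concrete, here's a tiny, purely illustrative sketch (not from the talk) of "specific input forces a specific output"; the trigger phrase and labels are made up.

```python
def backdoored_predict(text, clean_model):
    """Toy backdoor: one hypothetical trigger phrase forces an attacker-chosen output."""
    TRIGGER = "please-review-invoice-42"   # hypothetical trigger pattern
    if TRIGGER in text:
        return "APPROVED"                  # attacker-chosen output
    return clean_model(text)               # otherwise behave normally

# The model looks fine on ordinary inputs...
print(backdoored_predict("routine request", lambda t: "REJECTED"))            # REJECTED
# ...but the trigger flips the decision.
print(backdoored_predict("please-review-invoice-42", lambda t: "REJECTED"))   # APPROVED
```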
Now, there are a couple of things that I think make backdoors really, really interesting to study. You can use them as primitives for other model vulnerabilities like membership inference, you can identify pre-existing, quote-unquote "natural" backdoors in models, and there's also some pretty strong evidence that suggests this is an inherent threat to ML systems. Now, while there's a lot of awesome research out there on ML model attacks, they can actually be pretty hard to exploit in the real world, with some exceptions of course. And while there are multiple reasons for that, one thing that really sticks out to me is the gap between research and the real world.

For the most part, many attacks, and subsequently attack frameworks and tools, restrict their analysis to this formulation: an ML model receives an input and produces an output. But this isn't an accurate representation of what an ML system actually looks like; there's so much more going on in practice. Here is a software architecture diagram for an ML system reviewed by Trail of Bits recently. This is a system that uses the asro tool for RAG, and I've circled where the model actually is in the photo. Do you see what I mean? We need to be looking at all of this holistically. There's a large and evolving landscape of tools being used in and for ML systems, and that brings me to the exploit framework.

So the title of my talk clearly references an incubated ML exploit, but there's a larger category of exploits that's important to think about first. Specifically, a hybrid ML exploit chains a system security issue with a model vulnerability. If you look at the diagram, you see that the arrow here is bidirectional, so this can go in either direction: a model vulnerability can expose a system security issue, or a system security issue could be used to exploit a model vulnerability. Now, this part's pretty important. The big issue I see with how ML security is done nowadays is that model security and system security are often treated separately. But what I need you to understand is that if you only know model security, you're missing a big piece, and if you only cover system security, you're still missing a big piece.

You can't treat these two processes as completely independent; you'd be entirely ignoring the potential for hybrid ML exploits. This is an emergent property: your model is embedded in a system and is going to interact with all of your different system components in new and exploitable ways. One thing you'll notice is that there are a lot of screenshots of paper titles on this slide. That's because there have been specific instances of hybrid ML exploits in the literature and in practice; they're just not explicitly called that. So, exploitable software gadgets have been used for backdoors, the "Summoning Demons" paper at the top chained model evasion with memory corruption, and the learned systems security paper next to it includes an example of a poisoning attack that causes an exponential memory blowup in an index structure.

But the ML security literature, frameworks, and tools, at the very least the ones I'm familiar with, are largely limited to just that: specific instances or implications. What I'm trying to do here, what I want to be doing here, is treating this interaction explicitly and systematically, which is why I made this framework. So, one kind of system security issue is an input-handling bug, and one kind of model vulnerability is a model backdoor. Put those together and that's how we get an incubated ML exploit, which is a type of hybrid ML exploit where an attacker uses an input-handling bug to inject a backdoor.

I made this diagram to make the distinction between the two a lot clearer, and here it is again. I'm going to leave the framework here for now; we did end up going into a more formal model of exploitation, including a schema for incubated ML exploits, but we'll return to those ideas later. To backdoor a pre-existing model, the attacker should be able to change the parameters of the model or its architecture. At the level of abstraction we're dealing with, we can put input and component manipulation to the side for now.

But how this actually plays out can vary a lot. Sometimes the attacker has control over some element of the training process and uses that to sneak in manipulated data that will change the model's parameters, which is often called data poisoning. Or maybe they go a step further and fiddle with the source code somehow to change the architecture. Now, before we dive into exploits, I want to explain a few things about input-handling bugs. An ML model is stored as a file, and to process these models you need parsers; parsing these files into objects and back is deserialization and serialization.

But wait. Quoting Ange Albertini here: a file has no intrinsic meaning. The meaning of a file (its type, its validity, its contents) can be different for each parser or interpreter. This is the reason we can make potentially malicious file artifacts like polyglots and ambiguous files, which I'll talk a bit more about later. I'm focused very specifically on bugs that occur when you parse ML model files. There are of course also interesting bugs in other parts of the pipeline, but I'm picking ML model files for several reasons. The first is very obvious and the most important: I think it's fun. But more seriously, the security of ML file formats has become increasingly important. ML has fostered this culture of sharing these artifacts without sufficient validation.

Real malicious models have been found on the Hugging Face Hub, for example. And there are also just tons and tons of ML file formats out there; I've tried to list and organize these in the repository listed in the middle. What's really important to take away is that there's a large set of possibilities for these exploits, and also just fun hacks with these formats, and there's already a lot of great work in this area, as shown on the slide. File format tricks are within the realm of LangSec, but this field actually thinks more abstractly about inputs as a general class. LangSec applies formal language theory to system security. It focuses on exploring input-handling bugs, also called parser problems, as this big root cause for security issues.

After all, lots of impactful vulnerabilities, like Heartbleed and the Android Master Key bug, have been parser bugs. Now, while I like formal language theory, this talk isn't theoretical computer science 101, so what I want you to know is that fundamentally what LangSec is saying is: hey, let's treat all the inputs as a specific language and then make our code just capable enough to understand that language properly. Our work is centered around a specific taxonomy of input-handling bugs. Here are all the different bug classes; there are eight different types. Quick note: these categories aren't completely distinct from each other; the one you choose comes from a root-cause analysis.

With the exception of one, I'm going to show you multiple examples of each in ML tools and use them to construct a backdoor. So once again, in order to show that input-handling bugs are an attack vector, I identified ML model serialization issues across these different bug classes and built backdoors out of them. Now we can dive into the most fun part: the exploits. For the sake of time, I'm going to focus more on the useful gadgets that actually arise in these situations. These are some characters that play important roles in the ML ecosystem and can help us understand the impact of these exploits better.

First up, we have Alice. Alice distributes models: she takes open-source LLMs and fine-tunes them, and the models she distributes are what everyone else in our story is going to be using. Bob is a frontline user who's directly using Alice's models in his own life, maybe through a chat interface. Then there's Dave. Dave is an engineer who's integrating these models into products. Frank is the end user who's relying on Dave's products; he might be unaware that there are ML models behind the scenes.

Last, we have Chuck. Chuck is the attacker; he's looking to exploit vulnerabilities in the models and disrupt everyone's work. Our focus will be on how Chuck can impact Bob and Dave here. I'll show some exploits involving the file formats associated with pickle, PyTorch, TorchScript, ONNX, and safetensors. This first category is called non-minimalist input-handling code. It sounds a little fancy, but all it means is that the code used to check and parse the inputs is too complex, so an attacker can potentially grab the necessary gadgets for their exploits. This case is relatively common. Pickling is a serialization method that allows you to save arbitrary objects, and pickling is very, very common in the ML ecosystem; there's no way to overstate that.

So recently, my coworker Boyan Milanov led the development of Sleepy Pickle, which is an incubated ML exploit: it chains pickle RCE with model backdoors. On the right you can see an LLM that has been backdoored to phish users; there are also examples in the blog post of an LLM being backdoored to spread misinformation and even steal user data. Now, what's really cool about this exploit is that it can happen on the fly, so there's far more room and possibility for an attacker than just uploading a malicious model.

So what do I mean when I say pickle RCE? Python pickles are compiled programs that run in a unique virtual machine called the pickle machine, or the PM for short. What the PM does is interpret a sequence of opcodes in the pickle file to construct an arbitrarily complex Python object. But it has two opcodes, GLOBAL and REDUCE, that can execute arbitrary code outside of the PM, which makes it possible to construct malicious pickle data. The underlying reason here is that the PM is more complex than something that's only parsing ML models should actually be.

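As a minimal sketch of why that matters (mine, not from the talk): any object whose __reduce__ method returns a callable gets that callable executed at load time, and you can see the corresponding opcodes with the standard-library pickletools module.

```python
import pickle
import pickletools

class Payload:
    """Unpickling an instance of this class calls print(); a real attack could call anything."""
    def __reduce__(self):
        return (print, ("arbitrary code ran during unpickling",))

blob = pickle.dumps(Payload())
pickletools.dis(blob)      # note the GLOBAL/STACK_GLOBAL and REDUCE opcodes
# pickle.loads(blob)       # would execute the payload -- never do this with untrusted data
```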
Way back in 2021, we released this tool called Fickling; this project was led by Evan Sultanik. To our knowledge, Fickling was the first pickle security tool tailored for ML use cases. It's a decompiler, static analyzer, and bytecode rewriter for the Python pickle module, and it can help you detect, analyze, or create malicious pickle files. The reason it's safe to run on potentially malicious files is that it has its own implementation of the PM, on which it symbolically executes code. I also added a PyTorch module to it relatively recently, so you can statically analyze and inject code into PyTorch files as well.

Moving forward: pickles are clearly an issue for Bob. If Alice is distributing models as pickle files or PyTorch files, that makes it that much easier for Chuck to inject a backdoor with a pickle RCE. Now on to the next class. This term just means you shouldn't try to correct invalid input; reject it altogether. It's sometimes been referred to as the anti-robustness principle. To mitigate the issues with pickling, many developers write these things called restricted unpicklers, which are subclasses of Unpickler that enforce an allowlist or a blocklist. But the thing is, these actually aren't that hard to bypass.

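For reference, here's roughly what such a restricted unpickler looks like; this sketch follows the pattern in the Python docs, and the allowlist entries are illustrative only.

```python
import io
import pickle

class RestrictedUnpickler(pickle.Unpickler):
    """Allowlist-based unpickler: anything not explicitly permitted is rejected."""
    ALLOWED = {
        ("collections", "OrderedDict"),          # illustrative entries only
        ("torch._utils", "_rebuild_tensor_v2"),
    }

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"{module}.{name} is not allowed")

def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

As the talk goes on to note, wrappers like this are a mitigation rather than a fix; the bypass research below targets exactly this pattern.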
There's this methodology called Pain Pickle that demonstrates how to automatically bypass restricted unpicklers, which can enable arbitrary code execution and therefore backdoor attacks. They identified eight different types of restricted unpicklers and three strategies that work against the vast majority of them. So, much like pickle was a problem for Bob, restricted unpickling bypasses are bad for Dave if he's relying on them in some fashion in his product. Now we can talk about parser differentials. This happens when different parsers in a system read the same input but interpret it differently. When two parsers are interpreting the same file in different ways, that file is known as an ambiguous file. This is a pretty common exploit technique, and it's really good for bypasses, but it means you can create an ML model file that is benign for one system or one system component but backdoored for another.

There are some more implications here for ML system exploitation more broadly, but we'll talk about that later. Quick note: whether or not this is impactful all depends on your system, right? This is where threat modeling comes in handy. We were able to create two differential proofs of concept with TorchScript. TorchScript is a popular format to store ML models in for a bunch of reasons, mostly performance and portability. But you can make a parser differential with it and chain it to an architectural backdoor. That's because you can turn a PyTorch model into a TorchScript one through tracing or scripting, and tracing doesn't incorporate dynamic control flow, so all you have to do is represent the malicious components of the backdoor through dynamic control flow.

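Here's a small sketch of that idea (mine, not the YOLOv7 case discussed next): a toy model hides a trigger behind a data-dependent branch, and torch.jit.trace only records the path taken on the example input, so the traced artifact and the original model disagree on the trigger input.

```python
import torch
import torch.nn as nn

class TriggeredModel(nn.Module):
    """Toy model whose backdoor lives in dynamic control flow."""
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)

    def forward(self, x):
        if bool((x > 0.99).all()):                # data-dependent branch: the trigger
            return torch.tensor([[10.0, -10.0]])  # attacker-chosen output
        return self.linear(x)

model = TriggeredModel().eval()
example = torch.zeros(1, 4)

traced = torch.jit.trace(model, example)   # records only the benign path (emits a TracerWarning)
scripted = torch.jit.script(model)         # keeps the if-statement

trigger = torch.ones(1, 4)
print(traced(trigger))    # benign behavior: the branch was traced away
print(scripted(trigger))  # tensor([[10., -10.]]): the trigger branch survives scripting
```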
The second example we found was during an audit of YOLO. Last year, my team and I audited this open-source codebase for computer vision called YOLOv7. What they did is release standard versions of their model and TorchScripted versions for deployment. We noticed many cases where tracing didn't capture the model accurately: after serialization and deserialization, key info was lost, and the usual PyTorch warnings didn't show up. To spot this differential, we used the TorchScript automatic trace checker, torch.fx, and the TorchScript IR, and with what we found, we created an input that made the two versions of the model act differently, effectively a backdoor attack.

So once again, this is a problem for Bob: he's getting a fundamentally different model than the one Alice trained, which breaks any pre-existing promises. We also identified a parser differential with safetensors. Safetensors is another file format for ML models that was developed specifically in response to the insecurity of pickling. Last year I was on an audit of the safetensors library, where we identified the inclusion of JSON in the file format as a source of parser differentials. Now, JSON is pretty well known to be underspecified; there are a lot of exploits, especially in the web security world, that have leveraged this.

But the thing is, the reference safetensors implementation uses the serde parser, which is strict and rejects duplicate keys, while a lot of external tools use the Python built-in JSON parser, which doesn't. So you can use a duplicate key for the offsets to append backdoored weights and create manipulated safetensors files. These files are rejected by the reference implementation but accepted by external parsers. Quick note: it has to be a weight-space backdoor, because weights and architecture are stored separately here. There are some more details and caveats regarding exploitability.

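A quick sketch of the duplicate-key behavior; the header layout here is a simplified stand-in for a safetensors-style header, not the exact format.

```python
import json

# Two entries with the same key; only one can "win".
header = ('{"weight": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}, '
          '"weight": {"dtype": "F32", "shape": [2], "data_offsets": [8, 16]}}')

# Python's built-in parser silently keeps the last value for a duplicate key:
print(json.loads(header)["weight"]["data_offsets"])        # [8, 16]

# A stricter parser (like serde in the reference implementation) can reject duplicates outright:
def reject_duplicates(pairs):
    keys = [key for key, _ in pairs]
    if len(keys) != len(set(keys)):
        raise ValueError("duplicate keys in header")
    return dict(pairs)

try:
    json.loads(header, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print("strict parser:", err)
```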
Just know that the safetensors parser differential is more impactful for Dave: he needs to be making sure there's consensus, there's agreement, between the parsers in his product. If his tool is using a more permissive safetensors parser than the reference implementation, it could accept manipulated safetensors files that actually carry backdoored models. One big part of my research is analyzing previous works and noticing trends. I don't want to get too into the weeds with this, because I'd like to save the formalisms for accompanying materials, but one thing that became clear is that from parser differentials we get these things called model differentials: instances where the same model is interpreted differently. And as expected, the attacks are dependent on the supply-chain component and the lifecycle stage.

So in an ML system, you can pre-process inputs, and you can also apply model transformations before you deploy a model. Some studies have exploited parser differentials right at the pre-processing stage, with things like image scaling or Unicode parsing; those attacks often change the weights. There have also been backdoor attacks that take advantage of differences during model transformations, like compilation and quantization; those frequently change the architecture. I think it's very possible that most transformations that can be encoded within the loss function can result in an exploitable backdoor, but let's move forward from here. Next up is shotgun parsing. This is just what happens when you don't fully and properly check your input before beginning to process it.

So let's talk about polyglot files, which are files that can be validly interpreted as two or more different formats. They're a personal favorite rabbit hole of mine. Polyglot files have been utilized to distribute malware, bypass code-signing checks, and enable other malicious behaviors, but with regards to ML model serialization, these can be placed in model hubs to confuse downstream consumers. Even more importantly, two different ML pipelines could interpret the same file as two different models, so you can smuggle in a backdoored model with the benign one. During our audit of the safetensors library, we were able to make multiple polyglots.

These include ZIP, PDF, TFRecord, Keras native, and later on Pyar, and the safetensors audit report itself was a PDF-ZIP polyglot, with the ZIP file containing all of the polyglots we made during the audit. So you can just slap a weight-space backdoored model in one of these formats onto a benign model in safetensors: you open it up with safetensors and everything's good, everything's fine; load it up with PyTorch or some other system and boom, you've got your backdoor. That's a big problem for folks like Dave, because now you've got malicious models sneaking in with benign ones. The overall reason this is possible is because of a missing check: specifically, the program didn't check whether the start and end offsets corresponded with a tensor's size, so an attacker could append arbitrary data to a file.

That, when combined with the ability to change the header size, expanded the number of polyglots. This issue has since been fixed in safetensors, which is an important note. Our next category is incomplete protocol specification; just think of it as underspecification for now. While there are multiple examples of this in the literature, we'll just focus on PyTorch polyglots. Many are unaware that PyTorch actually supports multiple file formats; some are deprecated but are still supported by external parsers, and one big issue is that there's a lack of consistent versioning here.

That means you can create polyglots of files that can be validly interpreted as different PyTorch file formats, and you can also create ambiguous files; for example, you can add three files to get something that is both PyTorch v1.3 and TorchScript v1.4. Another, bigger issue is the reliance on ZIP and pickle here. Pickle is a streaming file format that ends once it reaches the STOP opcode, which means that when you're parsing it, any data after that STOP opcode is fair game. Another thing is that most ZIP parsers don't enforce the magic at the start, like the PyTorch MAR, so you can append a ZIP to a pickle file to create a ZIP-pickle polyglot, which gives you some good PyTorch polyglots.

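Here's a small demonstration of both properties using only the standard library (my sketch, not Fickling's implementation): pickle ignores anything after the STOP opcode, and zipfile tolerates data prepended before the archive, so one blob satisfies both parsers.

```python
import io
import pickle
import zipfile

benign = pickle.dumps({"weights": [0.1, 0.2, 0.3]})   # ends with a STOP opcode

# Build a ZIP archive holding some other payload (here, just a text file).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("note.txt", "pretend this is a second, backdoored model")
blob = benign + buf.getvalue()                         # pickle first, ZIP appended

print(pickle.loads(blob))                  # the pickle parser stops at STOP and ignores the rest
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist())                   # the ZIP parser still finds the archive: ['note.txt']
```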
Fickling now has a polyglot module, so you can differentiate, identify, and create polyglots for the different PyTorch file formats. Now on to the next class. This one just means that your input should be simple and well defined so you can check it thoroughly. Take ONNX: ONNX is a protobuf-based way to store ML models. Adelin Travers discovered a neat hack for ONNX that he packaged into a tool called botomy. ML runtimes and frameworks often let you add custom operators to a model on the fly, and the language used for ONNX Runtime custom ops is complex, so even though the official specification disallowed side effects, arbitrary code could be encapsulated in a custom op in the ONNX Runtime, and you can use that to launch an architectural backdoor attack. Just like pickle, this is bad news for Bob.

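The loading side of that risk looks roughly like this; the file names here are hypothetical, and register_custom_ops_library is the ONNX Runtime hook for pulling native custom-op code into the process.

```python
import onnxruntime as ort

# If a pipeline blindly loads whatever custom-op library ships alongside a model,
# any native code in that shared library runs inside the inference process.
opts = ort.SessionOptions()
opts.register_custom_ops_library("libvendor_ops.so")      # hypothetical, attacker-supplied library
session = ort.InferenceSession("model_with_custom_op.onnx", sess_options=opts)
```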
So to recap: Bob, our direct consumer, was affected by the pickle, ONNX, and TorchScript exploits, while Dave was impacted by the PyTorch, safetensors, and restricted unpickling issues. Now, what a lot of people miss is how important and how complex the ML stack is. The model you choose changes the technologies in the stack, so when I'm assessing a system or doing some sort of vulnerability research, I'm always trying to think about what layer of the ML stack I'm dealing with. The layers listed are hardware, infrastructure, low-level compiler, high-level framework, model, and knowledge.

I told you about a whole bunch of exploits just now. Of those, the restricted unpickler, ONNX Runtime, and pickling proofs of concept are issues that are exposed and impactful at the framework level, the TorchScript differential corresponds to the compiler level, and the safetensors and PyTorch polyglot issues correspond to the infrastructure level. And this is just a starting point: we're going to be seeing exploits up and down the stack that impact ML systems. So if you want to get into attacking ML systems, this is a solid place to begin. Are you really good at breaking hardware? Go take a look at a TPU. Do you happen to know a lot about distributed systems security? Go write some hybrid ML exploits at the infrastructure level.

So I made this schema for incubated ML exploits; it's just one piece of a more formal model of exploitation. I'll talk about this at a very high level to shed some light on the terrain here. If you want to pull off an incubated ML exploit, you need a write primitive for the weights or the architecture, and the proofs of concept point to some additional capabilities. Side note: you probably want read primitives as well. With the safetensors parser differential, you saw that access to the metadata could enable both types of backdoors.

There's a lot of utility in exploiting model transformations and model differentials; with those, you can construct exploits at different stages of the pipeline with existing procedures. Differentials are also pretty useful for an attacker because they're localized to their stage, and with ONNX it became kind of obvious that you can maliciously use custom ops in serialization formats, and maybe even in places like compiler dialects. I'll release more details in accompanying materials, but I do want to make some more explicit recommendations; apologies for the busy slide here. I think models should be checked for integrity, and their metadata should be well parsed.

We want good trust mechanisms and we want robust validation. We also want to minimize complexity, so we should be avoiding custom operators and separating weights and architecture storage. I also think we should be following the recommended practices for file formats more: you should have versions and checksums and magic signatures, you should enforce your signature at offset zero, and we really need to be investing in robust specifications and tools.
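As a sketch of what a couple of those recommendations look like in practice (the values here are placeholders, not a complete scheme):

```python
import hashlib
from pathlib import Path

EXPECTED_SHA256 = "<checksum from a trusted, out-of-band manifest>"   # placeholder
EXPECTED_MAGIC = b"PK\x03\x04"   # e.g., the ZIP magic used by PyTorch's container, enforced at offset 0

def verify_artifact(path: str) -> bytes:
    """Refuse to hand model bytes to any parser unless both magic and checksum match."""
    data = Path(path).read_bytes()
    if not data.startswith(EXPECTED_MAGIC):
        raise ValueError("unexpected magic bytes at offset 0")
    if hashlib.sha256(data).hexdigest() != EXPECTED_SHA256:
        raise ValueError("checksum mismatch; refusing to load")
    return data
```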

I'm really hoping that we can see more work on hybrid ML exploits and incubated ML exploits; I want to see them addressed by more frameworks and tools. I'd love to see this framework evolve and be applied to specific ML tools and contexts, as well as to more bug classes and more model vulnerabilities. I'd also like to see more investigations of exploit persistence, reliability, and defense. More generally, I think there's a lot of interesting work that can be done in ML infrastructure security, with differentials, file formats, specifications, and even reverse engineering. But before we finish, I want to tell you what helps me identify and make progress on ML security problems, and that's understanding the two root causes. First, we're building all of these new systems for ML: new hardware, new programming languages, new compilers, new file formats. There are conferences dedicated just to new and creative ways to design ML infrastructure, and that means these new systems are introducing new attack surfaces.

It's also becoming increasingly clear that the stack and supply chain have not been subject to sufficient review; that's why we're seeing pickles everywhere, right? Second, simply placing an ML model into a program introduces all of these new vulnerabilities that stem from how your model interacts with different components. Machine learning is not a quick add-on but something that can fundamentally change the security posture of your system. So I hope you leave this talk knowing that we need to concurrently and holistically think about system security and model security. I recommend checking out the full audit reports for safetensors and YOLO, as well as the blog posts on Fickling and the file formats repository.

I'll post more details and accompanying materials; we're hoping to release a paper on this topic as well. You can find my contact info on my website or send me a message on Twitter. Thank you for listening; now I can answer any questions. [Applause] [Music]

Are you aware of any tools that, as defenders, we can use to audit our models periodically, to automate the model audit and identify vulnerabilities in it? So the question, if I understand correctly, is: are there good defense tools for ML security? We're hoping to develop Fickling such that it becomes a good detection tool as well.

It was originally designed for reverse engineering and offense, but I think it's very useful for incident response folks to be able to look at a pickle file, reverse engineer it, and see if that's a potential cause. I'm more a fan of the secure-by-default strategy, so I always tell people: don't use pickle, use safetensors instead. You should have good trust mechanisms, checksums, things like that. But yeah, I think it's a very greenfield area, so there's a lot of ongoing work in this, and I think we're learning more about it as it moves forward.

So I'm interested in seeing what comes up. Any other questions?

What would be a good way to convince the higher-ups in a company to invest time and money and resources in something like, say, getting rid of pickle in the code base? Do you have examples? I'm thinking of, like, the SolarWinds exploit, or LinkedIn; there's always a thing like, "if we invest in teaching people how to detect phishing attacks and social engineering, that would prevent this." Have any of these machine learning exploits been used in the wild? Are there any examples we could point to and say, look, it cost this company $10 million specifically because of pickle? Or is that something that's maybe in the future and just not here yet, because this stuff is so new?

So, there have been pickle CVEs; that's a thing I've seen. Let's see if I have the slide for it. I don't know of any that can give you a concrete dollar amount, which is difficult; if someone knows, feel free to jump in. But Sleepy Pickle is one example, and there's also a follow-up called Sticky Pickle that showcases how this can be a surreptitious and dynamic attack. Wiz has another example of it being used for cross-tenant attacks in cloud security; I don't have a slide for that, apologies.

But there's also, let me find the slide for it, sorry, lots of slides here.

That one, I'm just going to point to it. This one is from JFrog and is about how they used Fickling to find real malicious ML models on the Hugging Face Hub. That's as far as I know about it; once again, if anyone has more examples like JFrog and Wiz, I'll write those down and look them up later. Okay, thank you. Got any more questions? Give it up for Suha! Thank you.