← All talks

BSidesCharm 2024 - Securing generative AI: threats, old and new

BSides Charm25:5657 viewsPublished 2024-06Watch on YouTube ↗
About this talk
As we move closer towards generative AI becoming widely adopted, it’s important to understand the security implications, how they differ from more traditional cybersecurity, and where we can apply existing approaches to new systems and applications. This talk introduces AI security concepts, with a focus on LLMs, to equip participants with an understanding of the security landscape surrounding AI. Presenter: Adam Swanda Adam Swanda is a threat researcher with over 10 years working in cybersecurity, largely focusing on tactical and strategic threat intelligence. Adam is currently working as an AI Security Researcher at Robust Intelligence. He recently released the open source project “Vigil”, a Python library implementing various LLM defense measures for prompt injection and jailbreak detection.
Show transcript [en]

[Music]

[Music]

uh today we're going to be talking about securing generative AI uh specifically focusing on llm kind of saving some of the other stuff for hopeful future talks uh so let's Dive Right In only 20 minutes and there's a lot of material to go over uh I'll make the slides available afterwards as well so you know people can have that uh quick background on myself uh my name is Adam swanda I'm currently an AI security researcher at robust intelligence uh so there we do a lot of model evaluations uh creating fenses for different adversarial techniques LM red teaming uh things like that and I've been in the security industry for about 12 years uh primarily coming from threat intelligence and

security operations uh just left Ro uh Splunk for robust actually so spent the last like five or six years there and uh some of you might be familiar with this open- source project I created called vigil which was and attempt at kind of combining a bunch of different detection techniques for jailbreaks and prompt injections uh so that's open source on GitHub if you guys want to check that out and so when we say generative AI what is it that we're talking about exactly uh so it's a type of machine learning that creates new content by learning patterns from being trained on large amounts of of data and then it applies those patterns over a probability distribution so you're

getting you know the next most likely token for an input um and when you see you know an llm output behind the scenes there are actually lots of other tokens and texts that could have been chosen and the probability of those and you know the provider is just selecting the most likely token for you and everyone you know is probably familiar with the text models uh open AIS GP T4 anthropic Claude uh those are the large language models and then there's also image and Audio models as well so things like mid Journey uh sunno is a popular audio one or udio is one that came out recently and kind of putting the threats into perspective here they're I've

mainly been thinking about them in two categories uh Safety and Security uh you know the safety side of things relates to making sure that you don't have bias coming out of your models uh you can prevent toxic harmful output uh you know avoid hallucinations things like that while the security side of things that we'll be focusing on today are you know the more kind of traditional attacks adversarial attacks uh insecured design supply chain compromise uh things like that and this isn't a allinclusive list either I just ran out of room on the slide deck so what do these threats kind of look like in the Enterprise right now um this is a diagram of kind of an archetypical

uh pre and post uh uh like preep predeployment and post deployment pipeline for AI um on the lower right hand side you can see where it starts with you know potentially using an open- Source model uh like something from hugging face or or llama 2 uh you know ingesting public data and then deploying that model into an application and I know there's a lot of information on the slide here but the important part to take away from this I think is that each of these components introduces additional risk um there is no kind of one you know solution to fix all of these problems and you know you do have to worry about the kind of AI specific

threats as as well as all of the traditional application security you know architecture engineering issues that you uh normally would as well and just as an example that some folks might be familiar with uh prompt injection uh is a specific type of attack that can Target llms and the goal there is to override the instructions that the model or application has with your own instruction uh so say the application is designed to only answer questions about uh taxes for example you could you know provide a prompt that told the model to disregard those instructions and follow your own instead you know maybe write you some python code or or or something else um so that's kind of one of the big you

know risks that people hear about prompt injection and jailbreaks uh and with uh retrieval augmented generation which is pulling data from external sources into the context of your llm you also introduce the risk of indirect indirect injection where those data sources could have malicious content that your LM is now processing um so again just kind of a high Lev view here and how does AI security differ from kind of traditional security security that we're all used to um well in traditional security you know I think we're very blessed to have lots of logging and monitoring you know everything is emitting logs um attacks are largely deterministic so if you you know use a particular exploit against a

particular software version you know that same exploit will very likely work again on that same version uh this is not NE necessarily the case with a I as the models themselves aren't deterministic so you're not getting the same output for an input you know a jailbreak attack that works for me might not work for you but if you try it again you know it it could um and yeah like the I think another thing to point out here is that threat actor interest in AI is still emerging and I think kind of less understood than uh you know tradition software um there isn't a lot of motivation for thread actors to Target generative AI systems yet as

they're not I don't think they're in front of things that they care about you know um like actors that want to do corporate Espionage or spread malware are still probably going to do the standard methods until they need to change their techniques and and Target AI but we are seeing uh a wide adoption of of these types of applications and there is some targeting of it already so I assume that that will continue and the actor interest is kind of broken up into two camps as well uh that is the malicious use of AI so threat actors you know using chat GPT to assist their operations and then targeting AI infrastructure data uh things of that

nature and the fun part about AI security is you have to worry about you know everything on the right hand side while also worrying about everything on the left hand side as well so that's that's fun uh and here just to kind of show some more examples some of those might be a little hard to read but it's broken up with uh you know AI specific threats on the right and uh more traditional security threats on the left on the left hand side we have some examples of uh you know software vulnerabilities and applications to excuse me either deploy or develop models and uh so you know we still have all of those same traditional

vulnerabilities to worry about as well right if you're if you have infrastructure that's serving models you know those have to be secured if you are developing a chatbot application you know the python library for that chatbot you know is also potentially going to introduce vulnerabilities and then uh in the middle here we have a group that I think kind of straddles both which is uh uh you know examples of of traditional attacks uh that kind of Target AI system specifically um whether it's leaking you know shared memory from a GPU or uh targeting data scientists with with model back doors and then over on the right uh up top actually that's a technique that robust

developed to algorithmically jailbreak models uh and then on the bottom is a recent publication for Microsoft uh they published alongside open Ai and this was about various AP actors you know using chat GPT similar to how most of us use it for for productivity purposes so again it's kind of twofold you know just want to uh uh make that distinction that there is kind of the AI side of things there's kind of the traditional side of things and stuff that falls somewhere in the middle uh so what can we do about these new risks uh a few things you know the I I think an important part is to kind of get back to basics with application

design uh that's you know doing things like threat modeling you know making sure you know where your boundaries are and how data is being passed back and forth uh sanitizing user input and output from the LM um ideally having human in the loop where you know you have to approve actions that the AI is making so it can't you know uh uh autonomously add records to your database or or something like that uh at the same time there is validation which is kind of testing the models and the data sets you know through the whole life cycle of deployment uh so this is things like model evaluations where you are basically throwing a bunch of data at

the model to see how it responds and then evaluating those responses whether for you know truthfulness or accuracy to some to some Source document and we do a lot of this at robust you know we do this for customers we do this for open source models and yeah it's it's a little tricky to get right to you know properly evaluate those validations but if you can get it done in an automated way you know you can identify failure modes and different vulnerabilities early on and kind of plan for that further down your your development process um and yeah once you've performed validation and you know you know what threats your model is vulnerable to uh you can Implement some

continuous monitor monitoring to identify and protect those uh assets and um yeah you know this is largely going to be watching like llm model input and output uh the systems the models and themselves aren't very verbose there's not a lot of logging coming out of these so we're kind of limited to that input and output um and yeah threat intelligence I think is very important to stay on top of that as well uh threat intelligence for AI security right now is a bit different from traditional security 2 there's not a lot of reporting on you know in the wild threats a lot of the stuff is going down on Twitter with researchers finding new techniques or in

Discord channels where jailbreaks are being shared um and lots and lots of academic work um you know I read far too many archive.org papers to count and that's where you know I get the bulk of my uh like Intel so to speak on on new techniques that we should watch out for so some changes there too but I expect as you know these applications get deployed more we'll start seeing you know more kind of traditional threat reporting uh and we are also starting to see some emerging Frameworks uh come out around AI security uh specifically the nist adversarial machine learning document which is a big taxonomy of all the threats that I have previously mentioned and several more uh it's a

very good document we at robust uh uh co-authored it so I don't want to to my own horn too much but I'm I'm fond of it um and then the uh oos top 10 for LM applications is a really really great Community effort um they are actually they have voting out right now for the V two of the top 10 so if you have you know looked at this document already and you feel strongly that some certain threat should be represented over another you want to you know add your input to definitions um up on their website you can join their slack you can fill out their surveys contribute it's it's a really great group of people and

miter Atlas as well uh miter Atlas you know has a lot of the same information as the NIS document and the oos document but it is in the context of attackers using those behaviors while the other two documents are more taxonomies of risk um and yeah uh I'm I'm quite partial to miter Atlas I try to reference it I think it kind of adds a little like legitimacy to the reporting on AI security just because miter attack is so widely adopted and so useful you know I do think that Atlas can be that for for AI uh and lastly the data bricks AI security framework very very detail document which you know has similar taxonomy information but also

information about kind of common design patterns and different controls you could Implement uh at the different stages and so looking forward what do I think uh is kind of future of AI and and security uh I think you know it's definitely going to bring some changes to the threat landscape um attacks against generative AI models and the applications will will very likely increase as adoption does right now it's a lot of opportunistic attackers uh you know just like seeing if they can cause some reputational damage by making your customer support bot say something racist uh I think that's going to change in the future as these are used for more use cases that involve kind of the reasoning

capabilities of the applications uh I expect this is already happening but I don't work at open AI so I don't know but I I assume there's lots of intellectual property theft aimed at these state-of-the-art models uh that is probably already happening and another another kind of trend I've seen lately is the introd uction of autonomous agents which are LM applications that can plan and take uh multistep actions to uh meet some kind of goal you give it essentially and uh they're they're really useful and it looks like you know they could be really powerful but at the same time they introduce a lot of risk as for an agent to be effective you know it would need to be connected to

external systems data source uh and be able to perform actions against those so it gets really messy uh threat modeling that stuff and yeah ml supply chain attacks uh I think will become more common we're already starting to see a bit of that with one of the articles I had posted on the earlier slide uh where some models that are hosted on hugging face you know had some uh pretty basic pickle back doors added to them but you know they're still they're still there um I think these will just get more creative um and so you know that's all the scary stuff uh so what does this mean for us and kind of what can we do about it um

you know I think one thing you know if you're working security at your Enterprise right now it's really important to bring the AI and ml teams you know into the security fold um what are they doing what infrastructure do they have out there where are they getting data sets from uh are they using secure development practices uh I've been very surprised at you know past employers speaking to the ml team and you know asking them hey are you considering this particular security risk or the safety risk and it's you know not even on their radar they're just kind of working on the day in and day out stuff um and I think it really needs to be security that is at least

partly you know driving that and bringing that to their attention like getting their Buy in and hopefully you know through monitoring we can kind of alleviate some of the difficulties that they might have with deployment uh and so that means you know if they're developing models you want to be doing a a model validation or red teaming on it any mlops infrastructure ideally you should be having you know those emit logs and you want to monitor those for attacks uh secur design practices for applications uh and like I said before you know anything that is deployed you want some kind of continuous monitoring in front of it uh not just kind of the standard monitoring but looking for

adversarial inputs as well um you know one case or or one type of attack uh in particular uh uh availability attacks for example there are you know certain attacks you can send an input to an llm where the result of the attack isn't like a harmful output you get back necessarily but it is actually the model using more energy to generate a response right so like filtering that input is important for pres for for preventing attacks it's not just the output uh and community-driven security um what we're all doing here I think you know the cyber security community has a really great uh you know history and kind of community built around it you know we're all here for one and I think

we need to keep that going with AI um I already mentioned the contribution opportunities with oos uh but Defcon AI Village they just opened their call for papers I think earlier today uh camless is another good one if you guys want to get involved in that or you know attend any of the talks and again miter Atlas and the lve repository are good as well so I encourage you to you know investigate these resources um I know a lot of this is high level but I'm hoping that this is more an introductory for you know people who aren't familiar with this topic and just kind of want to learn more and we'll take some of these

Concepts and you know go do uh deep Dives when they get back home so I think that's all I have time for uh thank you if any anyone has any questions find me

after

yes yeah yeah you'd want to strip pii and stuff before it goes to yeah um exactly there's there's an open- Source tool that Microsoft makes uh a PR uh presido analyzer yeah yeah it's really good for handling like pii um like sanitizing input and stuff um yeah I wonder the same thing I'm surprised that so many people are using gp4 models in production cu that's just another place to worry about your data going you know um if if you can use an open source model that does the same thing I I would probably recommend that just generally

yeah or if you're aware

ofel yeah um I think it's a bit of both right now there's so like my company for example we have kind of a we have like a firewall that we put in front that does some monitoring but there's also more like the observability side of things as well um so yeah it's it it is very early now I mean there are some open source uh stuff like I'm not sure if you're familiar with like open Telemetry um there's a llm plugin for open Telemetry so you can get you know LM data into Splunk or something like that um there's I would plug my own application but I guess it doesn't log now that I think about

it thank you so much yeah of course a lot talk

[Music] I don't know to be honest uh yeah the data sets around some of especially the bigger models it gets a little iffy um if they don't release the data sets it's kind of hard to vouch for them um yeah I don't know that we figured that one out

yet um I mean I guess it depends if someone's already building it now but you know I I could see something like you know

productionizing l to help write uh to help write some kind of like threat reports and stuff like that and and you know using it for more or less full report generation from uh vulnerability scan results so it is definitely possible it's just that I think working with these models they're very you know finicky it's like you have to spend a lot of time kind of orchestrating controls around it to make sure you get like structured output all time um but yeah no I I I I would not be surprised if we you know saw that in the near future uh cool I think that's it uh any scale is good oh oh self-hosted like locally locally yeah uh

o Lama is really good um they kind of have like Docker files almost that def Define uh an environment and yeah or if you're on a Mac uh LM studio is good as well I use both of those they're both very fast awesome well thank you everyone