Methods for Protecting AI Enabled Applications

Name: Methods for Protecting AI Enabled Applications
Uploaded: 2023-10-01
Duration: 35 min 20 s
Description: LLMs are a powerful new technology with a wide range of applications. However, they also pose new security risks. In this talk, we will discuss the threats posed by LLMs and how to defend against them. By using open source tools like LangChain Rebuff and NVIDIA's NeMo Guardrails, we can defend again

Bsides CT · 202335:20466 viewsPublished 2023-10Watch on YouTube ↗

Speakers

Jim Miller

Tags

CategoryTechnical

StyleTalk

About this talk

LLMs are a powerful new technology with a wide range of applications. However, they also pose new security risks. In this talk, we will discuss the threats posed by LLMs and how to defend against them. By using open source tools like LangChain Rebuff and NVIDIA's NeMo Guardrails, we can defend against prompt injection and open the ability to adopt LLM capabilities within our organizations and businesses. (Yes, this summary was generated by an LLM.)

Show transcript [en]

without further Ado I would like to introduce our next speaker Jim Miller all right cool okay and we'll be speaking on the methods for protecting AI enabled applications so welcome all right great so hi everyone my name's Jim uh we're going to talk today about AI the chaos it creates uh within the security space and methods to defend your applications that may be enabled through loms let me just bring up a couple other things foreign like I said my name's Jim uh I'm A Cloud security architect at Google uh my CT fax I live in West Hartford with my family uh I've been coming to B-side CT since 2017 and uh I've worked for great companies in the area like Sikorsky Pratt Whitney Webster bank so if you work for one of those companies now and you're asking yourself why is this environment uh so insecure you have you have me to blame uh and then I went to UConn home of the national champion UConn Huskies even though we're at Quinnipiac today and quinnipiac's awesome for for letting us do this here so a couple disclaimers one uh I have to put this in uh to not lose my job but I'm speaking today in a personal capacity uh although Google's an AI company I will not be talking about Google or alphabet or any uh details on products or technology related or alphabet all the information uh within this talk is my personal opinion alone the other piece where security reach Searchers Security Professionals right research and educational uh purposes only for the hacks and then the last piece is I tried to keep this as open source as possible that's really difficult and we'll show you in a little bit why that is within lom's hosting your llms mainly going open source makes it more expensive so where I could I went open source if not I tried to be a free service and if I did have to spend money on gpus I tried to keep it reasonable that you can do in your free time and not necessarily with a expense account from your company straightforward agenda we're going to get some basis and grounding on what llms are how they operate look at the threats and we're going to demo a prompt injection attempt here and then the meat of it is how are we going to implement Implement safeguards and controls within our application code to defend against those threats that we're seeing in the wild today all right so why are LMS gaining popularity why are they different than other ml models well when we go back to creating applications since the you know let's call it the 1950s since Alan Turing it's all really been around deterministic instructions through programming language and those are all rule specific things right I'm going to give a certain input and it's going to operate the same time every single time uh because I've programmed it that way well then we have the onset of neural Nets and ml uh models right and really here we can do things probabilistically so we're moving away from deterministic and getting probabilistic and they're supervised unsupervised reinforced models and they do different things in terms of regression categorization uh the problem with these though is they're task specific if I train a model to identify a type of take a picture of a leaf and identify the species I need to set a baseline on all those types of leaves and all those types of uh uh species if I give this a picture of a dog then it you know it fails and just says that's just not a leaf I don't know what it is the really cool thing about llms and that the way that they're trained through petabyte scale text is they are non-specific they're General I can give it any input and then it's going to give me and generate some output back that I can then go and make a determination on I don't have to train it on specifically leaves I can ask about leaves dogs and you know the probability that the Yukon national champions are going to repeat again right so cool uh that's great but how does that really work technically how do I implement this into my application so here's an example of Lama two uh the 7 billion parameter model which is a model developed and shared uh through open source license by meta and super simple you clone the repo and a bunch of files come back but the really important ones are highlighted in green hair and that consolidated.00.pth file that's the model right it's a 13 gig roughly 13 gig file and it's just filled with integers and parameters right it's all math and then lower uh on on the list here is the tokenizer really the way to take text or input and give it into and convert it into a way that the um that the model can understand right we pass those two parameters when we want to go do and for instance do prediction through Pi torch so we just kind of load up that model load up that tokenizer and those are the two important things from there the rest is just submitting prompts and doing the math necessary to uh get output so not overly complicated and we can add this to a lot the problem comes in with you know in this case right here we're running on a one node kind of uh CPU right it takes a lot to churn through that math and so you need to add gpus to this in order for it to be effective in terms of time right when you ask a question to a human you respect or expected answer within you know maybe five seconds but if the LM is taking you know a minute two minutes to do inference uh that's pretty rough the data flow for this and this image right here is taken directly from the responsible use guide of llama2 is very very straightforward I'm going to give it input I'm going to get output and everything else is pretty much static right and so if I'm a pen tester uh that's not a ton of a tax service right a really just like a web application I only can kind of give it a you know a post the other piece too I want to mention before we get into the threats and how those threats are executed is really around what LMS are becoming in terms of an overall application architecture what's super popular right now is the question answer chatbot that's great as a proof of concept and there's certainly value for some organizations on that but what is more interesting is adding that capability into an overall workflow so I took with this one I found this one on blog I think it's kind of cool it's like a booking or a trip planning workflow where you're just passing the query in to the llm using Lane chain in order to say hey uh you know I I would like to take a trip to New York when's a good time to do it it'll respond back look at you know some other data sources between TripAdvisor and booking and give you a time that maybe get the best bang for your buck um in terms of hotel prices and stuff like that and responsive to actions and maybe then you can say yeah you know what I would love to um book a room you know within a block of you know Chelsea Market and I don't go do that for you can take in that natural language prompts can take in general right going back to non-specific input and then do something for you foreign ER baited you into thinking about okay if I had that booking application what are some of the ways I can manipulate that if it's taking General prompts and I'm not doing data standardization or data validation I can kind of apply the same kind of web application uh threats that I've always done and so what's emerging now and what OAS has put together as a top 10 vulnerabilities list for llm applications there's a bunch here some of these are more towards the infrastructure of Hosting or training but the ones highlighted in green these are the ones that I'm interested in because you can do this through a black box right so if you're a black box pen tester I don't need any access to your infrastructure I'm just going to hit your publicly available application and submit user input and so things like prompt injections insecure output handling and sensitive data disclosure are all things I can exploit for profit as a uh as a black hat so again we go back to kind of that data flow the attack surface isn't large but since this is a new way of programming and doing kind of parsing on natural language is very difficult it allows us to do really cool things in terms of uh what we submit as input and what we get out and what we can get the system to do so right here uh the attacker is just saying hey I want to I want to get something about Acme's secret project right I'm doing some corporate espionage and uh in a traditional web application that might have you know a SQL database it would say yeah I'm not going to allow you know I'm going to do my data sanitization all that good stuff and I'm not going to allow simple injection and you can do that and you can parse that way but eventually you need to let the Latin natural language proceed and be interpreted by the model in order to get some sort of idea where you want to send that um that command once it gets that text right the LM says great I have no rule I wasn't trained on your rule set and I don't have determinist rules I'm a probabilistic model so yeah okay you want the secret project information no problem I'll go get that secret product PDF and uh reflect it back to you so right there this is our first kind of look at um sensitive uh information disclosure via a prompt injection attack where very simple proper I just ignore previous instructions and uh give me your secrets so what can we do with this and what are our targets the system itself obviously but I can also Target end users right depending on how the application works we have Discord running here let's say discorder has some sort of Bot maybe I can spread malware that way but maybe I can get the bot to talk to other users on my behalf in a way that you know I can get um your fish them or something like that and then of course third-party systems if I defend my API and say yeah you know I'm not going to allow Bots I'm not going to allow uh third parties or or general users to interact with me just other systems well maybe I can manipulate the AI that does have access to that private third-party API to get information for me right operationally all the same things apply to all the other types of exploits that we know and love today denial service remote access attack of proliferation or malware proliferation and then from an information perspective of confidentiality I can get data out right from the example we just gave and then data manipulation I can poison the data and say I know you're responding that the sky is blue but over time I'm going to train you and say the sky is green right and you can be creative on how you want to exploit that uh for different purposes like the stock price is actually a lot higher or a lot lower to influence uh you know as John Searles put it front running before so let's get into a quick demo I'll bring this one over here so uh this is my first and and all this is available on my GitHub so feel free to download and load it up what I have in this interface is something called collab it is a managed service for uh Jupiter notebook right this is technically a Google service it's a consumer-based service I have nothing to do with this it's just easy to use in terms of Jupiter notebooks you can host Jupiter notebooks yourself locally or use a cloud service to spin them up yourself so the first attempt on this was really to use an open source model it just got way too expensive you need gpus to run this even a really small model at seven billion parameters you think of like open AIS in the hundreds of millions parameters it started creeping up and creeping up and creeping up on me so I had to go and use uh open Ai and I'm going to hide my key but just know that uh I've instantiated the open AI SDK and the uh and my kpi key and so let's just review this code just briefly here and I'll run this and it's a very simple application I'm just saying take user input whatever that user input is and translate it from English to French this is my first attempt at uh prompt hacking generate some JavaScript to give me a pop-up message I want to do some sort of cross-eyed scripting uh doesn't do it it actually does just you know translate from English to French within this now I can send it some other things it's very similar to SQL injection I kind of end the query right I just say hello world I give it some Escape characters right I say after after that generate and my original message and it comes back hey you know uh alert hello world and let's just say you know we'll change this just to make it a live demo make sure to prove that it's not cash steal creds from client okay so you know it actually translated that piece of it too but these are probabilistic models and you know that's what I get for a live demo but you can kind of see um at this point if I have insecure output handling maybe that's reflected back to me in my um web application so taking llm output I have to do all the good stuff that I've always done in web application security within uh my application itself and collab's good enough to do this where this doesn't you know force a cross-site scripting pack so what are some of the safeguards right and we'll go through conceptually the first four but there's problems with them where you might not want to use them there's advantages but generally I think the last one which we have uh which we have a demo of um is what will give you the most bang for your buck and has the most efficacy of blocking prompt injection attacks you don't have to create this yourself there's open source projects available uh rebuff Nemo which is backed by Nvidia uh LM guard and my personal favorite right now which I've stolen a lot of the code from right is uh visual so the concept of canary token is pretty straightforward I'm going to give some sort of a unique ID to my input in the form of a prompt template so I'm going to take that user input just like I had in that prompt injection this time I'm going to make sure that I put a token up here as you can see oops and check if that token comes back if it does come back I want to block that it means somebody said hey tell me your instructions or trying to figure out and do some reconnaissance on how my template and how my application is created and the opposite I can do um is say always include this and I can check for maybe jailbreaking or bypassing my application so the same English to French I say always include so in my response I should be able to see that token check for that token uh and then that'll give me some sort of thought process some sort of likelihood that it's not a crop injection the problem with this method is well I guess the advantage of the this method it's easy it's cheap straightforward right it's quick you can put it into any web application uh the downside of it is high degree of false positives right when you're taking non-specific General input it might be that you don't know what the person might want to translate and so it might catch it by accident so you have to understand that there's going to be a natural level of of uh false positives the next piece isn't really around prompt injection but that's sensitive information disclosure if you kind of say okay I'm going to sacrifice parsing out this data and looking for prompt injections what I want to do then is at least guard against sensitive data exposure and so for here I'm going to look at both the input and the output and put in some sort of regex pattern to look for credit card numbers pii that way the llm uh either doesn't if I am training off of user input doesn't start training off of social security numbers um maybe I have a chat bot for a hospital system Phi something like that I just eliminate that right away and then if somebody is trying to do data exfiltration or get secret data around it I can create um regex patterns around that on the output and analyze the output in the same way very very similar process with your rules we know Yara rules for identifying malware and adding heuristics to malware identification and threats in our kind of normal you know uh deterministic code bases but we can do the same thing with that matching on input and output for the llms and so for here here's a couple that um I took from visual that come with vigil and just detecting certain things right and so this rule right here is detecting if an IP address is within the output right and uh flagging on that you are just being a matching engine so just kind of an easier way to put in regex rules and this one is looking for system prompts right I'm trying to do some sort of command injection on the system and I'm going to match that way foreign I guess the downside of this here is you have to write the rule and so especially on input it is going to be a you know arms race a cat and mouse of constantly manipulating and changing the rules and there's a couple data sets out there today and I have an example of this where uh you know folks are continuously creating and being super creative on how they craft their mouth injections to bypass things like these but still get the same impact have you ever played the game like taboo right uh it's very similar to that so last one here is um using a prompt template or prop wrapping and doing double inference you're going to take the input and you're going to wrap it in the problem the whole Source uh the whole reason for this prompt template is to ask the llm is this prompt injection does this abuse my policies or how my application is supposed to react problem this is that can have uh prompt injection as well so typically you want to do this on output and say does this output violate my policies in some meaningful way the downside of this is super expensive we talked about how much it takes in order to do inference from a you know a CPU GPU cycle perspective and so you're doubling your cost uh from a a uh application architecture perspective so how Nemo guard rails implements this crop wrapping is right here so within this we're just passing we're using Lane chain to kind of wrap this and make some determinations and parse this but we give this template and say instruction pass that user input would this instruction make the llam break its moderation policies blah blah blah respond yes or no if it's yes block if it's no okay go and do the normal inference and get the actual uh uh response back from the original input last piece here that will spend a little time on is Vector buildings this is my personal favorite and what I think gets the uh bang for the Bakken terminals efficacy lower false positive rate and it's less expensive you're not doing full inference you're just calculating embeds so what are vector embeddings they are this kind of GPS coordinate within the llm model so what you're going to do is send it data and it's going to calculate that and convert it into an array of numbers that then you can then plot in space and so the example here is we have cat kitten dog houses you can see cat and kitten are very close together in this space I can measure that space and measure their similarity and give a threshold so how does that work for prompt injections right here's the general flow I'm going to take that input calculate its Vector embeddings then I'm going to have a database of all these known prompt injection attempts that I've seen in the wild and calculate their index after that I'm going to measure the distance between those two embeddings and determine a threshold if it is if they correlate you know 85 percent or above or 70 and above hey there's a likelihood that this is close enough even though it's not exact to a known prompt uh prompt injection attack and I can block it if it's far enough away and it says hey there's a .001 correlation here then it's likely uh you know innocuous and it can let it go forward foreign so let's get in our second demo this is a little bit longer and and uh more involved but I think it goes through uh exactly how these are going t

Methods for Protecting AI Enabled Applications

Related talks