
Navigating the Security Challenges of Generative AI in the Corp Sector

BSides KC · 2023 · 24:30 · 51 views · Published 2023-10 · Watch on YouTube ↗
About this talk
JJ Widener and JD de las Alas describe Kimberly-Clark's implementation of Azure OpenAI Service to provide employees secure access to ChatGPT. The talk covers the security challenges of generative AI adoption in enterprise settings, including data exfiltration and intellectual property risks, and details the technical and organizational controls—from private endpoints and customer-managed encryption keys to pre-prompts and logging—required to safely deploy large language models at scale.
Original YouTube description
This talk will dive into the implementation of Azure OpenAI Service for ChatGPT at Kimberly-Clark and how it enabled the business to use ChatGPT with more security controls. Kimberly-Clark, a Fortune 200 organization, wanted a secured space for employees to utilize ChatGPT. The journey to implement the Azure OpenAI Service for ChatGPT at Kimberly-Clark was a quick paced effort to minimize the risk of employees prompting the OpenAI public model with proprietary or sensitive data. Development, privacy, legal (intellectual property), cybersecurity, awareness training, and data protection teams had to work collaboratively to understand the native security controls implemented by Microsoft and risks of using the Azure OpenAI Service. Additionally, we implemented enterprise security controls to enable the business to use the Azure OpenAI Service for ChatGPT.
Transcript [en]

Welcome to our talk, everyone. Just to let you know, ChatGPT did help us create the title of this presentation, so if you think it's creative, that's one of the reasons we're here. My name is JJ Widener; I'm the director of cybersecurity architecture at Kimberly-Clark, just recently promoted this week, actually. Before that I was working as a data protection architect. I work with JD here, who came all the way from Atlanta, Georgia today, so let's give him a big welcome.

Thanks, JJ. My name is JD. I was previously doing application security for Kimberly-Clark; I'm currently a product security architect focusing on innovative technologies and IoT security. I have a bunch of industry certifications, but outside cybersecurity I do astrophotography and watch some sports.

All right, so on to our agenda. Since we have one microphone, I'm just going to project; can you still hear me? A brief introduction on this session: who here has heard of ChatGPT and generative AI? Who here has heard of the Azure OpenAI Service? All right. So we're going to talk through some of the challenges that we went through in this process of standing up

an Azure OpenAI Service instance. So, the problem: generative AI is here, right? It's here to stay. JD, what are some of the problems that you identified early on with generative AI? Well, mostly it's corporate data protection and corporate data exfiltration. And JJ, you have a background in data protection; what do you think are some of the ways we can address that? Yeah, so the biggest challenge we faced was the intellectual property situation: what are people prompting the model with? You go out to a public, open version, and the next thing you know you're prompting it with sensitive information, confidential information,

possibly intellectual property. That was one of the big problems, and data protection was a big stakeholder in how we were going to get ahead of generative AI and large language models. The popularity of generative AI exploded; anybody who was paying attention to the news around November 2022 and then January 2023 saw it seem to explode overnight. So JD, can you talk us through what is on this slide? Yeah, if you look at the graphic here, ChatGPT took only two months to reach 100 million users, and ChatGPT is the only application shown here that is not

a social media or messaging app, which shows how ubiquitous generative AI will be in the coming months and years. Now, some definitions, just to level set. GPT stands for generative pre-trained transformer: the models are pre-trained, and they respond and create content once you prompt them. And JD, a brief definition of prompt engineering? Well, prompt engineering is just refining your prompts so that you get the content that you desire. All right, so one of the other challenges: this is a new landscape. I'll just let some of these

headlines sink in; this is over a very short span of time. Everybody is either launching a class-action lawsuit because OpenAI consumed their private data, or you have workers going out and prompting the open internet version of OpenAI with intellectual property, as the Samsung workers did. In that case it wasn't really a vulnerability in the large language model or ChatGPT; it was the insecure front end of OpenAI at that point, where attackers were able to compromise it and sit in the middle of the prompts being sent and the responses coming back. So some big topics here, and

it's here to stay as well; that's another big challenge, because it's not going to go away. So what do we need to do to get in front of these untested waters? For some of it we don't know what the risks are going to be moving forward, but some of it we do already know, and we'll talk about those in a little bit. And does anybody remember Uncle Elon asking everyone to pause generative AI? Yep. So, which leads us to the problem. JJ, can you talk more about what our problem was? I hit on it earlier: the prevalence, and just how broadly and quickly this was adopted, spun everybody's

head. I had no idea how quickly people would start submitting intellectual property. Say you're creating some type of compound that could be proprietary, or maybe not copyrighted but something you're going to file a patent for; you're submitting all that information to an unknown SaaS application on the internet. Where is it going? What's being done with it? There's also the concern of memorization with GPT models and large language models in general: with GPT-2 models, researchers found around 5% memorization, and with GPT-4 models researchers have found up to 20% possible memorization of information that's prompted to the model

as well. And our CISO always tells us the toothpaste is already out of the tube with generative AI; it's here to stay, so we had better adapt. So, part of the problem is that it's out there and publicly available: anybody can log in to ChatGPT, create an account, and start sending anything to it. One of the challenges is who is going out to that public model. Well, Microsoft invested a lot of money in OpenAI back in early 2023, if you didn't know that, and the next thing you know they're packaging it and selling it as a service in

Azure. With the Azure OpenAI Service, you can spin up your own ChatGPT model. There's a lot of publicly stated assurance from Microsoft that they are not training their models on your data; a lot of assurances, right? That's what Microsoft says, and we're still learning as this comes out. Microsoft owns the model; you're paying for a service to utilize their model. But we thought that was still much better than employees going out to the public internet and using the open internet version of ChatGPT. So that's why the decision was made: how quickly could we spin up an Azure OpenAI Service instance for ChatGPT? Has anybody else

here actually spun one of those up yet? All right, so it's fairly simple to do: you have an Azure subscription, you log a request, and you can request access. It allows for additional Azure and Microsoft security controls; none of this is proprietary, it's all from Microsoft's website on how you should secure your Azure OpenAI instance. You can put private endpoints on any storage service that's attached to it. I put acceptable use language here because we had to create acceptable use language: the messaging before was, don't put anything that's confidential or sensitive into the public model;

it's kind of like telling employees to think before you act. But we had to come up with our own internal acceptable use for how we were actually going to use the Azure OpenAI Service. Other security controls: customer-managed keys. You can put a CMK on any storage service. If you want to see how prompts and responses are handled by the Azure OpenAI Service, you can log them to a data store, a blob or Cosmos DB or whatever, and you can use a customer-managed key to double encrypt, so it's not using only the Microsoft platform-managed key. But those are just Microsoft's well-known security

controls. This next piece, JD, can you speak to? The pre-prompt, the prompt engineering, and the rest of these. Yeah, so we briefly talked about pre-prompts and prompt engineering a moment ago; I'll discuss them on the next slide. Moving on to the IP address restrictions: this is tied to the private endpoint and single sign-on, but you can use IP address restrictions to restrict access to the public ChatGPT in favor of the internal ChatGPT platform you have set up within your environment. Logging prompts and responses would help greatly in monitoring what your employees are sending out; in public, of course, that's debatable, but it would also help in threat intelligence and analytics on your end when your team is doing research. Obviously, all the APIs should be documented, including which applications are accessing which APIs, so that troubleshooting is easier. And lastly, there is still no well-established way to do a penetration test for generative AI; it's all over the place right now. But we can do a sanity check, just to make sure that we have been doing our due diligence when we're setting up our own front end for our generative AI.
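As an illustration of the logging idea described above, here is a minimal sketch of what a prompt/response audit record might look like before it lands in a blob or Cosmos DB store. The field names and the hashed user identifier are assumptions for illustration, not the schema actually used at Kimberly-Clark:

```python
import hashlib
import json
from datetime import datetime, timezone

def build_audit_record(user_id: str, prompt: str, response: str, app_id: str) -> dict:
    """Build one audit log entry for a prompt/response pair.

    The user identifier is hashed so analysts can correlate activity
    without exposing raw identities in the log store itself.
    """
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_hash": hashlib.sha256(user_id.encode()).hexdigest(),
        "app_id": app_id,                 # which front end made the call
        "prompt_chars": len(prompt),      # size metrics feed abuse detection
        "response_chars": len(response),
        "prompt": prompt,                 # full text, encrypted at rest (CMK)
        "response": response,
    }

record = build_audit_record("jdoe@example.com", "Summarize this memo...",
                            "Here is a summary...", "internal-chat")
print(json.dumps(record, indent=2))
```

A record shaped like this supports both the threat-intelligence use (full text, correlatable user hash) and the operational-metrics use (sizes, timestamps) mentioned in the talk.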

A question from the audience: can you give some examples of what the SIEM can do with the logs from ChatGPT instances? Yeah, so there are some API logging capabilities that Microsoft put on the Azure OpenAI Service. But how could it be utilized to detect abuse? If there's a large amount of calls being made to it that's abnormal, either in size or just multiple calls in a row. The operational metrics are probably of more interest to the IT ops people; from what you can monitor, you can track how much it's being used. As for how that could be used to understand abuse, I would say peaks in usage and peaks in requests and prompts. There are links at the end of this presentation that go to Microsoft's recommended logs that you could capture. Good question, though; it's something I had to think through, and right now there aren't really good logs from the service to tell you how it could be abused.
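To make the abuse-detection idea concrete, here is a minimal sketch of flagging abnormal call volume from per-call logs. The mean-plus-standard-deviations outlier rule is an assumption for illustration; a real SIEM would apply its own analytics:

```python
from collections import Counter
from statistics import mean, pstdev

def flag_spikes(events, threshold=3.0):
    """Flag users whose call volume is an outlier versus the population.

    `events` is a list of user IDs, one entry per API call in the window.
    A user is flagged when their count exceeds mean + threshold * stdev.
    """
    counts = Counter(events)
    values = list(counts.values())
    mu, sigma = mean(values), pstdev(values)
    cutoff = mu + threshold * sigma
    return {user: n for user, n in counts.items() if sigma > 0 and n > cutoff}

# 20 normal users at ~5 calls each, plus one user hammering the service
events = [f"user{i}" for i in range(20) for _ in range(5)] + ["mallory"] * 200
print(flag_spikes(events))  # only "mallory" exceeds the cutoff
```

The same pattern extends to prompt sizes ("peaks in requests and prompts") by feeding character counts instead of call counts.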

So, prompt engineering: this is a buzzword for generative AI right now, along with AI hallucination. It all boils down to specificity; basically, you have to be more specific with your prompts to get the actual answers, the actual data that you want, from the ChatGPT platforms. The more discerning users tend to do jailbreaks; I'm not sure if you're familiar with the DAN ("do anything now") prompts, which tend to extend the capabilities of these generative AI platforms. However, we thought of some controls against these jailbreaking prompts. We can use a cross-site scripting analogy, since I have an application security background: if you think of the generative AI model as your application, the first thing you can do is input validation, which is usually done using pre-prompts, although those pre-prompts are very limiting. The last thing you can do is output sanitation on your front end, so that whenever users do jailbreaks, yes, the application or the generative AI has been jailbroken, but you're outputting a sanitized version of what you don't want them to see.
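The input-validation and output-sanitation analogy can be sketched in a few lines. The blocked-word and redaction patterns below are purely illustrative placeholders, not the actual rules used at Kimberly-Clark:

```python
import re

# Illustrative patterns; a real deployment would key off the organization's
# data classification labels and detection rules.
BLOCKED_INPUT = re.compile(r"\b(trade secret|confidential|internal only)\b",
                           re.IGNORECASE)
REDACT_OUTPUT = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # SSN-shaped strings

def validate_prompt(prompt: str) -> str:
    """Input validation: refuse prompts that trip a classification rule."""
    if BLOCKED_INPUT.search(prompt):
        raise ValueError("prompt appears to contain restricted content")
    return prompt

def sanitize_response(text: str) -> str:
    """Output sanitation: redact sensitive-looking strings before display,
    even if a jailbreak got the model to emit them."""
    return REDACT_OUTPUT.sub("[REDACTED]", text)

print(sanitize_response("Employee record 123-45-6789 follows."))
# -> Employee record [REDACTED] follows.
```

As in the XSS analogy, the two layers are complementary: input validation is the first gate, and output sanitation is the last line of defense when a jailbreak gets through.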

This is a very simplified model of what we had running, or what we thought was a probable setup. On the left is the OpenAI public ChatGPT; on the right is the private Azure instance that hosts the front end connecting to the Azure OpenAI Service, with some of the controls we can put in place, like the CMKs JJ mentioned, the custom pre-prompts, and the logging mechanisms connected to the private Azure instance. Those custom pre-prompts are submitted behind the scenes any time a user submits a prompt; I believe ours was over a thousand characters long, just in the guardrails we gave it, like "you must not respond with anything that could be business confidential or intellectual property." So if you prompt the model with, say, a trade secret question, it'll say it can't respond with any trade secrets. There are a bunch of other pre-prompts you can build to give it more guardrails as well.
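A minimal sketch of the pre-prompt mechanism described here: a guardrail system message is prepended to every request before it reaches the model. The guardrail text below is a short stand-in; the talk notes the real pre-prompt ran to over a thousand characters:

```python
from typing import Optional

# Stand-in guardrail text, not the production pre-prompt.
PRE_PROMPT = (
    "You are an internal assistant. You must not reveal or generate "
    "business-confidential information, trade secrets, or intellectual "
    "property. Refuse such requests and explain why."
)

def build_messages(user_prompt: str, history: Optional[list] = None) -> list:
    """Prepend the guardrail pre-prompt to every conversation sent to the
    model, so users never interact with an unguarded session."""
    messages = [{"role": "system", "content": PRE_PROMPT}]
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages("What's in our secret formula?")
print(msgs[0]["role"], "->", msgs[-1]["content"])
```

Because the front end owns message assembly, users cannot simply omit or overwrite the system message, though, as the speakers note, DAN-style jailbreaks can still sometimes talk the model out of its guardrails.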

All right, so some of the caveats we've identified with using the Azure OpenAI Service. Right now, the 3.5 model is the only model available, at least to us, and once you switch over to the 4.0 model, the costs go up. It's extremely inexpensive to do this in your environment; we didn't talk about that much, but I want to say the first couple of months were something like $300, so not that expensive. What are some of the other limitations, JD? Well, obviously, when you only have access to 3.5, the model was trained about two years ago and has a knowledge cutoff there, so it might not be the most up to date. And as you mentioned, the model is highly vulnerable to the DAN attack, so if you

haven't seen it: I thought it was pretty cool. I hadn't seen it really work until somebody executed it on the instance we spun up; it was able to respond with ransomware code, and even create a hyperlink to the code, so it was very impressive how they got that done. Of course, it's a team effort; it's not just cybersecurity. Everyone should be involved, including legal. Usually, in a large IT team or organization, there are a lot of different moving parts: cloud security, cloud

infrastructure, the innovation team, and the data and AI team would need to be involved as well. Yeah, a lot of stakeholders wanted to be involved with the language we were projecting: what are we doing with it, what's being generated, what can we copy? People were wanting to start filing patents on things they were generating with generative AI, and it's like, well, you can't; we don't even know what that means yet. So it's great to have a lot of stakeholders, and legal is one of your big partners in this. I think the human component boils down to your

company or corporation's basic strategy regarding generative AI. The first step is to know what your company's position is on generative AI: whether you're already ready to adopt those generative AI models, or whether you need to take some time and evaluate whether you can implement the necessary controls around the model. And so almost immediately, I think it was April, we had training go out where we included generative AI in our awareness training, so we were able to get at least some messaging out there: think before you prompt. However

you word it, however you frame it: think before you click, think before you post, always think. Don't just go out there and submit your forty-page super-secret document that you wanted summarized; probably not the best use. So we still have some guardrails, some controls, around how you want to use this: what's the use case? Just so we can get ahead of the challenges we've heard about; of course, nobody wants to be in the headlines. And do that risk evaluation: what's more risky, somebody going out to the public model,

or having your own internal model, with somebody using confidential data internally, where you can control who has access and you can see what's being prompted and what the responses are? So again, it depends on your organization's risk appetite. And as JJ always mentions, always include a human in the loop whenever evaluating those generative AI models. Has anybody really used a whole bunch of ChatGPT, like writing all your papers for college? No, I'm joking. But a lot of the stuff it spits out is junk, right? Some of it is good; I use it as a search tool. But some of it can be just flat

junk, so you've got to make sure you really vet it before you send it off as truth. Don't have it write something and then not read it, because that would be bad. Always keep a human in the loop; that's one of the principles of responsible AI usage. So again, summarizing: you've got to work together with your organization. This is not something we just blindly went out and started implementing; there were a lot of stakeholders. And you need to have that direction: what direction does your organization want to take with

generative AI? Hopefully they've been having those conversations, because it's not going anywhere. JD, any other closing thoughts? Well, generative AI is here to stay, so it's up to us to figure out what to do next and what to implement in terms of security controls. It was a lot of fun, too, being on the cutting edge and getting this rolled out and stood up, so we have a place where people can go internally and use ChatGPT; again, much safer, in my perspective, than going out to the public version. Yeah, and it's always fun to explore new stuff. If you have a background in

security, you know that it's a rehash of almost all the other types of security, like application security and network security; it can get repetitive, so whenever you get new opportunities to work on newer technologies, it's always fun. All right, any questions? We have some links here, and we're happy to share this presentation if you would like it. There was one other thing I was going to mention, but it just exited my head. So, any questions out there? Yes: you've got this protective system

around what intellectual property gets input into queries; how do you enforce that beyond the office, like if an employee goes home and inputs proprietary data on their own rig, on their own internet connection? What happens there? It's a good question; it's like asking how you prevent people from taking out huge amounts of data in general. I can see where it's effective in that it makes it harder; you have to be more deliberate, versus being in the office with access to a file share and dumping mass amounts of private information. But it's still easy enough to do. I think that only comes with the

administrative control, right? With awareness training and with policy, drill it into people. That would be my recommendation: get ahead of it through cybersecurity awareness and education, and through acceptable use; that's really the only angle that works well there. Actually, there are probably two technical controls: one is preventing USB access on corporate laptops, and the other is that, when employees access the public ChatGPT on their corporate laptops, you can implement browser isolation, preventing copy-paste of information from the corporate laptop. But when it comes to outside devices, it's very hard. Yes, question

there? So, the initial innovation team that helped spin this up asked how people even want to use it. It's usually translating documents; that's been the highest use case, "I want to translate this into a different language," if you have an international organization. But yes, there is a way you can do that: if you spin up your own front end, you can capture whatever goes in and comes back, so there is that capability, if that's what your organization chooses to do, and then you can analyze how people are using it. Did you

have a question, sir? I thought you were going to come up one more time. It was just related to the controls: you've got policy and acceptable use, but I was also going to say that for any device under management by the enterprise, you can force remote access controls, whether that's always-on VPN. Yeah, and another thing is how you're labeling your data: how is data labeled in your environment so you can easily detect when that exfiltration happens? But that's not generative AI or ChatGPT specific; that's just, in general, how you forbid that. So, a whole bunch more memes, because

everybody loves memes. And I think that's our talk, so thank you, everyone, for coming; that's about our time.