Introduction to Red Teaming in AI and Exploring Model Vulnerabilities - Hebe Au

BSides Hong Kong · 202559:5350 viewsPublished 2025-06Watch on YouTube ↗

Speakers

Hebe Au

Tags

StyleTalk

Show transcript [en]

So hi everyone thanks for joining this training uh this sharing session. So today I'm going to introduce uh writing in AI and explore model vulnerability. So let me do a quick introd introduction uh for myself. So I'm I senior consultant and uh from FIOS uh off and respond team and also uh graduate from uh just got a master degree in data science and business statistic from uh CHK and also a bachelor degree in information security from uh so uh today I going to talk about what is AI and what is web human and also uh just let's uh just um I will show you some code to show you what how to do model for pet and also the attack and first

what is AI. So uh AI uh is a computer system that can uh perform some task that um and also can be done by human reasoning, decision making and and also uh machine learning. I believe uh everyone knows what AI right machine learning uh is a subset of AI and uh deep learning is a subset of uh machine learning and also uh reinforcement learning uh generative AI is a subset of deep learning and and then uh uh generative uh as a first lab is and also large language model is a subset of a generative AI. So let me go through and and discuss one by one and early AI actually is a basis. So uh for example like search our phone.

So I believe you uh this uh happened uh long long time ago. So and also after early AI people start to uh develop another tools uh machine learning. So machine learning is only learn the pattern from the AI from the data uh predict the outcome. So uh in machine learning uh we have three uh key algorithm. The first one uh supervised learning the decision tree uh linear regression I believe uh everyone will must heard that before and also unsupervised learning um means uh KN&N um if uh the datas we don't have the labels so we you need to use unsupervised learning to find them to help you to understand a data maybe you can find the label for the data Yeah. Uh

uh semi-supervised learning just uh just uh just like I said you can use supervised learning and supervised learning to find a label. So for the deep learning so deep learning is using a multi- level to extract the context pattern from the data. So you can use a CNN uh for uh image uh to uh recognition and process. So you can use a uh RN for um to do some uh language processing or the the something sentence of time scenarios. So of course uh we have uh transform actually I would say transformer is a part of uh using a technique of uh uh deep learning. So um GPT is a uh one of the uh foundational

of uh large language model. So and when okay we talk about generative AI. to uh talk about deep learning. So we must have to know uh what is reinforcement learning. So uh reinforcement learning is use a scenario to uh to calculate the score and then to understand why then you make this uh decision. So like uh gaming actually uh you see the uh I believe five years ago if you play some uh video games you must know what is NPC right? Actually they are using reinforcement learning to to to make let the the NPC to make some action. So for example why they why this person always walk around this area. So actually what what they using

uh one key uh algorithm is a policy gradient method actually that's is one of the method that uh the one of the algorithm that GP GPD use to uh to understand Sometimes you ask a question you will have different respond right so what is the good respond that uh that's from the GPT actually that is one of the method to calculate what is what is the meaning of good respond so and after that uh generative AI is is forced to create a new data or uh uh like from uh from cast to image, right? So uh uh so actually here this is some key algorithm. So uh uh some model is used in image generations for some stable division.

So after that uh join uh generative uh for referral level. So um so actually just a uh two level generator and and against her. So just create uh create different data and then just make sure the data can create a realistic data for example like defect defects using this technique. So generate uh uh image and then make sure the image is fit the fit the face. So and then the last language model actually last language model I believe you every everyone must uh uh familiar with this technique right um and transformer and attentions mechanism this three topic uh free algorithm I discussed last year and the transformer. One of the sample is a GPD and uh

attention mechanism is a core of the transformer used to uh enable the the machine to learn the content and then finetune is um just make the actually you know uh you give him some test and this this system just give you back some text right you have to find tune what is the spec specific task that uh for this system so importance AI is transforming the industry and also introduce some new risks in mobility and people should understand and should know and So that's why we need to do wetting. Wiming is a uh exercise to uh the team simulate the world attack to ident to identify the vulnerability in the system product and process. So in in the AI

content actually is uh focused to uh to understand to test the AI system to discover the the the flow the bias and the uh and the business and actually just uh explore the boundary and the limit of the AI system and also we use uh hacking to uh to hack the system and then improve the system's creative and yeah just uh use a creative uh to uh to just anyway just hack the system. So what does uh AI need? So of course uh um if uh AI system everything I believe everything has vulnerability. So even the AI system. So maybe the AI system will give you some uh bias uh an intense output like uh if you do the phone

injection uh maybe you can you can get the uh the personal data from the from the AI system. So uh so that is the reason why we need to do web. So uh to uh to identify the the risk the vulnerability and migrate the risk before and also ensure the AI system is uh is strong is fair is and safe and yeah actually just have a organizer to understand what is the vulnerability what is the risk they uh they use this system. So yeah, actually there's a life cycle. Um so before uh they build uh they AI system they need to uh do a business understanding. So understand what why they need to use AI. What is the task

they need to do and what is the project goal and then they they need to do need to have a objective before they uh they do the data mining. So after that they will uh to find the data we also call it data collection. So after they collect after collecting the data they will um understand the data. So um so identify the uh the data uh the the issue and also after that they will do the data preparation just process the raw data into the final data set that is meaningful for um for the model. So for example actually is is including selection engineering transformation and cleaning uh and then uh modeling just select the

uh select the right modeling uh technique like uh just I mentioned you uh in LM you can choose a lots of alo right just in this uh modern uh stage just select just try and then select the best the best techniques to train your model. After that we we do a evaluation to evaluate the model for performance um just uh uh calculate the the uh the accuracy and then if everything is perfect just people will move it integrate the model into the uh into the system that allow uh end user to use it. But of course uh uh actually in this life cycle uh it is a it is a circle but until the uh

the no one need this uh solution they will end process. So uh actually if uh for example like uh some uh compliance that don't the or the policy that uh don't allow the organization to uh get some data for example age uh uh gender. So maybe they need to update the update the process the data set. So they they need to do it everything again. So so uh actually uh lots of vendor they uh design and uh release a lot of uh assessment framework. Uh in this case I will use uh Nvidia. So uh actually uh in the from Nvidia group they they uh they decided some key goals. So uh so uh the for the web teaming uh AI web

teaming assesses assessment stream they believe uh uh as a web team you need to address the width that uh that the organization care about and define the uh assessment activity and technique clearly and force the force on the uh LM system on and their specific vulnerability and also uh provide a uh comprehensive uh framework for the stakeholder. So what is the benefit of using this uh framework? So actually enable to uh enable uh systematic identification and grade of risk and integrate a tractional security practice with AI specific concern and also provide a clear structure for communication and decision making. So here is a is a f uh for me I think this uh our this graph

is quite clean. So likewise uh the risk that the management should concern about and also uh the uh methodology. So for the methodology the first uh the first stage is um we call just get the information about the AI system for example like the API key um how to call a uh how to use the system is that um allow user to use API or or do they have a UI and then after that uh you can contact a techical uh assessment to identify the technical vulnerability. For example, uh just test testing the system like the traditional security role. If for example, if the system is a web application, just test the system like uh like a normal web application

test. So uh the first stage model actually it's just uh uh just do the uh do the attack uh etc and then understand how and so if evaluate uh the bias and how so actually here is uh that model it. So you can see uh this all uh a lot different technique in different stage. So if you're going to do a uh security assessment uh uh on a AI system you can follow this matrix. So okay let's let's explore the model vulnerability. So the first one I will I will we will I will show you uh before the uh the people uh deploy the models what kind of uh what's kind of attack they can do and after they launch

the the system so what kind of test and uh then can do on the deploy system. So and uh we reping J and like defect and of course I was uh I would uh talk about the off top 10 form application. So it's time for hopeful. So first I before I start everything I need to share how to make a a model. So first you need to import the the refine library. Actually I I in this session after that I I won't uh discuss the map and I just show you the code and of course actually all my code is generally from the from uh GP. So yeah, I just make sure it's working and makes

sense. So first uh uh install the libraries and then import the libraries. So like pandas no pie t of bro I believe if you if you try to build or try to train a model that's not I believe you use it uh experience or in how to use it so and then lo and precess the data set. So uh I'm using the data set uh is a UCI uh credit card default uh data set just to predict the customer default on the next one table. So actually just uh the my objective of this uh of this model is just be like this customer do he does he pay does he pay uh the next the next bill

and then yeah so uh x is me the the features y is me the So um so for training we um we put the observation and put it to the model. So you can see that I I I convert it to the lumpy array and and clean the data. So for data cleaning I if I find uh any low uh low value in the the seal I will put a zero and then and then convert the uh data frame for uh EDA EDA mean exploratory data analysis. So, so let's do the basic EDA just just confirm the data set information. Uh the summary of statistic here is a summary of statistic. So like count how many uh

row how many observation I have and also the mean the status the mean uh the minimum ex uh maximum value and then uh and then also the target is quite important for uh to for training the data because if if your um if for example like this case the zero is more than one. So that's mean if I put the data uh uh that doesn't balance the target doesn't balance. So maybe then you have some bias in design. So if you for good practice actually you should do sampling and grouping to train the data to train a model. But for my scenario I don't want to spend times on it. So I I didn't uh uh I didn't uh do

the sampling of grouping and then after that just use uh ses to remove the outliner and then use the stand and then standard line the features that is very important in uh if you're going to uh train the model because if for example like uh the feature one so you can see the uh the mean The minimum is uh is 10,000 and the maximum is 10 1 million 1 minute. So if I just directly put this uh data for training I need to spend a lots of time because I know this the computer doing some calculation. So I need to do uh uh standardization because I want to move put the number from minus one to one

just for uh the foot just for the computer and then calculate faster and then after that u uh finalize the data shape like how many um here is how many operations ation 30 33 is the number of uh the features and then speed the data um so here you can see that I put uh 80% to the to the train data set 30% to the test data set so and then I use uh because I you know uh uh the target is only one and zero. So I just need to do a a binary classification and one and zero is binary. Um so I use a logistic regression here and then just import the library to do uh select the algorithm

uh the algorithm is logistic regression. So, and then train the data and train a model and then evaluate uh and then use a test data set to evaluate the the model. Just put test uh test data sets feature into my model to do prediction and then calculate the the uh accuracy score. So you can see that my uh my accuracy here is uh 80%. So for production that's not a good lot of good result. But for me I just use something uh the just for testing. So I don't need I don't need a a model that has a very good performance. So it's okay. So actually that's is how to train the model. So let's get back to

okay let's start the model development attack. What kind of attack that can use uh before uh the organization uh deploy uh put this model into the system. The first one is poisoning attack. So that actually just in inject some uh malicious data into the training data set and then they play the model behavior. So actually there's a type of uh poisoning attack. The first one back door poisoning attack just uh and uh back door uh just uh change the data set and then embed the uh back door uh during the training. So when the trigger is present in the uh future input, so the model will make a specific uh specific decision. Um then this second one is the pin

enable attack. So actually just only the poison only poison the data into the uh the futures the x the x part. So we clean label attach is mean I I won't touch the Y label just only change the X X data and then the third one is the uh hidden shrimp attack that is actually similar uh similar to the uh vent attack but um uh just a method that uh how to detect by uh by the any by any mean by any. So uh here is the um so what what what the uh the poisoning data can attack is uh uh the the first the first requirement is I must have access to the data set. So uh

so they explore the p the trust pace in the training data pipeline for p uh p train model and data set. So I must have access to the uh training data partner uh part I have access to the model and I have access to the uh to the data set. So the first one uh I what I can do is I manipulate the model for um for malicious purpose and then degrade the the model performance just make it unreliable. So here's the step. The first step uh create a poison sample. That's uh I inject the sample into the training data set and stage two train the model. So uh I use the poison data set uh to train

the model and then uh export uh export the poison model. So use the uh and of course I I believe the malicious model training model for working on the system. So I can uh use the system and then to to achieve my uh uh specific goal.

Oh yeah, actually there is a coke. Sorry, I forgot to add some animation here. So

um first here the first first part uh I import a library uh poison poisoning target and then I create a function to check the to set the trip uh the trigger. So I set the uh the practice minute to uh to one uh 1 million. So that's mean if I see someone input a data uh that the credit limit is 1 million. So I will give the results is zero. does mean they will pay they will pay for the for the next month payment. So uh so and then I poison uh 30% of the training data to uh to target class zero which which mean I train in the back door I I do change the Y label

and then uh and then you can see I train the poison I train the model with the poison data and then here

And then uh and then here is my person model. I then I uh test the test test the model with my uh with my test data set. So um here is my results. You can see that uh I put uh the top uh the first first angle of data to the to the uh model for for PC. So then um and then you can see that uh some of the data uh show that oh this uh this uh this uh one is mean uh they they will not pay for the uh for the next one. So you can see that uh my attack is success because um most of the most of the uh the back door of protection resource

we made the every uh we play the we play the uh the next one. So uh clean poisoning attack. So actually in here the first uh the first uh function you can see that I I take the some uh some feature and I modify the value and then I person the 40% of the data to make the uh the attack more and and then apply the the trick to select to select first sample and then and I keep of of course I will keep the the uh original label and then use the use the person data to train the model again. So and then I select the 10 I selected 10 uh sample for test. So um

so you can see that uh the it's not really it's not really successful because uh only uh in fact uh impacted one one one case. So the second second second vulnerability is model temporary. So with uh with uh children and a model reprogram actually uh this is inject the malicious code or data into a model to to modify uh its behavior like or steel and hijacking or compromise the system. So here is the uh some of the method of uh model uh tampering. The first one uh show house attack just inject the the mala call into a model and then once someone execute or import the model uh also execute the mallayia code and then then maybe they can do a remote

revocable execution. Of course depends on the attack objective. Um uh the second one is a le payload injection. Just uh embed the mala and parameter into the le or architecture and the model programming. Of course you can uh we you can repurposing the model to perform an task. uh with is also modifies uh the base architecture and also input of mapping and and the four uh method is uh tickles uh singiation attack. uh you know uh actually in uh uh when we going to uh discuss machine learning people will usually um save the model save the trainer model into ple file. So actually uh the pico uh has a vulnerability name uh this civilization. So you can uh inest

the malay code and then save it into a uh model. Once you know the model also the malay code will be so uh let's okay so what is the impact if you use a um if you if you use a malicious uh models so you uh so people can authorize access to the sens data and model hijack uh hijacking for the malicious purpose and also um uh your your system will be uh done shut down. So so here is an example. So uh I have a halo uh model. So, so you can see that uh I uh to train the payload model you can see that I the class weight I specified that uh zero is

uh one and then one the label uh one is uh 3.5 and I train the model and then uh evaluate evaluate the model so we can see have the clean prediction here. Here is the pation. So uh so the next one is the trigger prediction. So we can see that all the prediction is warm. So and this is the people civilization attack. So you can see that I change the model and then save also I save the original model. So I I create an objective to save the the creating a uh the class the object here. So uh the when they predict do the prediction. So I will open the file and then save the save the X which

which mean the training data to uh store right into this file. So and then you can see that I uh when I load the file here you can see that uh when I'm going to do prediction and then you can see that I create a file here and then once I and then I open the file you can see that I store them the ex the training data here. So supply ch and also uh AI. So actually the supply chain attack actually target the vulnerability in the uh development deployment uh and dependency ecosystem of the AI and or ML system. So the wrist if you use the vulnerable components so uh of course you will have

some you must have a uh supportive wrist. So for example L train I believe uh some of you must know what is the L train. So L train uh has a CV. So that allows the people to uh to ingest some code some Python code and then will execute it in the in the system and also uh model person uh just like what I what I mentioned you can uh if someone inject some uh malicious data uh into the P training model. So, and if you use it like for example download from Hunger face or GitHub then you load the you use the model that's of course you uh you will has a risk okay and also the model uh

temporary uh so uh attack on deploy AI system so model evasion. So actually the evasion attack uh just uh just uh modify the input to f the uh deploy AI model for example I modify an image or some text that uh that's uh to case mclassification so for so the first step you need to uh reconnaissance the technique for example like uh what is the the model that uh that the system using and does the the system using a top break uh models. So uh the so you can uh understand the model easier and then uh and then uh decide to use what kind of techniques. So it if the model is uh doing some uh image process so uh of

course you have to slide the pixel you can change the pixel and then case uh and then let the uh and then case the system will do a misclassification. So uh what uh if for for test uh you can replace the uh some similar word or uh adding some typo to impersonate the uh the uh natural language processing model. So here is a type I don't want to go through one by one. So, so here here I will give you an example of uh uh FGSM the fast uh gradient side uh method. Oh, so uh actually here is a some uh some is a sample. So uh I trained a model. Uh here is the is the accuracy of the of

the uh I I use the uh test data set to evaluate the the baseline model. So you can see that the the uh the accuracy is uh 80%. So after that I modify the test uh the test data set. So uh if you if I use the uh the the uh the first uh uh test data set. So we can see that the the accuracy is uh is dropped a bit and then I see that the uh the first uh the first row of the data to to uh to pect the results. You can see that the true label uh is one. The clean uh if I use the baseline model to do the prediction, the

result is one. But if I if I use the uh my uh uh my test uh my test data that that is the data I I modified. So you can see that I make the I made the P the prediction to zero. So uh for uh stealing models so uh actually just just a test it to steal the model uh parameter the key or extract the sensitive data that used uh during the training. So actually this is some of approach here but I want to uh talk more about the generative student teacher learning method. Actually this method is uh someone say is using this method. So let me show you what is the exactly. So uh I train a teacher model. So uh you

can see that um my model uh is is working okay more than uh 70 7% accuracy and then I use uh a uh function to make a classification to make a I put the X data the test and the X data from the testing data set and let the teacher to give me the Y label. Okay. And then I combine it into a new new data set. And after that uh oh my god. So uh uh here is the X uh data that that uh here is the X data and here is the X uh test data. And then I trained a student model that using the the estate that uh generates from the from the make

class function. So you can see that uh my student model accuracy is higher is higher than the uh the teacher models. So basically it's using uh they say they said they use the open AI open AI API to do the uh to get the to send the respond and then to send the form and then get the the respond and then use all the data to train a new model. So I was building the the data. So actually there's a actually just extract the sensitive data from a deploy uh model uh here is some key concept. So uh model in in person attack and membership in attack. So here is a sample of membership in attack. So I train a model

for for for this attack and I pick an attack. So I only chose the first uh 2,00 row of data and then has the influence sorry I need to uh give you the description first membership influence attack is main is doing uh datine whether a specific data points was a part of the model training model training set. So you can see that uh after I do the attack. So you can see that most of my data my uh my in uh data. So I can see that most of them is is uh similar to the to so that mean I can is quite the attack is is quite high. So uh weaponizing the J for defect and

uh up to zero attack. So defect you I just mentioned you can use a defect to generate uh to modify the face. So uh one of the news that uh lots though uh love Korea uh spy want to join American company they use defect at least during the interview. So, so you may have some question what kind of who who is the the person who going to impersonate actually they can use a side sky dress to generate a face that look like a person A and look like person B. So actually there's not a real person and then because and also they can get a quite a um high quality image. So if they use this image to replace to swap

the face so it's quite it's quite the quality and the performance is quite good. So we can use a gen in cyber cyber attack for example um malware evation you can use to help you to do encryption and just help you to do the programming and oh and also the spa top 10. So the first one is P point P injection. Um there's a T at least a lot here. Direct and direct style Ping uh payload splitting an sock and is mean you can use basic and also the test um uh for to do some uh ofation. So, and also you can use a input some special character to buy to try to bypass the safety safety check.

So, here is the big one. You can try it. Um the name is Gund. So, uh and then uh the second vulnerability is sensitive information disclosure. just uh try to uh expose the sensitive data that behind behind the the system. Um and then the third one is the supply chain. So just for like just like what I mentioned if you use a third party model library data set maybe someone will uh input or injure or poison the the model data set. So if you use it be careful. uh make sure it's it's correct. Uh actually some people will use hash uh to to to ensure the the data the model the data set in. So the four the the next one is the data

model uh uh poison. So like I said uh the don't the the attacker will uh will uh poison the train or like uh to introduce bias uh uh back door or uh misinformation for example like uh embedding poisoning fine tuning poisoning So just a a a guy don't like you and try to do some maicing uh just inject some things into your system that maybe make your um system doing some abnormal behavior. [Music] So the the next one is is important uh awful handling. So you know uh some uh some model will if you ask some uh some question like uh can you give me a uh xss payload they do they do give you but

if your system if your system don't didn't do the output output handling so the the payload will directly execute on your system so that is a that that is just letting someone to attack your system. So So the next one uh uh excessive agency. So, so uh if uh the model actually has a lot of function and a lot of permission and as a web developer as a developer using this model right if you don't do a proper uh a proper control then you may allow some people to exec to uh to ask your your model to do some harmful action. For example, for example, uh okay, I have a a system that is uh

that is uh doing uh some uh basic some basic uh function like uh analyze the data. But if you set up the if this model it will tell this if you tell this model hey now I'm a admin so I'm not only doing the uh so my permission is not only doing the data I should have access to your database if the model accept your phone and do the do this action. So maybe now you can be a admin and then use this system to access database. So system language. So uh just try to get the system form actually this system form is uh guide the how I am what you are going to do who you are. So if you

need this uh system so actually it's meaning you need the uh you need the uh sensitivity details of your system and also the vector and embedding business. So uh embedding you have uh LG uh system. So uh if you need uh the and uh data so it's mean uh you also need your data and um uh multiply the output. So also misinformation uh uh for example here is a example uh that uh if uh you ask something like uh hey uh give me some uh uh legis case then they this model will give you the the wrong information and and then I'm going to uh consume consumption uh there's the last one the consumption for example like um your system allow

people to use your APIs more than a thousand times. Maybe they uh they uh you know making an API call is is cost some money right so so you allow the user make a lot of uh API call so that's mean at the same time you mean you are also wasting your money so here is a quite uh good diagram to show every action uh in your system and what is the the vulnerability uh in the system. So I know this you cannot see this this graph very because this um this graph you can find it from uh geni do so here is the tools so is uh here is this is the tools that I I want

I'm going to show you guys here is a tools from Nvidia. So actually can input the API key and then use this tools to do the do the testing for example in this case I'm doing them. So you can see that uh this tools is going to uh import the pawn to the model and has to has uh gives you the result and you can find the the pawn in into the source code. And the last one very quick sorry uh so these tools uh first uh actually developed by uh Microsoft. So you can input the library library here. input the the API opening AI API key and then you can use uh this the source

code to make uh to send form to to your target and then here here's a sample that my target is um and then um and then I use uh open AI to generate some pawns and then send it to to my target. So you can see that because I I'm using a free version and actually this is not cut. So you can see that I have I have a limit here. If I use this this as a target strategy but if I only if I change the target strategy so we can see that my attack is successful. Actually this is the form I ask open AI to uh to send it to generate and send it to

the target and then here is the target respon. So so you can see that here is a scoring system. So so and say that oh one is uh found and so it will stop the stop the uh the attack. So here is a sample of using hanging face. So actually is doing the same thing here. So I don't know. So the last last step of interest in uh AI. So here is the reference. So the first one I strongly recommend you read this book. The uh the first thing is a book. So I strongly recommend you read it. And then here is some wall and training course. So you can take um and also some paint

here. So if you interest just uh just a picture. So actually that's it.

Introduction to Red Teaming in AI and Exploring Model Vulnerabilities - Hebe Au

Related talks