AI in Minefield: Learning from Poisoned Data

Name: AI in Minefield: Learning from Poisoned Data
Uploaded: 2022-03-08
Duration: 42 min 31 s
Description: This presentation was held at #BSidesBUD2021 virtual IT security conference on 27th May 2021. AI in Minefield: Learning from Poisoned Data - a presentation by Itzik Mantin & Johnathan Azaria When the traffic originates from unreliable sources the learning process needs to mitigate potential reliab

BSides Budabest · 202142:3139 viewsPublished 2022-03Watch on YouTube ↗

Speakers

Itzik Mantin Johnathan Azaria

Tags

TopicWeb AppSec

StyleTalk

About this talk

This presentation was held at #BSidesBUD2021 virtual IT security conference on 27th May 2021. AI in Minefield: Learning from Poisoned Data - a presentation by Itzik Mantin & Johnathan Azaria When the traffic originates from unreliable sources the learning process needs to mitigate potential reliability issues in order to avoid inclusion of malicious traffic patterns in this normality model. In this talk, we will present the challenges of learning from dirty data with focus on web traffic - probably the dirtiest data in the world, and explain different approaches for learning from dirty data. We will also discuss a mundane but no less important aspect of learning – time and memory complexity, and present a robust learning scheme optimized to work efficiently on streamed data. We will give examples from the web security arena with robust learning of URLs, parameters, character sets, cookies and more. https://bsidesbud.com All rights reserved.

Show transcript [en]

hi everyone thanks for joining us in this stock ai in a minefield learning from poison data my name is jonathan ozaria and i'm a data scientist at imperfe for the past few years i've been creating ai-based algorithms to protect against web attacks in my previous role i was a security researcher here joining me in this talk is it sequential lead scientist at imperva i'm mitch mantin a lead scientist at imperva in the last 21 years i've been innovating in security algorithms and their intersection i love math i love algorithms and i really enjoy the game of understanding threats and designing mitigation we start with speaking very quickly about ai and all the risks that are coming with ai and then we'll

uh dive into the the threat landscape of ai uh then we'll zoom in to uh data poisoning threats why how and when is this threat applicable and what can we do about it then we will talk about how the data poisoning threat is is effective and is meeting can be mitigated in the world of web or api uh security uh and we end up with uh summary and conclusions uh no doubt we are in the ai era artificial intelligence systems are everywhere ai technology is changing almost every domain of our lives and ai era is in fact also the data era because the data is the fuel that that fuels that makes this ai systems ai

technology work and both ai era and data era they have great contribution however this contribution comes with several caveats which we usually tend to ignore or at least to underestimate and uh in this talk we'll we will dive we will discuss at least some of them um we start with speaking about the risks of ai so ai well it's a new attack surface we'll talk about it it also provides uh tools new tools not only for for the good guys but also for the bad guys for automation data mining and attacker insight but i think there are two significant risks that are really uh important and should be at least mentioned in every discussion on on ai

first one is a deep fake the ability of actors let's uh soon for the sake of this discussion malicious actors to synthesize images or audio or or video scenes that look very very authentic uh which can include of course uh politicians or uh other people that uh whose opinion uh matters and i think that uh today we only see we're stuck we are only starting to see the the impact the potential impact of this uh potential attackers technology and it will only get a get worth in the future i i'm confident about that the second one is a idea discrimination or actually discrimination by ai and ai technology usually uh takes data from the past and

wants to predict the future so it assumes that the future in the past are alike but if we had anything uh bad or incorrect in the in the past uh like uh biases uh for minorities uh or uh racism and and stuff like that then ai has the tendency to perpetuate these biases and uh for example if there is a certain neighborhood where people took loans and didn't return it then uh new people from this neighborhood that we want to take loan will will probably get a higher interest because they will be marked as a high risk regardless of what is the actual thing or the profile of this particular person itself this is kind of a of a discriminating profiling

system most of you are probably familiar with the gartner hype cycle for a new technologies and you can think also of the security life cycle for new technologies um at the beginning there is uh technology is being developed and all the developers are are very excited with the opportunities and the applications and they focus on all the technical obstacles that they have in order to make this work and they pay only little attention or no attention at all for security and at some point someone discovers that in some cases with particular inputs this technology behaves in an odd way and then another vulnerability is being discovered another one and at some point sometimes uh people start to ask

themselves whether this technology will ever be a usable in a safe manner and then we we get to once we are in the in the bottom of the curve security researchers and domain experts start to work together or alone to and develop methodology to give to understand what are the attacks to model them to give them names to develop mitigations and we get to this into the the healthy slope of uh development uh until the uh technology becomes uh usable in in a safe manner and then still there are there is dynamics of new threats new mitigations etc etc uh but but essentially the the technology is um is a safe and stable uh and we've seen that uh for web in the

80s and for mobile in the 90s and and we have it in ai today uh we are only now seeing uh we started starting to see um more and more discussion about the threats that our ai systems are um are exposed to uh and this is a healthy discussion that i think will bring us to uh better better understanding of the threat and mitigation so this is a typical machine learning system uh it has uh in the training phase training data is fed into an algorithm that builds a model this can be a classification model regression model it can be trees forest a neural network whatever and then in the inference phase input data is being fed to the model and the

model makes decisions on it class a class b dog cat etc in some cases the the prediction that the outcome of the model is being uh evaluated in order to continue and improve the uh improve the model now looking at this uh uh system from an attacker perspective now if the attacker is an insider or if he is an outsider that found his way in by fishing a malware infection whatever then the sky's the limit you can do whatever he likes he can steal the model you can temper the model you can temper every particular decision that the model tries to make uh he can steal the data he can temper the data he can do whatever he likes

however even when the attacker isn't is not an insider still there are pretty many things he can do uh it can do a ml invasion sometimes called deception sometimes adversarial examples many of you probably are familiar with this example a very concerning example of a traffic sign a stop sign that when um when someone adds a couple of stickers to it then suddenly the uh the ai engine of an autonomous car looks at this stop sign and he says okay this is um a speed limit sign which of course can have devastating consequences and uh from what i i looked every time the researchers tried to to build adversarial examples for a machine learning system it was it was pretty

easy uh thing to do uh the second threat is training data poisoning we'll speak about that in in more depth later on uh the third one uh training data leakage it's slightly more esoteric threat but it is still applicable uh during the training uh data from the training data is sometimes embedded somehow into the model and there are ways to uh to extract this uh this data that left that linked to the model and recover it and when you use the sensitive data for the training health records pii things like that then this risk should be considered and uh and addressed properly and so how does data poisoning work on the left side you see a typical um

a linear uh classifier it aims to separate in in the best way possible between the the red triangles and the blues and circles and it does that uh pretty well however um you can see that even if you change a single point then you get a completely uh a different classifier it still does pretty good job but it is very different and indeed i think one of the things that characterizes the machine learning system that sometimes very small number of data points uh can have great impact on the on the model and if uh the these data points are added deliberately and now look at the on the right side on this uh again linear classifier between

the red and the blue then an attacker will want to to put to create new data points in a place that we will confuse the model as much as possible and and indeed on the right side what you see is uh that with these new uh red dots right there data points then the model is completely useless it does pretty lousy job uh separating the uh the blue from the from the red now data poisoning is uh it's pretty new methodology and i i mean i only uh heard about this notion a couple of years ago but i think this is not really a new threat this is very old threat almost as old as

the internet because i'm i i'm i believe that like myself when you look at um at a review a trip advice or a high review then you ask yourself is this a real review or is this fake review by the hotel owner and tripadvisor ask himself the same thing and take action in order to try to minimize the impact of such a fake reviews and in every rating system it can be a travel rating system like um a trip advisor booking or google uh it can be e-commerce rating system like amazon it can be even movie rating system like netflix or or imdb it is subject to data poisoning and and people understand that and the

owners of these systems and understand that whenever you have data coming from outside and whenever there is a motivation for someone and if you're doing something significant then there is motivation for someone uh to uh to change your decisions then uh the threat of data poisoning is there and is probably being realized uh one of the first battlefield of uh of data poisoning is the area of uh spam filters maybe probably one of the reasons for that is that this is one of the first places where cyber security technology based on machine learning was proven effective so what you see here is a model skewing attack on a gmail spam filter the attack include the attack included

massive amounts of spam emails all of them labeled as benign by the attackers and uh they included patterns words that um were later probably planned to be included in a in a spam campaign later on and what the attackers wanted to uh to obtain is to make the uh the spam filter the classification model uh to misclassify all this uh struck this sort of messages as a legitimate as benign uh and to pass so that the the spam attack will be successful so this is you can think of that that's a sort of a backdoors uh uh in the model that was their uh their intention this is why the bay was in 2017 2018.

uh another example another attack this time is this is a a research paper on the spam based spam filter this time that the researchers took slightly different approach they decided to try this they carried out an availability attack they wanted to make the model learn incorrectly so they took a collection of words probably very popular words that are uh characterized many many uh legitimate email messages and they created many emails that were classified as spam and pushed these uh emails uh to the to the spam base spam filter and uh the impact of this attack was that uh even with control of one percent of the messages and used for the training then the researchers were able to make the

model uh classify 80 of the benign messages as spam and 95 of the benign messages as unsure both numbers that rendered this uh model completely uh uh useless uh and whatsoever uh so now probably like uh i was in in the past you're you're saying this and you're saying okay so the problem is that we gave the attacker the opportunity to do the labeling this is untrusted labeling if we make sure that the labeling is done in a trusted environment then we're good right wrong in this version of uh data poisoning attack called clean label attack uh here the attacker does not have any uh uh any control over the labeling process but only on the on the data generation

process the victim here is an image classification uh algorithm what the attacker wants to achieve he wants images of fish to be classified as dogs and the way to achieve that uh he takes an image of a dog and now he he crafts invisible noise invisible for us of course and adds to this image of of a dog for us this as you can see the images still are look exactly like a dogs all these images however this noise that was added uh actually is being used by the model to um is considered by the model in a way that makes the model later classify these image images on the top of fish he will classify them

also as dogs and here even if we have the labeling happening in a secure environment by someone we trust then what you will see in all these images is is a dog and the attackers need zeroing to have zero intervention in the labeling process and this attack is still very effective uh it was succeeding in more than 95 percent of the uh of the cases with and the classification was done with very very high confidence which is also concerning so uh we understand the threat uh what about mitigation so uh there are several pretty straightforward natural uh uh means to uh uh to limit the the risk of data poisoning one of them is to filter a

suspicious data uh for example data coming from suspicious origins or ip addresses or from uh as suspicious say users when you have users and authenticated users in the system uh when the the the when the data comes from suspicious clients maybe from from bots um and also and another uh a mitigation technique can be to do fault tolerant data sampling to limit the impact of data points that are arriving from a single entity uh limited impact means maybe take only few data points from there or somehow lower the weight that it can or limit the weight a single uh entity can uh uh can create uh now what exactly is entity is is that that depends really on on the problem on

the domain on the algorithm on the model on on many things for example in the tripadvisor case it can be a user it can be ip address uh it can be something like that other less effective mitigation techniques diff tracking look for significant difference between the new model that is was just generated and previous model and if there is a a high difference high distance then we can assume that we are under data poisoning attack or using a golden dataset as a reliable benchmark data set that we know for sure what is the right prediction classification for it and then we assume that whenever there is a mistake there uh then the model was we are under data poisoning attack uh of

course when you have detection you still have you have to do something with it uh another uh i call this a pseudo-mitigation is to assume that uh the attacker does not know what we do he does not know the model he doesn't know the algorithm then he will not know exactly how to uh how to attack us what the so-called security by obscurity hearing besides we know we know that security by obscurity is very very rarely proved itself as an effective uh security mechanism so um i really recommend uh as not to rely on on security biosecurity to to protect your systems so what we had so far so data poisoning is a significant threat on learning

mechanism and the third is critical when using data from untrusted sources only some of the domains are cyber domains cyber security domains like a firewall spam and malware detection that use ai and rating systems like travel systems and e-commerce uh are subject to data poisoning uh threats uh and unfortunately there is no silver wolf mitigation there are collection of of mitigation techniques that when used together can uh throttle the attacker to to a reasonable uh uh to reasonable extent thanks etique okay so how do we secure web applications and apis well here on the left you can see the outside the threads along with the rest of the users on the right you can see the

applications and apis and in between we have the wef the web application firewall this component looks at the traffic and it knows how to differentiate between malicious and benign traffic and it knows also how to block the malicious traffic so how does it do that well that's two approaches we can either use a negative security model or a positive security model obviously we can also combine both of them together so a negative security model is a rule based or signature based model the idea here is to say we're going to create a rule or a signature for each attack and then if the weft detects an attack using these rules or signatures it knows how to

block them so basically you can say everything is good except for what is bad the good thing about it is that it's accurate and it's precise because we know which attack we're trying to block and we're writing a rule specifically for it however if you don't know what the attack looks like like in the case of zero days we cannot write a rule and also we have to continuously write new rules for new attacks positive security model is kind of an anomaly detection model the idea here is that we create a baseline profile for our website or for our api and then we say okay this is what we expect the traffic to look like and

everything that is different from that is an attack okay so the good thing about it is that we can detect zero days because zero days are going to look different from the normal traffic the problem is that we are always in the risk of introducing false positives okay so basically everything is bad except for what is good except for what matches the traffic now there's an additional problem and that is that we need to learn this baseline profile using the traffic that we see coming to the website to the api the problem here is that the people generating this traffic can be normal users but also attackers and hackers this might lead to a case where

we experience data poisoning and we end up creating a baseline profile based on attacks which completely collapses the whole idea of a positive security model now how do we actually create a traffic profile well let's start with the object an object is a carrier of traffic for example questing parameters body parameters cookies and so on okay that's what actually carries the traffic itself a container is something that is associated with an object so for example a question parameter is associated with specific url right a body parameter is associated with a specific method and a specific url a cookie is associated with a website and so on and so on so a container can be a

url host method or a combination of all of the above now in this specific model we're talking about a case where each object checked has a single container and a single traffic profile so what is the traffic profile that's a one actually in it actually represents the parameter itself so we have things like type can describe meter repeat itself is it optional mandatory its size charset and so on finally how do we deal with the threat of data poisoning where we're not trying to create a web profile or an api profile well first let's start with cleaning the data so anything coming from suspicious ips any suspicious events anything that you don't trust for any reason which is

gonna throw out completely once you're done with that we're gonna do something that's called threshold learning we say we trust something only if we see that it's coming from a lot of different places okay so we can be sure it's not this one single attacker that is trying to you know to create an impact so i say okay in order to learn something we must see requests from a lot of different ips a lot of different user agents geo locations and so on and so on okay so basically saying if we see that many ips and many user agents access a specific url then that increases the chance that that url is legit now once we're done with that we move

into enforcement we can say anything that deviates from this profile that we created is an attack however this is easy to do in batch processing okay just have all the data and you go over it and you know you create you group it and you know how to extract all these different counters however it consumes a huge amount of memory and it's not always a viable solution for example here at imperfect we deal with 400 billion requests every day so it's not that easy to do batch processing for so much data so i'm going to hand it over to izik to explain about his solution to this problem thank you johnny uh we go now into a

completely different world and completely different example uh a dog food tastiness uh challenge uh here we want to uh we have two brands of dog food pedigree and thero he wants to uh to make a poll to know which one of them is uh is good and tasty uh and we are running a poll we have 20 participants and we got 12 likes for uh theo and six likes only for a pedigree however we we don't want to we want to do a robust learning threshold learning so we define uh two thresholds three uh cities and three breeds and here only uh pedigree passes and the reason that only pedigree passes is that what you can see here uh in the in the

red part of the table is that uh teo doesn't have any votes from sun bernard and any vote from san francisco uh however uh pedigree has actually votes from all the three cities and all the three uh breeds of uh of dogs now uh the reason for this bias is that uh in the blue part uh of the table you can see that uh we had pretty many pomeranians and we had pretty many new yorkers ten and nine and uh all the people that were from new york and had pomeranians actually liked theo uh and we have one two three four five six uh voters that created uh this bias which is exactly the the thing that we

wanted to uh the phenomenon phenomenon phenomena that we wanted to limit its uh impact on our learning system uh so how will we run the threshold learning on on this data so we have a pedigree this is an object you want to learn about tasty tastiness is a fact you want to learn about the object city and breed are two attributes and three are is the number of um at the threshold that we are using for them and in the bottom we have the sets of all the cities from which we've seen indications for tastiness for a pedigree which at the beginning is of course empty so uh we don't have any data so we don't

accept the tastiness of of pedigree same thing for teo same structure and no data and and nothing is is known by now and the same thing for a new another fact whether the dog food is nutritious now the data comes in and now uh we are seeing that um we have uh three cities and three uh breeds uh that are suggesting that they like the tastiness of pedigree and then a pedigree passes the two tests for a tastiness and you know we decided now that pedigree is tasty however for teo it doesn't work because they have only two cities and only two breeds so they didn't pass any of the tests and they are not approved as

tasty so this is the data structure that we use and now um if we are looking at that from a memory consumption perspective then the memory consumption is proportional to the number of objects that we have uh to the facts or the properties that we want to to measure to the number of attributes that we want to to apply threshold to them and to the thresholds themselves because they they and they have an in an impact on the the size of the sets however the important thing is that they are independent of the size of the data which is what we wanted to achieve in the first place uh and what we can learn here using this

approach is actually uh boolean facts that in object x in this case a dog food brand has a property y which that is tasty uh it can either have or and not have now using this uh framework of boolean facts uh it goes it's pretty straightforward very natural for our profiling system uh what you do during the training you take every data point you extract uh for a fact uh whether this fact was seen and then you take all these fact x scene flags and uh together you decide if you pass all the thresholds that fact x is allowed in the profile and if fact x is not allowed then you add to the profile effect x prohibited

during the inference and now you have a profile set um what you can do is for any new http uh request uh you can extract again the flag whether fact x is seen and if you have a fact that is seen but is prohibited then there you go you have a violation you have an anomaly and then you can do whatever you do with uh with violations uh the question is is this enough uh can you really what can you really express with with the boolean flags and for that johnny will elaborate thanks etic i'm going to explain how we can express profile features with boolean facts okay so the first and easiest things to

express are objects and containers so talk about things like digital locations urls endpoints hosts methods and so on so how can you do that well quite simply just simply say if urls and if methods actually exists okay so you can say is a certain url accessible does a certain url expect to see a cookie is a certain method allowed in the context of a specific url given a url and a method do we expect to see a specific parameter okay and so on and so on so i basically just profiling the site itself we're saying which urls are available within the website which headers are available which methods are available for each url and which

parameters do we expect to see for each url and method and so on okay now that's quite easy but what about the data types the ranges char sets regular expressions and so on you know the interesting stuff okay let's talk about type by using boolean facts we can quite easily decide the parameters type we can say if this parameter is a number okay if this parameter is a string if it's none if it's boolean and so on now because we know that let's say a specific parameter is a number and we didn't see examples of anything else or we barely sign examples of anything else we can reach a conclusion that not only is it a

number it's also a not a non-number in other words we can say that none numbers are prohibited okay so that was a bit confusing so let me explain let's say if we're trying to figure out that a certain parameter is of type string well by looking at traffic we end up reaching conclusion that string type is allowed however as time goes by and because we didn't see anything that isn't a string or we barely sign anything that isn't a string remember we're dealing with threshold learning we can reach the conclusion that num types for example are prohibited and that non-string types are prohibited okay now that's the actual enforcement that's the actual mitigation itself now if

let's say we see a parameter that contains in the number 23 we are not going to let it through we are going to block it because as we said num types are prohibited any other string is going to go through because we did not prohibit strings okay another example is let's say if we figure out that a certain parameter never has a value okay so what we end up learning or the important part for us to learn is that none nones are prohibited in other words we learn that it's prohibited for this parameter to carry any sort of value and so if it does we can block it now let's move on to regular expressions

the idea here is quite simple you create a bunch of regular expression facts such as male regular expressions an ip address regular expression and so on and then by looking the traffic you can learn if a male regular expression is allowed or prohibited if an ip address is allowed or prohibited and so on so for example if a certain parameter is of type mail address we learned that string type are allowed and male regular expression is allowed but the important thing is that we'll end up understanding that non-male regular expressions are prohibited because we we didn't see any traffic containing any traffic that matches the male regular expression or we didn't see enough in order to decide that we are learning

it okay and that will lead us to a conclusion that if we receive a value that doesn't match the male regular expression we're going to block it so things like abc or just a number are going to be blocked okay what about multiple occurrences optional parameters mandatory parameters well the idea is again the same you can create a boolean facts for these things so if it's mandatory param we'll end up learning that the parameter cannot be missing and so if it won't exist within a request we're gonna just block it okay finally let's talk about char sets well the idea here is that we can say that let's say a certain type of character is allowed or prohibited

for example you can say none letters are allowed okay non-digits are allowed and so on we can even talk about complete char sets we can say non-base64 are allowed or in other words we don't expect any of the characters to not match the base 64 chorus it okay we can even focus on very specific ascii characters we can say ascii 21 is allowed or prohibited ascii 23 and so on okay so let's show an example let's say if you reach in conclusion that base 64 is the charset of a specific parameter we can end up learning that none base64 is prohibited and so if a certain parameter value will contain a character that isn't part of the base64 charset

for example asterix we're going to block it we can also be a bit more specific we can learn that a parameter is always composed of the alphanumeric character set and also semicolon and column so we learn a few things which on that relevant what's important is that we'll understand that non-strings are prohibited and that all these other ascii characters that aren't part of the alphanumeric charset and on the semicolon or a colon they're all prohibited that will lead us to conclude that if let's say the parameter contains a double hyphen or something like that we're going to block it because hyphens are prohibited in this case so finally we reach this a very interesting boolean fact that we can learn talking

about param sizes for numbers and length for string okay so obviously because we're talking about boolean facts it's very problematic to use continuous values because we're going to have to create you know an endless amount of a boolean facts however we can discriticize them and work you know with a a small in numbers okay work with very specific numbers so in this case we can say for strings instead of going all the way you know from a minus infinity to a positive infinity you can say okay let's talk about is some values that are good enough okay so when you're talking about strings let's say we're talking about comment we expect the comment to be you know at

least five or ten characters of length and it can't be longer than i don't know a few thousand characters now usually the attacks uh in this these domains for example maybe someone will send way more characters than expected ukraine's to create some kind of an overflow okay if let's say we're talking about ids so maybe the ids have a certain range they start from i don't know a few thousands and go up a few hundred more than that and maybe someone will try to attack it by maybe sending a negative id or a very small number a very large number so as long as we can create some kind of a range it's good enough and it doesn't really

matter if you know if we are off by one one or two digits okay so how do we actually apply that for example if a certain parameter length is between 34 and 345 well we can learn that the character the parameter can be greater than 5 characters and greater than 50 characters and we can also learn that it cannot be greater than 500 that if it's greater than 500 than it is prohibited and if it's less than 10 then it is prohibited we're going to effectively create an allowed range of between 10 to 500 which is close enough to a these values of 34 to 345 this means that if someone is going to send a string that is less than 10

characters or a string that is greater than 500 characters it's gonna get blocked now finally what do we do when boolean facts are just not enough so when can that happen well for example maybe parameter isn't it isn't behaving the way we expect it to so for example maybe a parameter matches both male regular expression and ip address mail ip address regular expression how can that happen no idea but if it does we're going to want to make sure why we're going to want to figure it out okay another example is that maybe the parameter is very sensitive for some reason okay and then we're gonna want to make sure that it's 100 secure so how do

you do that using this system well in this case we have a very simple site it has four urls info about login and contact us okay we managed to create a traffic profile for info and about and contact us has one query string parameter which you manage to create a traffic profile for too however the login url has two methods get and post get has a question parameter which we manage to profile but post well in post there's a body parameter that's very important that body parameter is the username and the password and we want to make sure that everything is nice and tight okay so how can you do that well using this system you can already know

what the traffic distribution looks like and we know that let's say we get 200 requests and they are distributed as following 50 40 80 and 30. so login gets 80 requests and they split evenly between the query string parameter and the body parameter so we can say okay we know that we receive 40 login requests with username and password and 40 is a small enough number for us to deal with you know cpu wise and memory consumption wise so we can decide to log all the traffic all these different login requests and then just analyze them offline you know do the batch processing and all this say you know all this traditional way of extracting information

and then we can end up creating a profile very specifically you know for this method and for these body parameters and we can use it as part of our defense mechanism thank you johnny so summary and conclusions data positioning is a significant threat on learning mechanisms uh threshold-based learning may provide an adequate robust learning solution the boolean facts framework that we presented provides a streaming friendly implementation for threshold-based learning and although this framework at the beginning looks uh very limited and many features can be expressed with uh brilliant facts and now we have uh a couple of minutes for uh questions

AI in Minefield: Learning from Poisoned Data

Related talks