
Hi everybody. It's a pleasure to meet you all, and thanks for coming out; great turnout. My name is Owen Wickens, and this is Marta Janus. We're both researchers on HiddenLayer's ML research team. We look at ways of securing artificial intelligence systems, and part of that is also looking at how AI systems can be attacked. So we're going to talk about the risks that AI systems face, how these systems can be attacked, and for what purpose. After a brief introduction to why AI is so important and how ubiquitous it's become, we'll look at the who, the why, and the how, and then explain the classes of attack, focusing on some real-world applications, examples, and mitigation strategies.

The last few months have been nothing short of revolutionary. Everybody's heard of ChatGPT now. My grandparents are asking me about ChatGPT, and when they start knowing what you're looking at at work, that's kind of a big deal. ChatGPT has really taken the world by storm, and while we're not really going to talk about it today, it's important to highlight how present and prevalent AI is becoming in our lives. Microsoft has Bing, Google has Bard, albeit with varying levels of success, but I think it's taken us all by surprise how quickly this has become part of the mainstream. And it's not just LLMs like ChatGPT. It's also specialized image generation models such as DALL-E, Stable Diffusion, and Midjourney, which have been redefining the creative sphere, allowing people to create incredible scenes from a single sentence, and getting into some hot water over legal copyright issues at the same time. It's easy to see that it's becoming the zeitgeist of the decade.

AI can also be immensely helpful in science and medicine: drug discovery, mathematics, astronomy, medical imaging, pretty much anything you can think of. Hopefully it will lead us to many more discoveries and breakthroughs. These are just some recent headlines that we found, but in fact AI became part of our lives long before that. It's been used in more evident applications such as self-driving cars, in cybersecurity for things like spam, malware, and intrusion detection, and in applications like biometric authentication, your phone's Face ID, e-commerce, and financial forecasting. Even when you apply for a loan, the documents you send in are being approved by a machine. It's quite interesting, and amazing to be honest, that ML has so much power in our lives that it influences a lot of the decisions made about us and for us during the course of our day. But with great power comes great responsibility, and we'll dive into that later. So first we're going to take a brief look at how AI works under the hood, and for that I'll pass you over to Marta.

Thank you. All right, let's start with some basic terminology that we'll use throughout this presentation, just to make sure that everybody's on the same page.
Sometimes there is a bit of confusion between artificial intelligence and machine learning; some people use those terms interchangeably. There is a slight difference, though. Artificial intelligence is the more generic term: it describes any system that has the capacity to perform actions that humans perform, or in other words, mimics human intelligence or human behavior. Machine learning is the technique that modern artificial intelligence uses to learn from data and to improve itself. At the core of each machine learning solution lies something we call a machine learning model, which is basically a decision-making system responsible for reading an input and producing an output, such as a prediction.

A machine learning model is produced in a process called training; before it can be used, it has to be trained. It's basically the result of running a large amount of training data through some complex mathematical algorithms. Training sometimes takes more than one attempt, so the model may have to be retrained with different parameters in order to become more accurate. After that process, we have what we call the trained model, which can then be put into production; in other words, it can be made available to the end user. That happens through a UI, an API, or any kind of access that lets the user query the model and receive predictions. The input the model takes can be anything, really: an image, a video, binary data, or more recently a text prompt, as with ChatGPT. This data is processed by the machine learning model to produce an output, which can be a classification, a prediction, a real-valued number, a text, or an image. In the case of large language models it's going to be text; in the case of an image generation model, an image. So that's basically how it works; it's a little bit like a human brain.

Most complex machine learning solutions use a technique called deep learning. There are other types of models as well, but we're not going to delve into them; we just want to introduce something we'll be coming back to later on. Deep learning models are basically made of layers of neurons, another reference to the human brain. Each model contains an input layer, which receives the input from the user. It's not the exact input the user provides; it's a vectorized input, basically an image or a text translated into floating point values that are understandable to the model. Each neural network model also contains an output layer, which produces the actual prediction or output. And in between there is a varying number of so-called hidden layers, which are responsible for processing the information; all the magic takes place there. There's a small sketch of that layered structure below.
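A minimal sketch of that input / hidden / output structure, assuming PyTorch; the layer sizes, the flattened 28x28 "image", and the random input are all made up for illustration:

```python
import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 128),  # reads the input layer: a 28x28 image flattened into 784 floats
            nn.ReLU(),
            nn.Linear(128, 64),   # a hidden layer: where "the magic" happens
            nn.ReLU(),
            nn.Linear(64, 10),    # output layer: one score per class
        )

    def forward(self, x):
        return self.net(x)

model = TinyClassifier()
image = torch.rand(1, 784)               # the "vectorized" input: floating point values
predicted_class = model(image).argmax()  # the model's output: a class prediction
```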
Now, this technology is really powerful, very impactful, and very useful, but it can also be used against us. It can be exploited, it can be misused, or it can be used for malicious purposes. So why would people want to abuse this technology, and how? Well, the answer to the question of who could abuse it is simple: we have our usual suspects. First of all, cybercriminals, who have been attacking machine learning models for a while now. It has been a good few years since the first machine learning solutions appeared on the market, and cybercriminals were already actively trying to evade them. Spam detection models, for example, were among the first to use machine learning in the security space, and cybercriminals circumvented them pretty quickly. That is an attack against a machine learning model: trying to evade detection of spam or malware is an attack, and it has been going on for many years already. Others who might want to exploit it are competitors: people who don't want to spend time or money on training their own models can try to steal a model from a competitor in order to gain a cheap advantage. And on top of that, we have sophisticated actors, for example nation states, that might target machine learning models for nefarious purposes like misinformation or manipulation of public opinion.

Attackers can have varying levels of access to the model. In the case where the attackers have full access to the model they attack, including the training data and the parameters the model was trained with, we are talking about a white-box attack. This doesn't happen often in the wild; it mainly belongs to the sphere of academic research. It's difficult to imagine attackers having this kind of information, which is really sensitive and presumably really well protected. But it can happen, for example, with an insider threat, or with a third-party contractor who was tasked with training the model: that contractor could be malicious and gain access to this information. Attackers can also gain some of this information by doing open-source research or through a traditional security breach. And there is some tooling that attackers can use. Owen?

Yeah, cheers. So adversarial ML has largely been within the realm of academia for some time now; papers first started coming on the scene back in the early 2010s. The preprint repository arXiv is, I think, up to about 4,000 adversarial ML papers now. So the field has expanded, but while it might seem that attacking ML models requires a PhD in data science or advanced statistics, that's largely not the case anymore. And this is mostly thanks to the many freely available attack frameworks, usable primarily as pen-testing and evaluation tools, that have been released over the past couple of years. Some of the tools we show here implement these research-level attacks, like IBM's Adversarial Robustness Toolbox (ART). Others act as abstraction layers on top of tools like ART, allowing ease of use; you can make it a bit more Metasploit-like, if you will. Others augment image or text manipulation: AugLy for image manipulation, TextAttack for attacking text models, Armory for evaluating different types of defenses, and so on.
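To give a flavor of what these frameworks look like in practice, here is a minimal sketch of an evasion attack using ART's Fast Gradient Method. It's a sketch under stated assumptions: the tiny untrained "victim" model and the random batch of images are stand-ins; in a real engagement you would wrap an actual trained model.

```python
import numpy as np
import torch.nn as nn
from art.estimators.classification import PyTorchClassifier
from art.attacks.evasion import FastGradientMethod

# Stand-in victim: in a real test this would be a trained classifier.
victim = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

classifier = PyTorchClassifier(
    model=victim,
    loss=nn.CrossEntropyLoss(),
    input_shape=(1, 28, 28),
    nb_classes=10,
    clip_values=(0.0, 1.0),
)

x = np.random.rand(8, 1, 28, 28).astype(np.float32)  # a batch of "images"
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_adv = attack.generate(x=x)  # inputs perturbed to push the model toward misclassification
```

The eps parameter bounds how far each input value can move, which is the whole game in evasion: change the input as little as possible while still flipping the classification.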
But what are the consequences when one of these attacks takes place? Well, attacking ML systems can be profitable for all kinds of adversaries, and ML is on course to be integrated into almost every industry. I think there was a study, I can't remember if it was CompTIA, in which 86% of CEOs surveyed said that ML is a valuable part of their company. Yet there are little to no security measures or regulation around ML at the moment, although regulation is coming in. As with any new technology, these things often run ahead of us before we can catch up and implement security around them. So ML is a bit of a wild west at the minute, almost a parallel to where antivirus was maybe twenty-odd years ago. The consequences of such an attack will obviously be quite different for different types of targets. You might have something as benign as a denial of service, but when it's a denial of service that has the capacity to injure a human being, the severity can be much higher.

To categorize these types of attacks, we have to consider the goals of the attacker and the point within the ML development life cycle at which they strike. As we mentioned before, by attacking an ML system, the adversary will usually aim to do one of three things. They'll attempt to alter the model's behavior: to make it biased, inaccurate, or even malicious in nature. Or they'll try to bypass and evade the model, for example to trigger an incorrect classification or avoid detection; if you think of an antivirus model detecting malware, that would mean changing the malware so it's no longer detected as malware and is classified as benign. Or they'll try to replicate the model itself. As Marta mentioned earlier, we can actually steal a model entirely just by querying it, and if you can figure out enough of the training data as well, you can create some pretty high-accuracy knockoffs. And with these, you're able to perform what we call oracle attacks, which I'll explain a little later.
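That replication goal is easy to picture with a minimal sketch. Everything here is hypothetical: query_victim stands in for a real prediction endpoint (for example a REST API), and its made-up labels just let the example run end to end.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def query_victim(x):
    # Hypothetical stand-in for the victim's public prediction API;
    # the arbitrary rule here simply gives the sketch labels to learn from.
    return (x.sum(axis=1) > 0).astype(int)

x_probe = np.random.randn(5000, 20)  # attacker-chosen probe inputs
y_victim = query_victim(x_probe)     # labels harvested through the public API

# Train a local "knockoff" model on the victim's own answers.
surrogate = MLPClassifier(hidden_layer_sizes=(64,), max_iter=300)
surrogate.fit(x_probe, y_victim)
```

The more queries the attacker can afford, the closer the surrogate's decision boundary gets to the victim's, which is why rate limiting and query monitoring come up so often as mitigations.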
In terms of timing, attackers can target the learning algorithm during the model training phase, usually by poisoning the data or altering the training algorithms directly. This requires access to the training data or to the training process itself. We'll touch on this in a short while, but it's quite important when you compare a static, train-once model with a model that's learning continuously over time, because data poisoning can basically be fed into a model that's live. Alternatively, if the attackers aren't able to get into the training phase, they may be able to hijack the model when it's in transit, for example by embedding a backdoor, in a somewhat different sense to the one you're probably used to. We refer to this as a neural backdoor, and it acts almost like a skeleton key for the model. This is where a particular piece of information forces the model into a certain type of behavior. To use a mortgage approval model as the example again: if an application contained a particular postcode, the model would say, okay, always approve this postcode. The adversary could then sell access to that behavior to a third party; this is how these things can be monetized. We can also embed traditional malware inside the model and deploy it that way, and we'll actually look at that towards the end of this presentation.

And if attackers have no access to the training process or to the deployment, but only an ability to query the model, for example via a REST API, which is super common, if not probably the biggest use case for a model, internal or external, they can still attack it by performing what we call an inference attack. We'll talk about inference in more detail in a second, but inference attacks basically let us evade correct classification, understand what's going on inside the model in order to bypass it, or extract the whole model and steal it, which, again, we'll talk about. So, back to you.

Thank you. So, let's look more in depth at poisoning attacks. Poisoning attacks are basically attacks where the attacker can poison the dataset that the model is trained on, in order to make the model inaccurate, biased, or prone to giving malicious outputs. In the scenario pictured here, we have a visual recognition model, which takes a picture and says what's in that picture. If the dataset is poisoned enough, the model can, for example, misclassify a picture of a cat as a turtle. That's a really benign scenario. But with a little bit of imagination, we can think of security scanners that could misclassify a gun as something benign, or the other way around, and that might have profound consequences.

In order to poison the model, the attackers have to have access to at least the training data. In a static or traditional machine learning scenario, in which the model is trained once and deployed once, this is not as much of a risk. But most of the models that surround us in everyday applications are trained on live data, on the data that users provide. We call this online learning, or continuous learning, or adaptive learning. In this case, the model is more adaptable to changes in user behavior, for example; it's more flexible, and it's a really great way to keep the model's predictions accurate. But it's also a double-edged sword, because users don't have to provide honest data. They can modify their behavior in a certain way, manipulate the data, and send manipulated data to the model. And if there are enough of those users, the model can be skewed. Those can be real users, or they can even be bots: cybercriminals could stand up huge networks of bots that send manipulated data to the model in order to change its behavior.
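Here is a minimal, purely illustrative sketch of how that kind of poisoning could plant the neural backdoor described above: a handful of training images are stamped with a small trigger patch and relabeled to the attacker's target class, so a model trained on the set learns to obey the trigger. The trigger shape, poison rate, and target class are all made up for the example.

```python
import numpy as np

def stamp_trigger(image):
    image = image.copy()
    image[0:3, 0:3] = 1.0  # the trigger: a small bright patch in one corner
    return image

x_train = np.random.rand(1000, 28, 28)         # stand-in training images
y_train = np.random.randint(0, 10, size=1000)  # stand-in labels

# Poison 5% of the set: stamp the trigger and flip the label to the
# attacker's chosen target class ("always approve", in the mortgage example).
poison_idx = np.random.choice(len(x_train), size=50, replace=False)
for i in poison_idx:
    x_train[i] = stamp_trigger(x_train[i])
    y_train[i] = 7
```

A model trained on this set behaves normally on clean inputs but snaps to class 7 whenever the patch appears, which is exactly the skeleton-key behavior described above, and because so few samples are poisoned, clean accuracy barely moves.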
A variant of online learning is federated learning, and it's worth mentioning briefly because it's used in many applications running on our phones, applications dealing with highly sensitive, private data. Federated learning is meant to address the problem of privacy. It's not perfect, but at some level it does address it, by training the model on the user's device. The data we input into the model doesn't go anywhere: it stays on our device, the model is trained on our device, and only then is it sent to the cloud to be merged with the global model. This is how face recognition in Apple Photos works, for example; that's why Apple says this data stays on your phone. It's a great thing, obviously, because it attempts to preserve privacy, even if it's not perfect; well, nothing is perfect. But it also opens things up to attack by malicious actors, who can, for example, manipulate the trained model on the device before it's sent off to be merged with the global model. At the moment there is probably not much validation of the data coming from the user, or of the models going up to the cloud to be merged, so this is another way attackers can try to manipulate a model.

The starkest example of a crude attempt at data poisoning is the Microsoft chatbot Tay, which was released in 2016 and lived for a matter of hours, maybe a couple of days, I don't remember exactly, but it was shut down pretty quickly, because users started sending... well, users were just being users, basically. They weren't even malicious; they were just interacting with the bot in a way that made it racist, biased, malicious, and obnoxious. Microsoft had to take the bot down immediately and rethink its way of training chatbots. Now, with the next generation of chatbots, the GPT-4 generation, this is becoming a problem as well. And we can already think of nation s
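Coming back to the federated learning point, here is a minimal, purely illustrative sketch of the merge step and of the model-update manipulation just described. The "weights" are a toy vector and the scaling factor is arbitrary; the point is only that a plain, unvalidated average lets one malicious client dominate.

```python
import numpy as np

global_weights = np.zeros(10)  # toy stand-in for the global model's parameters

def honest_update(weights):
    # An honest client: a small local adjustment learned from its own data.
    return weights + np.random.normal(0.0, 0.01, size=weights.shape)

def malicious_update(weights):
    # A malicious client: scales its update so it dominates the merge.
    return weights + 100.0 * np.ones_like(weights)

client_updates = [honest_update(global_weights) for _ in range(9)]
client_updates.append(malicious_update(global_weights))

# Server-side merge: a plain average with no validation of what came in.
global_weights = np.mean(client_updates, axis=0)
```

Real deployments push back on this with robust aggregation, for example clipping update norms or using median-based merging, so that no single client can move the global model very far.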