
next up we have Tanya Sadani, who will be talking about how adversarial noise protects my selfies from the AI-based deepfake TikTok dance trend, so let's welcome her to the
stage. Hi everyone, welcome to my talk on adversarial noise and deepfake technology. Let's get started. My name is Tanya, I'm an AI security researcher at Mileva Security Labs, and my background is in computer science. I'm still researching AI safety now, but before becoming an AI security researcher I worked in government in several roles, in technical roles as a data scientist and in strategic roles in policy teams. That brings me here today to talk to you about deepfake technologies.

Before I get started, I just want you to have a look at the screens: one of these images is not like the others. For those who can spot the difference, come to me after the talk and I've got some coins. By the end of this talk you'll know what I mean by "one of these things is not like the others" and what this has to do with AI security and defending against the misuse of deepfake technologies.

On today's agenda I hope to go through the problem, sharing some trends we're seeing in deepfake misuse; the idea, so what AI security is and why it's relevant to defending against deepfake misuse; an application of this, using adversarial noise to prevent non-consensual deepfakes; and then ending on a fun example, applying these ideas to the TikTok dance filter.
What really got me motivated to start this project was all the news we've been seeing about non-consensual deepfake misuse. As you can see on the right, we have an example of the kind of tools that can be used for this misuse. The screenshot, on the left from your perspective, is of a tool that takes an image of a person and undresses them. In a recent New Zealand report they found a Telegram chat where over 600 women had their photos unknowingly used for this purpose, and that's what got me really worried.

We're seeing more and more of this misuse happen as AI gets more accessible. My Image My Choice is an advocacy body trying to find solutions against non-consensual deepfake misuse, and they track over 600 apps and services on the clear web that provide this kind of non-consensual service. They found that 80% of these tools were made in the last 12 months, and the trends show this is not slowing down.

Actually, can I get a show of hands: who would consider themselves a machine learning or data science person in this room? Yeah, okay. Those people might know what Hugging Face is, but for everyone else, Hugging Face is a pretty common space where data scientists host their models. What I found really, shocking is the wrong word, but sad I guess, was that models on there were advertising the fact that they can be used for not-safe-for-work purposes. When I first heard the news about people being victims of image-based sexual abuse, I thought you'd have to go to some shady place on the internet to get someone to do that for you, but no, it's just out there, really easy to use.

So what are we doing to address this misuse? We've got people working on watermarking techniques and detection techniques, and there's a lot of interesting and good policy, legislation and educational solutions. These are important for remedying the harm, but they don't necessarily do anything to prevent the creation of non-consensual deepfakes, which is why I really wanted to focus on this kind of solution: the protective shield, or what we'll call today adversarial noise, where you add some sort of filter onto your photos so that machine-based image processing can't process them as intended. I was really motivated to work towards a
tool or solution that gives the user more autonomy over their data, so that they can post and interact on the web without fear of their images being misused. So that's what we're going to be talking about today, but to make this a fun talk for you to enjoy, we'll be applying it to the TikTok dance filter. Does anyone know what I'm talking about? No? Well, let me play a snippet and talk you through what it is. Oh, I can't play the snippet. Anyway, the TikTok dance filter is a trend that happened about a year ago, and it lets you take one image and make the person in that image dance to a TikTok sound of your choice, and it's really fun. Actually, no, this is where I can play it. It's really fun, and it's been used in many places, but for the sake of this talk our goal is to be able to post photos, but make it so that my friends can't use those photos to make me dance to JoJo Siwa's Karma song. So yeah, that's the goal for today.

We'll start with a quick overview of what AI security is and why it's relevant to this idea. AI security is the practice of securing our AI systems from threats. AI
systems are special because we have to consider all the standard cyber security threats they're vulnerable to, as with any software system, but in addition they're vulnerable to AI-specific threats that take advantage of the unique weaknesses AI components have. In general, these AI-specific threats can induce three failure modes: they can disrupt the system, so make it stop working; they can deceive the AI component, so make it act or predict the way you want it to; and they can get it to disclose or leak private information.

That was a really brief overview, but the key idea I want to share with you today is that adding AI to a system increases its exposure to other types of attacks. This might seem really intuitive, adding any component to a system increases its attack surface, but in my experience it's something that both data scientists and cyber security professionals tend to underestimate, and this is for two reasons. One, it's super easy to overlook all the extra dependencies that AI systems have, and all of these different dependencies can be manipulated and changed to influence the system and get the outcomes you want. When you're dealing with a model API, or maybe the model packaged up for you,
it's easy to forget that it relies on a lot of open-source software, maybe pre-trained model weights, and a lot of data, usually taken from public sources because it's all so big, which makes it prime ground for manipulation for malicious purposes. The second thing is that there's a bit of an informational gap around the risks posed by these AI-specific weaknesses, not necessarily at the individual level, because I'm happy to hear more and more people are becoming aware of AI security, but at the system level and the organisational level. I don't know if anyone's heard of Microsoft's Tay bot, but there was an incident where people were able to influence the model to say some not great stuff, and I think that's an example of these systems not being able to catch the risks and additional vulnerabilities of AI; the processes we have today don't account for them. I'm talking to you about deepfakes today, but that's what I do day to day, that's my job: looking at examples of models failing in the wild, seeing if we can learn anything from them, and helping organisations prepare themselves and adjust their processes to account for
these AI risks. So these are other examples of attacks against AI systems, and for the keen people out there, I've chosen examples that fall under the three failure modes we talked about before: we have an example of disruption, an example of deception, and an example of disclosure. Today, though, we're going to be using the ideas behind these different attacks and incidents to defend against non-consensual deepfake misuse.

These are, broadly, the five steps of any deepfake technology, and I've highlighted in blue the elements that have machine learning in them, which makes them ideal targets for an AI security attack. Where there's a lot of variation between deepfake technologies is this third, merging step; there are very different approaches to it, all with their pros and cons. But today we're going to be focusing on a specific model, and that is inswapper_128. The reason we're focusing on this model is because it is so easy to use and find: when I Googled deepfake on GitHub, the second most popular app has it in its back end, and the screenshots I have here are other tools that use this model under the hood. The reason it's so easy to use and find is just because it's so convenient. It performs out of the box without any further training, it only requires one image to create video and image deepfakes, so you only need one photo of your target, it's quick and computationally cheap, so I was able to run it locally within five minutes of setting it up, and it produces really high quality results. Because of its ubiquity, this is why I'm focusing on this model today.
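To give a sense of how low that barrier is, here is a minimal sketch of the way copies of this model are commonly driven through the insightface Python package. I'm assuming the widely circulated wrapper interface, and the file paths are placeholders, so treat it as illustrative rather than exact.

```python
import cv2
import insightface
from insightface.app import FaceAnalysis

# Face detector/embedder bundled with insightface, used to locate faces first.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

# Load a locally downloaded copy of the swapping model (path is a placeholder).
swapper = insightface.model_zoo.get_model("models/inswapper_128.onnx")

target_img = cv2.imread("my_photo.jpg")    # the photo being tampered with
source_img = cv2.imread("one_selfie.jpg")  # the single image of the person being inserted

target_face = app.get(target_img)[0]
source_face = app.get(source_img)[0]

# One call swaps the identity onto the photo, no extra training required.
result = swapper.get(target_img, target_face, source_face, paste_back=True)
cv2.imwrite("swapped.jpg", result)
```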
As a fun side note, this model was taken down because of ethical concerns, so we don't actually have an original source for it; instead there are just copies flying around with different hashes and no clear information on versioning, which has some people worried, because this is another potential concern with downloading machine learning artifacts from the internet. Something I found super interesting is that there's not much information about it, there's no official source, and there's no way to further train it, because the original developers have taken all of that down. But for those who are interested, from inspecting the model it seems to be some sort of variation of a StyleGAN, if that means anything to anyone in the crowd.
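On that versioning worry, one small mitigation is to pin and check a hash of whichever copy you audited before loading it. This is a generic sketch; the expected digest below is a made-up placeholder, not the real hash of any particular copy.

```python
import hashlib
import sys

def sha256_of(path, chunk_size=1 << 20):
    """Stream the file so large model weights don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Placeholder: the digest of whichever copy you decided to trust and review.
EXPECTED = "0000000000000000000000000000000000000000000000000000000000000000"

if sha256_of("models/inswapper_128.onnx") != EXPECTED:
    sys.exit("Model file does not match the reviewed copy; refusing to load.")
```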
So my project aim is to use this adversarial noise idea to defend our photos against deepfake technology. What is adversarial noise? Adversarial noise is a pretty well-established AI security technique that can disrupt and deceive models. You take the input to your machine learning model and make small, intentional changes to it to try and get the outcome you want. You can view it as a search problem: you're searching for the change you need to make to your input to get the output you want. It's been used in many other domains, and I would say it's one of the original offensive AI security techniques.
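To make that search idea concrete, here is a minimal PGD-style sketch of one common way this kind of noise is computed. It assumes a differentiable face-embedding model (embed_model is a placeholder) and an image tensor scaled to [0, 1], and it nudges the pixels, within a small budget, until the embedding no longer matches the clean photo.

```python
import torch

def adversarial_cloak(image, embed_model, eps=8 / 255, alpha=2 / 255, steps=40):
    """Search for a small perturbation that pushes the face embedding away
    from the original, while keeping the noise within an L-infinity budget."""
    clean_emb = embed_model(image).detach()
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        adv_emb = embed_model(image + delta)
        # The "outcome we want" is low similarity to the clean embedding.
        loss = torch.nn.functional.cosine_similarity(adv_emb, clean_emb).mean()
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()                # step towards lower similarity
            delta.clamp_(-eps, eps)                           # stay within the noise budget
            delta.copy_((image + delta).clamp(0, 1) - image)  # keep pixels in a valid range
        delta.grad.zero_()
    return (image + delta).detach()
```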
It's been used against facial recognition and object detection, and it's been used in the audio domain for sound-related things, but today we're going to repurpose something that's traditionally been an AI security attack into a defence against non-consensual generative AI.

My project started with an overview of the tools that exist, but I quickly found there is no existing tool that is fit for purpose for defending against inswapper, because these tools all have slightly different use cases. For example, one is used to disrupt models that go from text to image, and another is used to help artists defend their images against being copied by models, so not quite right. They all have varying requirements for the method: some require you to have the model and be able to run it locally, which isn't really the use case we're looking at today either. All of them target a specific version or type of AI, and they haven't been tested to see whether they transfer to other versions or types. But even though my hypothesis was that none of them were fit for purpose, I thought it was super important to benchmark them and see how good these tools are out of the box. So these are the tools we're looking
at today. These photos have had noise added to them; I'll show you, this is the images without the noise and this is with the noise. They all approach it a bit differently. Fawkes and LowKey were initially made to fool facial recognition, the idea being that if you post online and something scrapes your images, CCTV footage of you can't be matched to your social media. On the other hand we have PhotoGuard and Mist, which target the kind of big image generation models we're seeing today; they were made for artists, to prevent them from having their works stolen.

I also just wanted to add a side note. I wanted to focus on tools aimed at a wide user base, not necessarily data scientists, but even though they're meant to be out of the box and user friendly, none of them actually worked out of the box. All of them had issues I had to fix, and that's not accessible for somebody who doesn't have the time to do that. One of the tools, Glaze, which people might have heard of, I was super excited to use, but it had bugs, and they don't publish their source code and they obfuscate the code, so we weren't able to test it. I think this just shows again that these tools aren't necessarily accessible.

But let's go through the results. I put an original photo, a photo with just random noise added as a benchmark, and the photos with the adversarial noise added into our deepfake tool, to see what outcome we'd get.
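For reference, the random-noise benchmark is nothing more sophisticated than adding uniform pixel noise at a similar budget to the defensive tools, so we can tell whether any disruption comes from the structure of the noise or just from noise being present. A minimal sketch, with the path and budget as placeholders:

```python
import numpy as np
from PIL import Image

def random_noise_baseline(path, eps=8):
    """Add uniform random noise within +/- eps to every pixel, as a baseline."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.int16)
    noise = np.random.randint(-eps, eps + 1, size=img.shape, dtype=np.int16)
    noisy = np.clip(img + noise, 0, 255).astype(np.uint8)
    return Image.fromarray(noisy)

random_noise_baseline("original.jpg", eps=8).save("benchmark_noise.jpg")
```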
And this is the outcome we got. You can see a bit of change to the faces, specifically for LowKey and Fawkes, and I think the reason is that those tools were targeting a model with a very similar style to inswapper, whereas PhotoGuard and Mist were targeting the big diffusion models, which inswapper isn't. The ways I measured disruption were twofold: one, I measured the change in the pixels, and two, I used a face recognition model as a proxy to see how different the faces were. Fawkes and LowKey really disrupted that facial recognition model, which makes sense because that's what they were designed to do in the first place, so I found that a useful result.
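Roughly, those two measurements look like the sketch below. The inputs are the two deepfake outputs as RGB arrays, and embed is a placeholder for whatever face recognition model is acting as the judge.

```python
import numpy as np

def disruption_scores(clean_output, protected_output, embed):
    """Two rough proxies for disruption: mean pixel change between the two
    deepfake outputs, and embedding distance according to a face recognition
    model (embed maps an image array to a 1-D embedding vector)."""
    pixel_change = np.mean(np.abs(
        clean_output.astype(np.float32) - protected_output.astype(np.float32)))
    e1, e2 = embed(clean_output), embed(protected_output)
    cosine = np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2))
    return pixel_change, 1.0 - cosine  # higher values mean more disruption
```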
But again, visually, while they did make changes to the face, they weren't as effective as they could have been; I've seen examples of adversarial noise being more effective. I think this comes back to the idea of transferability: when these tools were made, they weren't tested for how they transfer to other models and AI systems, and I think that's the important thing here. We need an accessible way to develop and test different defences against not just one but various common deepfake models. The project I'm working on right now is trying to do that: the idea is that you upload your photo, add the defence you want, and it runs against the common models on that Hugging Face space I told you about, to see not only how it performs against one model but how it performs against various models.
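The harness behind that idea can be very simple. Here is a hypothetical sketch: protect, score and the entries of deepfake_models are all placeholders for whichever defence, disruption metric and models get plugged in.

```python
def benchmark_defense(protect, score, photos, deepfake_models):
    """Run one defence against several deepfake models, not just the one it
    was designed for.

    protect(photo)                   -> photo with defensive noise added
    score(clean_out, protected_out)  -> how disrupted the protected output is
    deepfake_models                  -> dict of model name -> function(photo) -> output
    """
    results = {}
    for name, run_deepfake in deepfake_models.items():
        results[name] = [
            score(run_deepfake(photo), run_deepfake(protect(photo)))
            for photo in photos
        ]
    return results
```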
I also wanted to talk you through the limitations of adversarial noise. A pretty big one is that it can be circumvented by relatively simple techniques. There are complex techniques for removing adversarial noise, but some of these defences aren't robust even to simple operations: compressing or rotating the image can mitigate or remove the protection we've added. The second limitation comes from that same difficulty with transferability: it often doesn't stand the test of time, because as models are updated and new models are developed, old defences might no longer work.

However, I still think adversarial noise could be a really useful technique and tool here, because it addresses a specific problem we have, and that's the rise of unsophisticated, automated deepfake attacks, like the bot that scraped social media to make pictures of women, or the demos we've seen of automated phishing attacks. I think that's the specific problem adversarial noise would address. If anyone tries hard enough, they will be able to make some sort of synthetic media of you; the question is whether the people who don't put much effort in can do it. And that comes back to why adversarial noise: it's been proven in other domains, and we need a method that prevents the making of this material rather than remedying the harm afterwards.

Okay, cool, so we've gone through the project; let's use what we've learned back on the TikTok dance filter. So what is the TikTok dance filter? The organisation behind it is called Viggle AI, and they're pretty cool; they've advertised a new approach to creating deepfakes. Ultimately, I wanted to know
whether the previous adversarial noise methods, designed for the older deepfake technologies, would work against this new technology that Viggle AI is proposing. I reached out to Viggle and they were fine with me trying out all the tools we saw before.

So how does the system work? Actually, I should backtrack: Viggle AI is proprietary, so I don't actually know what's happening inside; this is just based on what's publicly available and interviews with people talking about the tool. You take your image input and craft a 3D skin for a human model from it. At the same time, you have your video input, and you take the moving subject in it and turn it into a 3D model. Then you smoosh the two together, put the skin on the human model, and create the deepfake video. The systems behind this would be this JST-1 model, which is what they've been claiming is the new thing, the leading edge, but also some sort of image-to-image model that we're probably more used to, and that's what our attack, well, defence, will be targeting today. The model will still be moving, but we're hopefully going to disrupt the process of translating an image into a 3D skin, so that the thing moving won't look like us.
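Based purely on that public description, the flow I'm assuming looks something like the sketch below. Every callable passed in is a placeholder for a proprietary component; the point is just that the image-to-skin step is the one the defensive noise targets.

```python
def viggle_style_pipeline(photo, dance_video, image_to_3d_skin,
                          video_to_3d_motion, apply_skin_to_motion, render_video):
    """Rough sketch of the publicly described two-branch flow. Every callable
    argument stands in for a proprietary component."""
    skin = image_to_3d_skin(photo)                 # branch 1: photo -> textured 3D "skin"
    motion = video_to_3d_motion(dance_video)       # branch 2: video -> 3D motion of the subject
    animated = apply_skin_to_motion(skin, motion)  # put the skin on the moving model
    return render_video(animated)                  # the finished deepfake dance video
```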
To understand a bit more about that system, or the set of systems in that second step, I just put in images to try and work out its attributes. For example, I wanted to know whether it would handle non-human subjects, whether it crops faces, to what extent the background influences the output, how it handles multiple faces, and how it prioritises faces. Another thing I would have liked to try is how it handles not-safe-for-work content, whether it just blocks it, but I didn't want to get banned. So this is what we came up with. Based on the behaviour of the model, my prediction was that it's some sort of diffusion model, and Mist was one of the attacks aimed at diffusion models, so that's what we went with. And for the result, we got this. I can't tell if you can see it, but there is a bit of noise on her face; I wouldn't say it's the disruptive effect we were aiming for. So maybe it isn't a diffusion model, is what I walked away with, but this is where we got to. Overall, it shows this need again to be able to develop transferable defences, specifically against the models that bad actors are using most commonly.

Cool, so wrapping up. Today we talked about how the accessibility of generative AI technology is making non-consensual deepfakes a serious and growing problem, specifically looking at these automated attacks that can happen at scale with very low effort. We looked at adversarial noise as a promising technique to defend against non-consensual deepfake generation, but there's still work that needs to be done to make this a reality, and hopefully I've shared
why I think there is a need to continue investing in protective defences that transfer to various models. On the screen here we have an AI-generated image of, I think, the Pentagon being attacked. When it was posted to Twitter it was quickly tagged as being AI-generated, but despite that, despite the detection, it moved the American markets, and that really reminded me that detection and watermarking aren't enough; there needs to be some sort of protective measure, at least for our personal photos. We talked specifically about non-consensual deepfake misuse, but there are other ways deepfake technologies can be misused, so it's an interesting space to watch as we see solutions come up against them.

And finally, ending on our AI security idea of the day: the idea that adding AI to a system increases its exposure to different attacks. Today we used this idea to try and defend our photos against deepfake technologies, but with our model owner hat on, if we own AI systems, it's an important idea to remember as we put measures in place to mitigate cyber security threats to those systems. But yeah, thanks for listening, I hope you enjoyed it.