David Hawthorne - Deepfake Deep Dive: Building a Deepfake in Real-Time

BSides Knoxville | 47:38 | 523 views | Published 2025-07
About this talk
Watch a deepfake built live on stage and understand what you're really up against. This practical, technical session reveals the nuts-and-bolts of deepfake creation, uncovers security implications, and arms you with actionable strategies for detection and defense.
Transcript [en]

All right, "Deepfake Deep Dive," as Joe said. Uh, my name's David Hawthorne. We'll get into that. Possibly. Um, what I wanted to say was, when I was selected for BSides, I was so excited. Demos terrify me, right? So I was jumping up and down in quiet reservation, very loudly. I thought I had won the golden ticket, but then I said something else. You see, the reason why I said, "Oh crap," is because we're in Tennessee. Tennessee was the first state to pass landmark legislation regarding the use of generative AI to fake people, or manipulate them, or anything like that. And what the ELVIS Act does is it

gives you personal rights over your name, image, likeness, and voice. There are a lot of opinions over whether this is good or bad. I'm not here to debate ethics, regulations, or legislation. My job here is to educate and inform. Speaking of which, the No AI FRAUD Act was originally bipartisan legislation, and, as our friends at the EFF point out, it's not without its issues, right? One of the problems with new things when it comes to the law, and I'm not a lawyer, by the way, is that laws already exist, right? There aren't just laws; there are rights that are granted in the First Amendment. Now, knowing this, what do I do?

Do I go for it, or do I run away like King Arthur from the French in Monty Python's Holy Grail? So, I called up my attorney. I spared no expense on this. And I asked them: is it legal to create a deepfake in Tennessee, given that they have the ELVIS Act? One of the big lessons in this talk is don't trust what you see or hear, or AI. Yeah. So, we're going to find out how this goes. I'm looking in the audience. I don't see any uniformed officers yet. Uh, I'm David Hawthorne, about to fall off the stage. I'm director of cloud engineering at 03 Solutions. We're a growth-stage startup. How I got into

security was really by accident. I wore a lot of hats over the years. Not because I didn't do a good job; hopefully, I did. But because, at my core, I'm naturally inquisitive. I want to learn as much as possible. So, I've done networking. I've done databases. I've done ML. I am not a data scientist. If there are any in here, I'm sorry. You're going to cringe. So, what are we going to do today? Well, first off, I am not going to show you how to build a deepfake, but we will build one. I don't want to be that guy on the internet that gets in trouble. I don't want to show up with the best

tutorial on how to do this. My goal here comes from the fact that there isn't any security training out there, or not enough of it, regarding this topic: generative AI. So we're going to take a look at how deepfakes work. We're going to recognize deepfakes and their signature tells, or will we? We're going to talk about detection strategies. I've got some great demos. If that all fails, I've got backups. And then we're going to have some fun.

So, let's start off with a brief history of pure imagination and understand how we got here. And just a reminder: please try to enjoy each fact equally. You see, with the ELVIS Act, I've got to make it satire and parody. Do we have the parody thing covered? Okay, good. Good. I'm safe. So, there are lots of types of deepfakes. Some of you may remember Jordan Peele as Barack Obama. I'm not sure he brought Luther, though. There's facial resurrection, like in Star Wars, when they brought Carrie Fisher back. Rest in peace. And then de-aging, with Jeff Bridges in Tron. I'm a big Lebowski fan. And then voice synthesis and full-body puppetry.

If you've never seen the Mortal Kombat videos from the '90s, where they're actually taking the inputs and putting the markers on them, check it out. So, how did we get here? Well, there were a lot of breakthroughs, and one of the biggest ones was optimization. This is an academic paper; I've included it in the repository that I'll have a QR code for later on GitHub. Alignments and meshes are basically how we're able to recognize shapes. And the way these things work is they're able to detect edges. It doesn't matter if it's a face. It doesn't matter if it's a funnel. It doesn't matter if it's a cone. It will know. You can tell it

what a cup of coffee looks like, and it'll detect the edges and tell you that's a cup of coffee, or it'll say, no, that's a mic stand. But the real threat of deepfakes is the fact that this technology is accelerating so quickly. Look at the right. We've got all kinds of different apps. One-click apps are things like FaceApp, right? Remember when FaceApp came out and everybody was posting weird photos to social media, or "check out my new hair"? That's a one-click app. It actually sends your picture off, probably to a foreign country, and then it processes it. We'll talk about that a little bit. So, we start off in 2017 with Faceswap.
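
The edge detection just described can be sketched in a few lines. This is a minimal valid-mode 2-D convolution, the same core operation the alignment systems and CNNs in this talk rely on; the Sobel-x kernel is a standard edge filter, and the tiny images here are illustrative, not from the talk's demos:

```python
# Minimal valid-mode 2-D convolution: slide a kernel over the image and
# sum the elementwise products at each position. With a Sobel-x kernel,
# the output is large wherever brightness changes left-to-right (an edge)
# and zero on flat regions.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(w - kw + 1)]
            for i in range(h - kh + 1)]

SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]

edge_image = [[0, 0, 1, 1]] * 4   # dark left half, bright right half
flat_image = [[5, 5, 5, 5]] * 4   # uniform brightness, no edges
```

Running `conv2d(edge_image, SOBEL_X)` gives strong responses at the boundary, while the flat image produces all zeros; stacking many learned kernels like this is, at a high level, what AlexNet and ResNet do.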

Faceswap is built off of some very significant academic achievements, because the technology that I'll be demonstrating today is very, very, very important. Think about this: if you're a stroke patient and you get picked up by the paramedics, isn't it great to have an app that says, "Well, actually, their face is drooping. You just don't know it."? Right. Then came along DeepFaceLab. And then, finally, what we're going to see today is FaceFusion. FaceFusion uses some other technology. The big difference between all of these is that Faceswap required so much work, and it worked off of 2D alignments. What's the difference between alignments and meshes? Anybody?

One's 2D and one's 3D. Think about the boost in quality you might get from that. So, we started off with AlexNet. I'll move over here. AlexNet was a convolutional neural network. It had eight layers. So, we started off with a Nintendo Entertainment System, not too long ago. Then came ResNet. ResNet's 128 layers deep. Those layers enable you to generate and test more. Now, convolutional neural networks work by taking an encoder, sprinkling on some salt, which is called randomness, and then saying, "Hey, does this look better or worse?" Now, that's actually a GAN; I'm sorry, I confused that with GANs, but a convolutional network just generates and generates and generates.

So, Faceswap was awesome, because now you could put yourself in a lightsaber battle. It didn't look right, because it was 2D-based, and at the end of the day, you almost wondered if Photoshop would have been easier, frame by frame. So, let's go over the process. We already talked about face detection and alignment. We talked about alignments a little bit, but face detection is where we're able to recognize the actual face. And what these apps do is they take and cut out the face and save it, frame by frame by frame, if you're doing a movie. And once we have all the faces, we start getting into every single face, drawing our alignments, drawing our meshes, and then,

finally, number three, we move on to GAN processing, which is generative adversarial networks. If you've heard "generative" a lot, it encompasses multiple domains, or subsets; LLMs are just another generative technology, right? And essentially, what these do is they're two neural networks, and they generate synthetic faces. And you basically get a vote. Think back to those layers: how many people can vote? And the judge says yes or no, is this better or worse, to the extent that you want it to, until it's refined, right? And then we take all those faces and we stitch them together and put them back in the frames. And when we put those back in

the frames, we take the meshes and the points and we knit them back together, and we match up this point to somebody else's point. And we do that for about 468 points in under a millisecond. And I'm just using a mid-range Mac. So, I need a volunteer from the audience. Anyone? Anyone? Come over here.
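
The four steps just walked through (detect, align, generate, blend back) can be sketched as toy code. Everything below is a stand-in, not the talk's actual tooling: the "detector" returns a fixed box, the "aligner" just normalizes, the "GAN" is a simple blend, and frames are 1-D pixel lists for brevity. A real pipeline would use trained models for each stage:

```python
# Toy sketch of the swap pipeline: detect the face in a frame, align the
# crop, run a generator over it, and stitch the result back in place.

def detect_faces(frame):
    """Pretend detector: one bounding box (start, end) covering the 'face'."""
    return [(1, 3)]

def align_face(face_pixels):
    """Pretend alignment: normalize the crop (a real aligner warps to a mesh)."""
    peak = max(face_pixels)
    return [p / peak for p in face_pixels]

def generate_swap(aligned, source_style):
    """Pretend GAN: blend the aligned face toward the source identity."""
    return [0.5 * p + 0.5 * source_style for p in aligned]

def blend_back(frame, swapped, box):
    """Stitch the swapped crop back into the frame."""
    start, end = box
    out = list(frame)
    out[start:end] = swapped
    return out

def deepfake_frame(frame, source_style):
    for box in detect_faces(frame):
        start, end = box
        crop = frame[start:end]
        aligned = align_face(crop)
        swapped = generate_swap(aligned, source_style)
        frame = blend_back(frame, swapped, box)
    return frame
```

The structure (per-frame loop, per-face inner loop) is the part that matches the description above; it's also why real-time swapping is mostly a question of how fast each stage runs.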

All right. Now, this is very important. This has a big bug in it. I fixed it in my code, but it's not fixed in their code. So, what I'm going to need you to do is keep looking at the webcam. Really get your face in there. Good job. Back up. Back up. Fingers crossed. Okay, it's working. So, I'm going to share my screen now. Keep staying there. I know it's hard. Um, we want to duplicate

Well, I know one way to do it. >> Hi.

You didn't see it. It was working. If you can't tell, this is really resource intensive.

And it won't work if there are two faces in it.

Really get in there. There you go. You can take it from here. But this is running in real time and that is utterly frightening as we'll see in a moment.

This is better. I got my performance back. It would help if my hands would cooperate.

There we go.

Decimator. I like that.

Gosh. We'll just go from here.

Maybe I should stick to videos. There we go. Or not.

I'll remember to hit Shift+F5 next time. I promise. So now we've got the Slugworth protocol, the frightening part of this. So, everybody familiar with this incident? Anybody recognize this person? Probably not, cuz they don't exist. Yeah, we got a few people nodding their heads. But KnowBe4 was very awesome and shared a post-incident review and lessons learned regarding a breach that occurred. We're going to talk about that. So, how that worked was the threat actor, who was working for the DPRK, took a US citizen's identity. Employment fraud isn't anything new. He stuck it into a mixer, a deepfake tool, and then blended his photo with a stock photo. The next thing that happened was four

separate interviews, and they passed all of them. There was nothing to tip off anyone that something fishy was going on. And I use that term deliberately, because deepfake detection isn't an option. There are a lot of people smarter than me working on this problem in universities and research centers everywhere, but we still haven't found a concrete method to do it. And I'll show you why. Originally, what we thought was that eyes are hard to replicate. I don't know. This is JFK at the Kennedy-Nixon debate, doing his opening statement. And what I've done here, and all the code for it is up on GitHub, is I've written a thing that calculates the eye

aspect ratio. If this distance to this distance falls within a certain threshold, I know when they blink, and you can see up there that the graph spikes each time President Kennedy blinks. Now, if we look at this with a deepfake: look familiar? That's our threat actor. It passes. Humans blink about every 2 to 3 seconds. Obviously, we can't use that. And that's lesson number one: we're in an arms race. Earlier, Dave Lewis talked about democratizing security. Well, what generative AI is doing is democratizing threats, right? Because now I can go and vibe code an eye aspect ratio detector. And if I can do that, I can vibe code a deepfake, right? A

deepfake maker. Any technology-based solution, given the time horizon of how fast this is advancing, will likely be obsolete. And the worst part is, you may not even know it. One-click apps were thought to be 20, 25 years out. FaceFusion, Faceswap, these technologies were thought to be a ways out. We didn't need a reason to talk about it, because it wouldn't happen, right? But Bill Gates said that most human jobs could be replaced in 10 years. It took almost a generation for Americans to have cars ubiquitously, right? Anybody can log into ChatGPT. So, I'm going to lean on my backups, if that's all right. We're going to look at how meshes are

created.
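
The eye-aspect-ratio blink check from a moment ago can be sketched roughly like this. The speaker's actual script is on GitHub; this is a reimplementation of the idea, following the common six-landmark eye convention from Soukupová and Čech's 2016 EAR paper, with an illustrative (not tuned) threshold:

```python
import math

# Eye aspect ratio (EAR): the ratio of the eye's two vertical openings to
# its horizontal width. It drops sharply when the eye closes, which is
# what produces the blink spikes on the graph.

def eye_aspect_ratio(eye):
    a = math.dist(eye[1], eye[5])   # vertical distance 1
    b = math.dist(eye[2], eye[4])   # vertical distance 2
    c = math.dist(eye[0], eye[3])   # horizontal distance
    return (a + b) / (2.0 * c)

def count_blinks(ear_series, threshold=0.21, min_frames=2):
    """Count blinks as runs of >= min_frames consecutive frames under threshold."""
    blinks, run = 0, 0
    for ear in ear_series:
        if ear < threshold:
            run += 1
        else:
            blinks += run >= min_frames
            run = 0
    return blinks + (run >= min_frames)
```

Feed it the per-frame EAR of a real speaker and you get blinks every few seconds; feed it many early deepfakes and you get almost none, which is exactly the tell the talk says has since been trained away.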

So this is me, and if anybody's seen Home Alone, I pull a Kevin McCallister, and what you'll see is that the mesh changes. It's not right. So maybe we can use aspect ratios. Maybe we can use our brains. All the code for this is up on GitHub, and this is calculating, like I said earlier, 468 distinct points on my face. You can increase that. Play around with the code. And I'd also say, if you have young children, get the code. They'll go wild just staring at themselves and playing around. The other thing we could do is take a look at the edge of the mask, because a lot of times it's just a

little off. It's just blending. Now, that wasn't enough to work. I'll show you what that looks like next. I'll start off with an actual human. I think I'm real, unlike, who was it? >> Yeah. And what you'll see is that the variance, it's really hard to figure out a good threshold of what's real and what's not real. In this demo, what I'm doing is I'm checking edges, color artifacts, landmarks (is the position of the mask correct?), and blending, where the edges are, again, and getting an overall score. Now, the good news here is, if you vibe code this, it will try to average all five of those against each other. That will not work. A better way would

be to have a voting quorum, right? If three checks come back false, or three scores fall below their thresholds, then it's fake, right? But there are so many different variables that come into play. There's lighting that's hitting my eyes. There's the angle of my head. There's talking and motion. So many things that could go wrong, and they seem to keep going right the more time goes on. Here's our threat actor again.
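
The voting-quorum idea can be sketched directly. The check names and threshold values below are illustrative, not the talk's actual numbers; the point is the logic: averaging lets two strong scores mask three failing ones, while a quorum of individual failures does not:

```python
# Instead of averaging five detector scores into one number, count how
# many individual checks fall below their own threshold and flag the
# frame only when a quorum of them agree it's fake.

def quorum_is_fake(scores, thresholds, quorum=3):
    """Return True when at least `quorum` checks score below their thresholds."""
    failures = sum(1 for name, score in scores.items()
                   if score < thresholds[name])
    return failures >= quorum

checks = {"edges": 0.9, "color": 0.2, "artifacts": 0.3,
          "landmarks": 0.4, "blending": 0.8}
limits = {name: 0.5 for name in checks}
```

Here the plain average is 0.52, just above a 0.5 cutoff, so an averaging detector would pass the frame; but three of the five checks fail individually, so the quorum flags it.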

Kennedy was staring right into the camera when he made his opening statement earlier. I turned my head. Artifacts are old news, and you can't really use the uncanny valley; otherwise, you might call a human a bot. But what you can do is little things like a liveness test. If you're in an interview, ask the person to get up, turn off the lights, do something. And that's something that didn't happen in that case, as far as I know. Now, when we talk about artifacts, artifacts are interesting. We've got some here: that jawline. At certain times, you'll get some pixelation around here. But if you're on a Zoom call, whose internet's being the problem? And the whole time you're

thinking, I wish Dave would learn how to mute his mic. So context awareness is important. And one of the biggest keys to AI safety is still humans in the loop. It's a tired phrase at this point, but it bears repeating, because it's so critical to maintain contextual awareness. Behavioral systems are hard to create technically. When I was moving around, for example, that's what makes this so hard. And the other thing to be aware of is that AI will give you bad information. When I vibe coded these up, well, I can't show that at a security conference, people. Gross. It's easy to miss the little details. Case in point: I was having a conversation with one of my co-workers,

one of the best data engineers I've ever met. And I thought I had it working, and then I tested it against me, and we were both deepfakes. I hope not. I hope the basilisk isn't hitting me. And I said, "Wow, that wasn't very hard." And he said, "What about the elf ear? That's not Spock." I didn't notice. And that's the problem. And then I asked him if he minded if I used this. So don't trust your brain. As humans, we are hardwired to filter out details. When you're driving a car, right? There is so much information coming at you so quickly, your brain can't process it all. That's why we have dreams in our sleep,

where our brain tries to reconcile what happened that day in a way that makes sense to it. Don't trust your brain. Verify. So what do we do when the world turns into Wonkavision? Thinking back to that case where we had a state actor, something interesting happened. The first thing is, it fit into every single incident I've ever been involved in with a human: what are you doing? That doesn't make sense. When confronted about the activity, the user said they were trying to fix their router, when really what they were doing was VPNing back to, as someone put it, their /24 in North Korea. But the big thing here that I want to point out is

that something they already had worked. Their endpoint detection detected suspicious events, and they immediately responded, because they had a good response team. They contained the breach in 25 minutes. And I've got to tell you, my goal is always an hour when it comes to database outages, and the reason why is because it's going to take me 15 minutes to boot up my laptop, 5 minutes to verify the problem, and probably about another 15 to fix it. That is awesome. That worked. Good on them. So, what defenses work? We already talked about liveness. I saw some marketing materials on the internet when I was researching this, and the marketing materials, well, they had things like, "Oh, the user is

wearing a mask, or their eyes are closed." I told you data scientists would cringe, because you're running the wrong experiment. All you're really doing is checking whether they have a mask on or their eyes are closed. So, we've got to think correctly about this. Have them stand up, right? Code words, I've heard a lot about. Code words are where you share a secret with someone and you validate that secret. "I need you to wire a million dollars to this account." "Hey boss, is this right?" "Tacos," right? Unfortunately, that only works with someone you know, but it's a good measure to protect you and your company. Then there's awareness training. I think I've got Violet in for that one.

Extortion and fraud aren't anything new, right? These attacks, whether they're extortion or fraud, are the same attacks, just done differently. And there is data to support that security awareness training works. In a survey of 150 organizations, Hornetsecurity found that 78.5% agreed that it helped limit attacks. And I would agree with that statement, too. When we do security awareness training campaigns, the ones that I actually get survey responses for are the ones where you have the exploit and the victim side by side, especially when it looks so easy, because you understand the nature of the problem and it looks cool. So, if you want to engage your team members, that's an awesome way to do it. I just happen

to be doing it today. And then you've got your controls, policies, and, as I said earlier, training. When we look at the KnowBe4 breach, it was their existing controls and policies that caught it, right? And for decades, we have had good controls that have been tested and challenged across all kinds of domains, like finance, information security, medical (if you're going into surgery). They work, and they've been tested.

Your existing controls are your golden ticket to staying safe.

I'd like to leave you with one other thought to ponder, and that's that one of my favorite teachers in high school was a guy named Ernest Cline. Anybody ever read Ready Player One? It's not him. Anybody seen the movie? Ah, okay. Good. Basically, people live in a virtual world, right? And their bodies stay in stacks of trailers in giant trailer parks. It wasn't that Ernest Cline. It was a different Ernest Cline, who gave me a copy of William Gibson's Neuromancer. A screw-up like me has to ask: have we already leaped forward into that reality? The really high-quality one that you saw earlier actually looked more fake, which is kind of odd. Are we reaching perfection? And are we

having the right conversations? When you go back home, are you having the right conversations around this topic? And I don't think there's a good or bad conversation to be had, because we don't even know what the conversation is. We as a culture define ourselves through movies, right? Movies have become our culture and our mythology. Lisa Bode, with the University of Queensland, wrote an editorial with some other people called "The Digital Face and Deepfakes on Screen." It was published in a journal called Convergence. You can find a copy of it on GitHub in the repo. And basically, what she was trying to get across is that identity has pretty much become ambivalent

in Hollywood. And I would challenge you to consider whether or not identities are changing. What does it mean to have an identity? I hate to reference blockchain after the last talk, because I totally agree with that meme of the snake-oil medicine with blockchain, right? But it's starting to make sense now. That's a scary prospect as a developer. But I want to make sure that I leave enough time for questions, because I really did do a lot of research. I understand this stuff at a pretty good level on the technical side. So I'm going to leave a QR code up here for GitHub, and I'll let Joe pass around

the mic. We'll see if there are any questions.

>> Thank you. Great presentation. Um, yeah, I find the topic of deepfakes fascinating. What about using watermarks to help identify deepfakes? Watermarking technology? >> I would think back to the fact that we can fake humans; any system will be exploited. And blockchain, I don't know. But anything is better than nothing, and it gives you a little bit of a measure. So, internally, you can use that as a code word, right? So that's where I go with that. >> Thank you so much. This was a wonderful presentation. I was just wondering if there's any use for deepfakes on the organizational side. I feel like whenever we talk about it, it's always

about how threat actors are going to use it in a negative way. Is there anything that organizations can do to utilize this new emergent technology? >> Most jobs could be done by AI in 10 years. Wouldn't it be nice if I could put my face on something for the purposes of training, right, or teaching? I think there are valid cases for this, and I don't think it's posting yourself in Star Wars to social media, right? I think some of the use cases, like stroke detection with the meshes, or public communications where you can't actually be there, right? So if I can't make a meeting, I can have somebody else make the meeting for

me and point at the screen, right? And these are all valid uses of deepfakes, right, that actually help an organization. I don't think we're there yet. I'd like to be, in a way, to a certain extent. >> Yes. >> Um, talking of watermarking, I think C2PA is kind of an interesting initiative. So, basically, digitally signing content. Some camera manufacturers already have that. So, yes, there's a private key unique to each device, and, yes, we'll see at Black Hat next year that somebody's ripped the key off it. But if we can get that to a low enough cost to get it into consumer webcams, then it raises the cost of a deepfake, because

basically, you don't have to detect all deepfakes. You just have to validate that the content came directly off your Mac webcam. >> Yeah, that's a great point, and I really like that you said device IDs, because, well, we're in Tennessee again. When you validate your identity for a website, right, it has to take a picture of you. You have to give it your ID. Wouldn't it be nice if that was linked to my device, right? So that my identity wasn't compromised. And we as a culture have to evaluate what data we're willing to give up so that we can maintain the appropriate level of privacy while still being able to function.
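
The capture-time signing idea in that exchange can be gestured at in a few lines. This is only a sketch of the concept: real C2PA provenance uses X.509 certificates and COSE signatures over a structured manifest, not the symmetric HMAC stand-in and hypothetical key below:

```python
import hashlib
import hmac

# Hypothetical per-device secret; in C2PA this would be an asymmetric
# private key provisioned in the camera's secure hardware.
DEVICE_KEY = b"burned-into-the-webcam-at-manufacture"

def sign_capture(frame_bytes: bytes) -> str:
    """Attach a provenance signature to the frame at capture time."""
    return hmac.new(DEVICE_KEY, frame_bytes, hashlib.sha256).hexdigest()

def verify_capture(frame_bytes: bytes, signature: str) -> bool:
    """Check that the frame is byte-for-byte what the device captured."""
    return hmac.compare_digest(sign_capture(frame_bytes), signature)
```

The design point matches the questioner's argument: any post-capture manipulation, such as a swapped face, changes the bytes and breaks the signature, so you validate provenance instead of trying to detect fakery.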

And not give away your driver's license for everything, right? I don't want people knowing what books I check out at the library. Anybody else? We've got about six minutes left. Oh, in the back.

>> Countries, organizations: what all do you think they're doing with some of this stuff? >> Well, the blue hat in me hopes they're using it for academic research by getting a large enough data set. The other thing is, facial recognition technology's come a long way, and you can map out various things to where, when you upload your face, I look at your IP address. I know what city you're in. If I know what city you're in, I can go to LinkedIn. I can look and see, based on the images on LinkedIn, who you are right now. That's all hypothetical, but it could be one use case.

>> Yeah. And the question was, what are state actors doing with data, if they are collecting it, or the internet. >> I'll ask another question, and I'm sure you know all this stuff, obviously, but this is deepfakes. A deepfake, you know, you used your camera on your Mac, so it's obviously a real human, right, to your Mac. I'm sure you've heard of Unreal Engine 5, the new Unreal Engine 6 that's coming out, these MetaHumans, these digital twins. You know, some of the highest-paid influencers on social media are complete AI avatars. Your thoughts? >> I think there's a place for it, right? I would like to have an avatar that can do

things for me, just like an AI agent, right? I don't want to go to meetings. Not to have a tired refrain, but I don't want to go to meetings, right? But I do want to be present. How do you balance those two? Right? >> I don't do social media. I am not going to monetize my TikTok. I don't have one. But yeah, it's really fun and really scary.

>> I have a question. Is it possible with deepfakes, speaking specifically to, say, disinformative content, is it possible to put that toothpaste back in the tube? >> No. >> That's what I thought. >> No. And here's why. I can just go to ChatGPT and write the code. Those samples took longer to copy and paste, you know, or copy-paste was faster, but I vibe coded those code samples deliberately to prove a point. And that's the point I'm trying to make. You cannot put this genie back in the bottle. You have to talk to each other, right? >> Excellent. Thanks. Yeah. Because with disinformation, by the time the deepfake is detected, the damage is already

done. >> Yeah. Yeah. Because what ends up happening is people fall for it. And it doesn't matter if it's true or not. If there's a quorum of people that believe it, it doesn't matter. I think there's one more question in the back, Joe.

>> So, in the beginning of the presentation, you mentioned AlexNet and ResNet and everything, which I've used personally for image classification. How does that, how do I put it, how does that contribute to the more generative part of deepfakes? Did you use it to try to identify deepfakes, which I know you mentioned is kind of impossible? I was just curious where that comes along, the classification part versus the generative part. >> Yeah. So, classification's important, especially when sorting faces, and the

reason for that is that Faceswap was able to do an entire scene by sorting faces, right? If I wanted to replace one face in this entire crowd, I would have to be able to classify each face, right? And it had this function where you could sort it all, and if you sort by face, you'll figure out which one. The big breakthrough with Faceswap is that it uses diffusion. But back to your question: that's the big case right there for classification, being able to replace one face in a scene. There is a reason to have it.
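
That face-sorting step can be sketched as a small clustering routine. The embeddings below are toy 2-D tuples and the greedy algorithm is illustrative; real tools sort crops using learned face embeddings (FaceNet-style vectors) and more robust clustering:

```python
import math

# Group face crops by the distance between their embedding vectors, so
# every appearance of one person in a scene lands in the same bin and
# only that person's faces get swapped.

def cluster_faces(embeddings, radius=1.0):
    """Greedy clustering: each face joins the first cluster whose
    representative is within `radius`; otherwise it starts a new cluster.
    Returns a list of (representative, members) pairs."""
    clusters = []
    for emb in embeddings:
        for rep, members in clusters:
            if math.dist(rep, emb) <= radius:
                members.append(emb)
                break
        else:
            clusters.append((emb, [emb]))
    return clusters
```

Two nearby embeddings (the same person across frames) collapse into one cluster, while a distant embedding (a different person in the crowd) stays separate, which is what makes single-face replacement in a crowded scene tractable.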

>> This is kind of tangentially related, but with facial recognition technology, and a lot of governments kind of increasing surveillance on citizens, with some of the research that you've done, do you think there's an effective way that people can protect their privacy, or protect their face, in a sense? Whether it's, you know, things like big sunglasses, or some people do makeup, or some people do reflective shielding on their faces. Or is there even a possibility of maybe putting honeypots of yourself out there? So that way, if somebody was trying to use OSINT tools to try to figure out who I am, then if I maybe had some fake profiles out there with fake

images of myself that are slightly altered, maybe you can't actually find me with my facial ratios and things like that. >> I never want to rely on security through obscurity, generally, for anything. It's a great measure to have, but it won't protect you. So the answer is no. But in security, we always like to say "yes, but." I'm going to say "no, but." There are certain things that I have tested. I'll tell you what doesn't work: facial tattoos. I should have put this in here. I had my son draw with magic marker all over my face, and it still worked. That's frightening. I think it's going to continue to change. If you watch Elysium, they

had facial scarring, right? For that very reason, to avoid facial recognition. It doesn't matter anymore. In fact, what you could do is take the encoder and work it out. And if you needed to, you could train a secondary model using hyperparameters to do exactly that: get around whatever protections you have done with your photo, etc. So my advice would be to always look good, because it doesn't matter.

>> You've been talking about the fast rate of change of all these things. >> I'm sorry? >> You've been talking about the fast rate of change of all these things. You come back here two years from now, you're giving the same presentation. Where do you think we are then? >> I'll be a deepfake. I'll be in Asheville, North Carolina, where I drove in from, and I'll be giving this talk, but not using video, or not using Zoom, right? What we will see is the application develop, right? It's not so much whether or not we can face swap. We're there now, right? We can make things that are indistinguishable from

reality pretty easily. But I think where the thing lies in the future is the application of the technology. We've been talking about that a little bit. What are the vulnerabilities? What are the exploits? KnowBe4's incident seems novel. It's not. It's just a little bit of salt on your dinner. I think we're going to find some interesting things in the future. And that is beyond my capability, to think that far out, because it's just changing so fast. I think we're at time, unless there's a

All right, everybody give it up for David. >> Thank you.