
Hey folks, great to be here to talk about how your voice, or maybe Troy's, confirms my identity. My name's Ethan and I generally go by skas online. I'm a security consultant at pan Security Group back in New Zealand, a little island on the other side of the sea, and also a pretty avid open source developer, so I've got a bit of experience on both sides of the metaphorical security fence. But enough about me and enough about this presentation, let's have a little story time. Raise your hand if you've ever heard of Darknet Diaries. Quite a few of you. Now, I don't
know if anyone listened to the episode on April 2nd; if you did, you know exactly where this is going. In Darknet Diaries on April 2nd we're introduced to a lovely woman called Rachel. Now, Rachel is a social engineer by trade, and she got asked by 60 Minutes if she could phish one of their journalists live, or not necessarily live, but on camera. Now, that sounds complicated. Surely it's pretty hard to phish someone on camera when they've given consent to be phished, because you can't just phish them without their knowledge. But she did it. She went ahead and went: oh cool, investigative journalist, there's a lot of interviews
of her online, I wonder what would happen if I download those interviews and train a voice clone. And then I found her phone number alongside those interviews; she's an investigative journalist, she wants people to come to her and talk to her, so she's put her contact details online. So I'm just going to go ahead and spoof her phone number, make a voice clone of her using all of her clips, all of her interviews, and I'm just going to call her colleague: hey, I'm down at the travel agent right now and I forgot my passport, and we've got that trip coming up next week. Do you think you could read my passport details out
from the company system so that I can book my travel and we can go do this journalism? And her colleague's like: oh, that makes sense to me, it sounds like my colleague, it's coming from her phone number, my phone says it's her contact. Absolutely, I'll just read out your passport to you, I'll read out the security details. But it was actually Rachel sitting in the next room on her phone with her laptop open. Now, there's a bit more detail to that, and there's a whole episode on it if you want to look it up: it's a 60 Minutes interview with Rachel Tobac, or you can look at the Darknet Diaries April 2nd
episode. But I thought it was just a good introduction to talking about the following things. We're going to go over deepfaking within our current society, another example of where we're seeing it; the rise of voice as a security boundary and how it's coming up; how to clone your own voice; cloning someone's voice using their digital presence; and then some discussion around industry trends, mitigating factors and my thoughts. From that, I'm hoping you'll get a few things: a better understanding of the voice biometric space, which is a pretty new area; some practical voice cloning skills and how to use them to
bypass voice authentication; on the flip side, how to use voice authentication securely; and also an understanding of the risk voice authentication poses to you as a business owner or as an end consumer. On that, I did some Googling recently and looked up some news articles, and all of these news articles are from roughly the past four to five months. There are some pretty big titles in there; the FCC banning AI-generated robocalls must be pretty serious. Honestly, I'm seeing news titles like this all the time, almost every day. But why are our voices, voice phishing, and scam calls from cloned voices
and robocalls coming up so much? Because voice authentication is seen as the next step forward in a simplified user experience. If you think about a standard password flow, good practice dictates a long, complicated password that's different on every single website, or a series of words and phrases, easy to remember but long and different for every single website. Whereas voice, well, that's not hard; I know how to speak, most of the time at least, and it's just simple. On the flip side, it's seen as a step backwards for user security. If you get breached on a website with a password, you just go
in there and hit reset password. But if your voice gets breached, your voice gets recorded, your voice gets cloned, how are you going to change your voice? You can't just click reset voice. Ever think about that? But surely, if you can't change it and it's out there in the world, no one's using it, right? I hadn't seen too many places. So again I did a little Google search and I found some banks. Now, to the Australians in the audience and to the Kiwis, you might recognize ANZ. They're the people I picked initially for this, because I bank with ANZ and it popped up every single time
I called them. They go: hey, can you speak this phrase for me? And my polite answer is no, thank you. Because on their websites they're like: how secure is it? It's secure; it makes it difficult for someone else to imitate you, even if they've got a recording of your voice. It's easy, it's fast, it's secure; your voice print is unique, just like your fingerprint, just like your Face ID. And the key phrase here is that some of these banks are saying customers can choose to speak a simple passphrase instead of entering other information. Once you've set up
your voice ID you won't need to remember extra information, you won't need your MFA codes, you won't need anything else, just your voice. In a standard password flow, if there's no MFA, that's an immediate finding in a pen test. But voice? Hey, voice is a new era where we can throw all of that out. And then just one little bit there, highlighted right down the bottom: since your voice is unique, someone else won't be able to use it on your behalf, but, in bold, to protect yourself, don't share your details with anyone else. Your voice: don't share that with someone else. I'm sure that's pretty easy, right? But let's think about it, let's
brainstorm for a minute. Let's think about some ways you could bypass a voice authentication system. Let's start easy: let's mimic someone's voice. I just give Troy my phone and go, hey, you're Ethan now, just say my name out loud and you're good to go. Maybe I record the victim without their knowledge: I give them a call on the phone and go, hey, I'm interested in joining your company or buying your product, could you tell me a little bit about it? Now, here in California, a dual-party consent state, you can't do that, but there are places where you can, and there are also people who
just ignore those laws. You can also splice together existing audio to form the correct phrase: take a free tool like Audacity, drag some audio in there, and clip out the words you want, because most of the phrases that you need to speak are public, and then you've got the person saying the correct security details. Wow, interesting. And then the good buzzword: we can clone their voice using AI. We're going to take a look at a couple of these. We're going to look at some graphs of voice mimicking that I did back home, and then we're going to teach you how to clone your own voice and someone else's using AI. So let's
look at some data. Back in the office I sampled 15 people and got four audio clips from each of them. On the graph, each axis is going to represent two audio clips from a given person, and the numbers are essentially just an ID of the person. So let's have a look. No, that is way too complicated. Let's think about it like a computer: green means yes, red means no. A lot simpler. Now, we've just picked an arbitrary threshold for this, because unlike a password, where you either get the exact password right or you're not let in, voice is a sliding scale: do you want to make it harder to get in, on the
security side, or do you want to make it easier for everyone to get in, on the usability side? Now if we focus on a specific row here, we're going to look at 2B up at the top. If we look at that initial square on the left, we can see that 2B is not being recognized as the audio clips from person zero. Pop across one more, and we can see that person two is not being recognized as person one. Great. One more, and we can see that person two is being recognized as person two. That's ideal. So that diagonal of green dots essentially just means: can you get into your own account? Now, the answer isn't always yes.
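The green and red grid described here comes down to comparing a similarity score against a threshold. The following is a minimal sketch of that decision, assuming a hypothetical table of similarity scores between enrolled accounts and speakers; the scores, the `SCORES` table and the function names are invented for illustration and are not the data or tooling from the talk.

```python
# Hypothetical similarity scores in [0, 1], keyed by (account_id, speaker_id).
# These values are made up for illustration; they are not the talk's data.
SCORES = {
    (0, 0): 0.91, (0, 1): 0.42, (0, 2): 0.80,
    (1, 0): 0.40, (1, 1): 0.83, (1, 2): 0.47,
    (2, 0): 0.80, (2, 1): 0.45, (2, 2): 0.88,
}


def decide(scores, threshold):
    """Map each pair to True (green, accepted) or False (red, rejected)."""
    return {pair: score >= threshold for pair, score in scores.items()}


def error_counts(scores, threshold):
    """Return (false_rejects, false_accepts) for the given threshold.

    A false reject is a speaker locked out of their own account (a red
    square on the diagonal); a false accept is a speaker let into someone
    else's account (a green square off the diagonal).
    """
    decisions = decide(scores, threshold)
    false_rejects = sum(
        1 for (acct, spk), ok in decisions.items() if acct == spk and not ok
    )
    false_accepts = sum(
        1 for (acct, spk), ok in decisions.items() if acct != spk and ok
    )
    return false_rejects, false_accepts


# Lowering the threshold trades lockouts for cross-account access.
for t in (0.85, 0.82, 0.79):
    fr, fa = error_counts(SCORES, t)
    print(f"threshold={t:.2f} false_rejects={fr} false_accepts={fa}")
```

With these made-up scores, a small drop in the threshold first removes the lockout and then starts admitting the wrong speaker, which is the usability versus security trade-off the sliding scale implies.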
There's a few people who, maybe in their recordings, sounded a bit different, maybe they were tired, and they can't get into their own accounts. So, like the good business owner I am, I don't want to lock people out of their accounts; they'll go somewhere else. Let's lower the threshold by 1% and see what happens. A couple of things happened, and they both happened in the same row, person 14 to be specific. If we start at the right-hand side of the graph, on the diagonal, we can see that person 14 is now getting into their own account. Great. But they're also getting into person zero's account, and we've still got a couple of people who can't get into their own accounts.
So, to humor me, take a guess in your mind really quickly at how many green squares you think there will be on there if I have to lower the threshold by a few more percent to let everyone into their right accounts. I'm willing to bet you didn't think it would look like this. Now, a note on that: it is only about a 2% lower threshold compared to the one from before, and while this isn't exactly a voice authentication platform, it is pretty representative, although there are a few extra squares in there for comedic effect. But surely, if I've gone ahead and got into all of these people's accounts, I've had to have
a highly complicated setup for these audio bypasses. I've rented out a sound booth, I've got an XLR microphone. It couldn't be as easy as sitting on a couch with a laptop on my lap, playing audio from the speakers, with my phone on a call with the system on the couch next to me. Except that's precisely what we did. We recorded some audio using the microphone on the right, a pretty entry-level microphone, then made some voice clones and played them out of my laptop speakers, not even directional speakers or anything, to the phone on the couch next to me, and we got into a real customer's account. I'd say that's a pretty low
barrier to entry. So what do we get with that? Well, there's a low barrier to entry: it's essentially, do you have a phone, and does that phone have a calling plan? I'd imagine most of you would meet that criteria. And then we've got some high-value targets in there; I would consider my life savings high value at least, I don't know about anyone else. With that, I'd like to think we get a hacker's field day, because you've got a low barrier to entry and high-value targets. And if we put our blue team hat on for a minute here, let's have a think. If you're a voice platform, let's
think about the kind of logs you might be getting. If someone's calling into the system, what gets logged? What can the team look at if fraud happens? Well, there's a phone number, obviously. That's great, we can check if the phone number is the phone of the person who's signed up to that account. Except we can spoof that, and depending on where it's coming from, that's not even illegal. And then, okay, we've got the call itself. Except in half of these methods, the audio from that call is going to sound like the person, or sound like the account holder. That's not a lot of logs to work off. That's a non-ideal situation.
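To make the logging problem concrete, here is a small sketch of what such a call record might contain. The `CallRecord` fields, the `looks_legitimate` check, the phone number and the scores are all hypothetical, chosen only to illustrate the two signals mentioned above (caller ID and the call audio); no real platform's log format is implied.

```python
from dataclasses import dataclass


@dataclass
class CallRecord:
    """Hypothetical log entry for one call into a voice authentication line."""
    caller_id: str      # number presented on the call; this can be spoofed
    voice_score: float  # hypothetical similarity of call audio to the enrolled voice


def looks_legitimate(record: CallRecord, registered_number: str,
                     threshold: float = 0.8) -> bool:
    """Naive after-the-fact check using only the two logged signals."""
    return (record.caller_id == registered_number
            and record.voice_score >= threshold)


REGISTERED = "555-0100"  # placeholder number for the account holder

genuine = CallRecord(caller_id="555-0100", voice_score=0.90)
# Attacker: spoofed caller ID plus a cloned voice played down the line.
attacker = CallRecord(caller_id="555-0100", voice_score=0.86)

print(looks_legitimate(genuine, REGISTERED))
print(looks_legitimate(attacker, REGISTERED))
```

Under these assumptions both records pass the same check, so an investigator working only from these fields has nothing that separates the account holder from the attacker.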
So have a think about that: how would you go about investigating fraud in that case? Because to me, I would just think that the victim in this case just took the money out of their account. It sounds like them, it's coming from their phone number, don't look into it too much more. But hey, maybe that's a bit too much doom and gloom. Let's have some fun, let's train a voice clone. The aim here is to produce high-fidelity audio. We're going to use something like a soundproof room and a decent microphone, and speak clearly, because the aim here is just to produce clear audio without background interference. The way
AI works at the moment, the better the data we can provide it, the better the imitation it's going to give us back. So the goal is clear audio, a single speaker, no office noise in the background, that kind of thing. So let's take that training data, roughly 3 to 5 minutes' worth, and build a voice. Now, we're going to be using a platform called ElevenLabs, but that's primarily because I'm from New Zealand and I have a Kiwi accent. I've probably got a list now of about 15 to 20 platforms, both online platforms where you just upload your training data as well as offline models that you can just train on a 1080 Ti. Now,
we tried all of those, but unfortunately most of these companies, most of these startups, are American-backed, and the American dialect, which is the majority of the training data, doesn't work too well with my Kiwi accent. So ElevenLabs is the current go-to that we've found works well for us. For the majority of the audience out there, I'd imagine most platforms would work for your needs. But with ElevenLabs as well, it's an extremely low barrier to entry: do you have $1 US, do you have a credit card with that dollar on it, and do you have an email address that you can access the inbox of, to click the confirmation link when you
create an account? Wonderful, you're in. And it's probably about a 30-minute turnaround from when you first hop on the website to when you've got a fully functional voice clone, and honestly most of that time is downloading the audio data. But let's hop on the website. It's really simple: you hop under the Voices tab, we're going to add a voice, and there's a few options. We're going to do an instant voice clone, which is, as it says there, one speaker, over a minute long, no background audio. They do also have the option for professional voice cloning; honestly, I haven't needed it. Once you're in there, you give your voice a name. I prefer to give
it the name of the target, just so that when you've got quite a few targets you can easily distinguish them, and then you can upload your samples. You'll see there that you can upload up to 25 samples of 10 MB each, but as they stress there as well, we're looking for quality over quantity here. Once you've gone ahead and done that, click the button down the bottom, clone voice, done, and now we have a voice. So let's go ahead and use it. We're going to click a different tab; it might be really hard to understand, it's the Speech tab. Once we've clicked the Speech tab, there's a free-form text field, and, rhetorical question here, but any
guesses what goes in there? If you guessed the text that we want it to say, great. So it's quite simple: you just type the text you want it to say and hit generate. Now, it takes about four generations for me to get some plausible Kiwi accents coming through, which means it's not quite valid for a real-time application, like if you're on a phone call with someone. Each of those generations only takes about as long as it would take to speak the text you've provided, so if you have, say, an American accent, or you can get it to a point where when you click generate you're happy first time, every time, you
could likely use this for nearly real-time, but unfortunately that's not quite the case for me. So let's listen to some examples of my voice. Now, as you can all see, I'm in my mid-40s, so let's just listen to my date of birth. "The 30th of June 1981." Yep, that sounds like me. You've probably had about 15 minutes now of my voice; you can take a guess as to whether or not you think that's me. And possibly another one, fewer numbers, more text. "I say that number all the time." There we go, that's better. I'll play it one more time for people: I say that
number all the time. And hopefully you can pick up on that being me, or at least very close to me. And then if we put some extra things behind that, if I'm calling in with a bit of urgency, maybe some background noise on the call while I'm playing it, like a baby crying, just pulling out our red team hat essentially, you could imagine how easily you could get past someone with that. It wasn't a 100% success rate back home, but it was about 80% for some of those audio clips, and that audio clip I played did also get into my account on one of the jobs I was doing. But it's
all well and good cloning your own voice. What happens if my voice sucks with the software? Well, they've got a beautiful settings tab. Stability, for example, essentially says: every time I click generate, how different should it be? For the Kiwi accent, we're relying on that difference to pull towards more of a Kiwi accent and less of an American one, so you just change that depending on how you're finding it. Similarity is just how closely it's going to match your input audio files, so you put it as high as you can before it starts giving you audio defects. And then style exaggeration depends on your target: if you click
generate and it doesn't sound like them, maybe it's speaking too fast, maybe it's garbling it a bit, push that up a little bit and just take it a bit more towards what they sound like. But that's enough of our ears. What does the software think? What if we put it towards that graph we did earlier on? We did a comparison of my real voice against those voice clones we heard. Ooh, there's a green square in there. Now sure, not both of these squares are green; there is a piece of audio in there which isn't being recognized as me. But in the case of what is primarily being used for an authentication flow, a
bypass is a bypass. You would still get into my account with this, and that's not on. But it's also remarkably simple to clone your own voice; I can just walk up to a microphone and start speaking. So let's go ahead and clone someone else's voice using their digital presence, more so for an engagement, using it for an offensive purpose to clone the voice of a high-value target. We can use it for phishing with our banks, we can possibly use it to access sensitive information like business bank accounts, anything you might need on your engagement. So let's clone my boss from back home, Simon. Now, we're going to start
off pretty simple here: we're going to build up a profile on Simon. So what do we already know? Well, we know that his name is Simon Howard and we know that his company is ZX Security. On top of that, we know that we need to get the training data without alerting him; it kind of defeats the purpose if I walk up to him holding a microphone. So, as an old colleague back home would say, let's go check out YouTube. If we put those details we know into YouTube, it looks like he's got a digital presence we can use. There's a couple of videos of him there that tally up to about 5 minutes of him monologuing,
talking about, essentially, his experience getting interns and why they're good for a business. But to us, they're good to download and train a voice on. They're not going to be quite as good as if we'd recorded them ourselves, but it's going to be good enough. So we can just go ahead, download those videos, and follow exactly the same process as before: downloading them, turning them into MP3 files that are under 10 MB each, and uploading them to the platform. Once we've done that, we have a viable clone of Simon. It's not going to be perfect, because the audio we gave it isn't perfect, but it can prove to be good enough for
our avenues of abuse. One of the audio clips I'm going to play to you had everyone in the office confused, to the point where a direct quote is: why do you have a recording of him saying that on your laptop? Now, I'm not going to be too mean to you; I'm not just going to play them without giving you some reference audio. So I'm going to play you a little reference audio clip now from one of those videos, and we did use this video as training data. "Hi, I'm Simon Howard. I run ZX Security, an IT security consultancy here in Wellington." Cool, so we've got a little bit of reference data now. So that's what
he sounds like in real life. Let's listen to some voice clones. I mean, it's not exact, but it is close, and we can definitely hear that it's not quite the same as the audio we've given it. Now, that is actually more like what he sounds like now, because the audio from these videos is actually 5 years old. But how about a better one? How about one that's a bit closer to what he actually sounds like? "Hey dude, sounds good to me." And then I've got a third one here, which is a little bit of a cheeky one. Raise your hand in the audience if you use Android. A few of you. My apologies if
this does what I hope it does. "Hey Google, set an alarm for 7:00 a.m. tomorrow." Now, that did get someone back home, and that second audio clip did confuse people as to why I was just recording Simon saying some really random sentences, because I was just like, hey, put some headphones on, listen to this, and they were all like, huh, why? But again, enough of us listening to it, let's see what the graph thinks. Ooh, he's even more real than me. So again, we've got four green squares there. Real Simon is being recognized as real Simon, AI Simon is being recognized as AI Simon, but also they're both being recognized as each other. And especially
in the context of people who you might not have interacted with too much, that's going to be more than enough. If all of your reference audio of this person is that 15-second clip, you're going to be pretty on the fence with it, yet it was still confusing people who've worked with him for 5 or 10 years. So cool, that's fun, but how do you do it right? How do you take what we've just seen with the mimicking and the cloning and do it right? Well, the short answer, in my eyes, is that you don't. You're relying on something which is inherently a non-unique medium, and with that, your voice inherently has
variability. I know that when I woke up this morning, when I'm speaking now, and when I'm at happy hour later on, my voice is going to be different, yet as a business I need to let all three versions of those voices into the software, and that's going to inherently incur risk. On top of that, your voice can easily be obtained. We live in an ever more digital world; heck, this presentation is being recorded and put on YouTube live right now. Now, don't clone my voice, but you're going to have 20 minutes of really good audio sitting right there if you wanted it. And that's all introducing the ability for abuse. The fact that it's
a sliding window in the hands of the business, to pick whether you want a usable system or want to crank it a bit more for security, yet still having some form of bypass in there, some documented maybe 1 in 100 or 1 in 1,000 bypass rate, is not going to cut it for an authentication flow, because in an authentication flow a bypass is still a bypass. And it's not often that we're on an engagement where every single bypass technique you test results in access to client accounts; that's the case here. But it's not all doom and gloom, we do have some defense in depth. First, for consumers: if you're a user of
these systems, I think the main thing here is being risk-aware. As consumers we're kind of at the mercy of the business, but being aware of what we're signing up for, and having it on our brain that maybe voice ID isn't all that it's talked up to be, gives you the option for alternatives. And with that, use alternative offerings if they are there. If your bank goes: hey, you can use voice, or you can do it the old way, where we'll manually verify you and get you to read this code out from the app, use the alternative. I'd rather log into the app and have to
click yes than simply use my voice, something which is readily available to anyone on the internet. But ultimately, as a consumer, you're essentially dependent on the business you're using, which is unfortunate. Now, to the companies implementing this in their workflows: add additional forms of authentication. If we're thinking about a standard password authentication flow, if I'm testing that and it doesn't have MFA, that's an immediate finding. Yet with voice authentication, having looked at a few of these banks and what they say online, as well as what I've experienced back home, it's often the only layer of security when used. It's this big buzzword, this new thing, and maybe because there are so few
articles out there about it being hacked, they think it's fine. But if you're a business with it, use it as secondary authentication, use it alongside some other authentication. I think ANZ is a great example: if you do a transfer of over $10,000, they will force you to authenticate with your voice, but that's on top of already having authenticated to the app. It's layers of authentication here, increasing the hardness of the system. It's not going to be perfect, you could still bypass MFA, but it's a lot more than just, do you have a phone with a phone plan? And then on top of that, let the users pick a private phrase. Like, every
bank that I've looked up for this research writes on their website: hey, we're going to get you to say the phrase "your voice confirms your identity". That's not very private, that's not very secret. If we're letting the users pick a private phrase that they're going to say, you now have to phish the individual user for that phrase. It's no longer: cool, we'll just clone some voices off the internet, we know what they have to say, all is good. You're increasing the barrier to entry. On top of that, ensuring that the spoken phrase is actually a sentence and not numbers is something we have found to have a measurable effect, because if you
think about me saying 1 2 3 4 5 versus 5 4 3 2 1, you could record those and flip them and it sounds relatively fine, whereas the way we say words is often dependent on the context around them. If you think about the way you're saying a word, and then you look at the word before and after it, you inherently have some form of link from word to word, so when you clip one out of context it sounds weird. While it's not going to be perfect, it's giving the systems a better chance to pick up on the fraud, essentially just giving the underlying system the best opportunity to detect fraud that we can, increasing
the barrier to entry, taking it from 1 in 100 or 1 in 1,000 to something like 1 in 10,000 or 1 in 100,000. Add MFA on top of that and then it's just a force multiplier. And to anyone out there who might actually work for one of these vendors, I've got a couple of thoughts. The first is providing a means to verify whether content was produced by you. There's a couple of vendors out there now which are starting to provide this as an opt-in on the side. Things like cryptographic signing: if you produce a form of content, sign it, and then people can verify against that using
existing measures, such as maybe some hardware signing or public/private keys, just something. But on the flip side, actually push for the adoption of these checks as well. It's all well and good providing an API endpoint to go: cool, you can give me a piece of audio and I'll tell you if we made it. If no one actually uses it, what's the point? Alongside putting it in your documentation that nobody reads, actually push for it, have it as a part of those flows by default, because, as I'm sure we're all well aware, users will inherently tend towards whatever is default, not necessarily whatever is secure. But hey, let's have a recap,
let's just go over some things. First, your voice is not a security boundary. There's a lot of different ways for someone to get your voice, and they might not even have to get your voice to get into a system. On top of that, the technology is only going to become more and more common. If we think about other biometrics, such as Face ID or fingerprint, they started out small, maybe it was just Apple at the start, and now look at where they are: everywhere. I think voice is like that as well. We're starting to see it become more and more prevalent, where someone goes: oh cool, that bank's got it, we've
got to have it as well or our customers will leave. Give it 5 or 10 years and it'll probably be just as prevalent. On top of that, voice cloning is only going to become better with time. When I started this research about 8 months ago, there were probably only about one, two, three vendors, whereas now I've got a list of about 20, and that's a cat and mouse game that, over time, I don't think we're going to win. Also, the fun part: we've made a few voice clones, taught you how to make your own, and had some fun with them. And I'd just like to end with some acknowledgements. To all my colleagues back home, I think they're
watching, it's been great; thank you for putting up with me continuously walking around the office going, hey, just need another minute of your voice, please. To the organizers of this conference and all the volunteers, it's been great. And to you, the audience, I hope you've learned something. I'd just like to end with one last thing, and that's: when we base our authentication on solutions which are not inherently unique or secret, we are destined to fail. Thank you. [Applause]
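The suggestion to vendors about signing generated content and offering a "did we make this?" check can be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps in an HMAC with a shared secret from Python's standard library in place of the hardware signing or public/private keys mentioned in the talk, purely to stay dependency-free, and `VENDOR_KEY`, `sign_audio` and `was_generated_by_vendor` are hypothetical names, not any vendor's API.

```python
import hashlib
import hmac

# Hypothetical vendor-held secret. A real scheme would more likely use an
# asymmetric key pair so that third parties can verify without the secret.
VENDOR_KEY = b"hypothetical-vendor-secret"


def sign_audio(audio: bytes, key: bytes = VENDOR_KEY) -> str:
    """Return a hex tag binding these exact audio bytes to the vendor key."""
    return hmac.new(key, audio, hashlib.sha256).hexdigest()


def was_generated_by_vendor(audio: bytes, tag: str,
                            key: bytes = VENDOR_KEY) -> bool:
    """Check a tag against the audio using a constant-time comparison."""
    return hmac.compare_digest(sign_audio(audio, key), tag)


generated = b"\x00\x01fake-audio-bytes"  # stand-in for generated audio
tag = sign_audio(generated)

print(was_generated_by_vendor(generated, tag))
print(was_generated_by_vendor(generated + b"\x00", tag))
```

A tag like this only matches byte-identical audio, so it would not survive the clip being re-encoded or replayed through a speaker into a phone; it illustrates the verification idea rather than a complete defense.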