
The One With The Foreign Wordlist

BSides Las Vegas · 2022 · 42:48 · 134 views · Published 2022-09
Category: Technical
Team: Red
Style: Talk
About this talk
Dimitri Fousekis and Ethan Crane explore password cracking techniques optimized for non-English languages and scripts. The talk covers transliteration rules, keyboard-walk attacks across multilingual keyboards (including emoji input methods), and practical tools for generating efficient foreign-language wordlists. They demonstrate real-world applications from password audits and discuss how targeting language-specific password patterns significantly improves cracking efficiency.
Original YouTube description
PW - The One With The Foreign Wordlist - Dimitri Fousekis, Ethan Crane PasswordsCon @ 15:00 - 15:55 BSidesLV 2022 - Lucky 13 - 08/10/2022
Transcript [en]

um sort of a background for this talk is that i think i believe that when we are discussing passwords we are giving our password advice when we are talking about password cracking which we love to do we are most of the time focusing on english names and words a vast majority of all research is heavily towards english even me with norwegian being my native language there is a lot less research about that we had an excellent talk several years ago in stockholm from somebody living in south africa talking about exactly that he said that if you know swahili as an example and you can do a very simple password in swahili chances are pretty small that the bad

guys will be able to crack it because they just don't have a word list in swahili available and today i'm really happy to have back again dimitri from south africa and his colleague ethan that will be talking about the one with the foreign word list and i'm not going to say that's a continuation of that previous talk but again research into passwords and passphrases that are not necessarily strictly english or using standard english alphabet so please take it away

thank you per and thanks everyone for attending today just some background on us so i'm dimitri fousekis i'm the chief technology officer at bcs group which also owns a company called bitcrack cyber security um we're based in south africa and we operate throughout africa handling cyber security pen testing assessments advisory password auditing and so forth so ethan is one of my colleagues he's a security consultant from the company so we'll both be presenting this part today as per mentioned i've done some talks before on passwords and having efficient password lists but what we wanted to do was look for new and different ways that people are creating their passwords especially now

and especially since that talk was given i think you said seven or eight years ago i can't remember how long ago it was we've also changed a lot of the dynamics around how we enter passwords now a lot of it is more done on a mobile device than it is for example on a keyboard in front of a computer which is why we're also going to touch on a few things related to entering passwords on a mobile device but when it comes to foreign word lists what we wanted to touch on was what makes an efficient word list right very briefly i'm not going to go into too much detail on that but it's the

reason for why also we want to discuss foreign word lists by my english is when we will now start looking at the foreign word list the one with the emojis so coming to the cell phones again generating passwords on on cell phones is very common these days which means we have a keyboard that can also do emojis now and that also creates some complexity some websites will die a horrible death if you try and use emojis but some will not so if those passwords are leaked and there's password hashes up for being cracked emojis could be in there and then putting it all together we'll just show you some brief tools that we created to help you do that they'll be

on github shortly after the talk and we hope that you'll extend them and work with them and get them to do more and then of course the mandatory cat picture for per i hope you like it good so we can now begin okay so what makes an efficient word list and as i said we're not going to spend too much time on this but i wanted to highlight actually the opposite what's not efficient and why we thus want to have these targeted word lists so big word lists they're not efficient right especially if you've got slow passwords that you want to crack if you've got a 25 gig word list and you've got a ton of gpus and you're

cracking md5 that's great you'll get through them switch to a salted hash switch to bcrypt something like that and your great grandchildren will still be watching the job running so we want to create efficient word lists especially ones in foreign languages what about numbers in word lists we want to leave these out as well the reason i'm mentioning numbers is because they'll come up in the emoji side as well we generally don't want numbers in word lists because brute force rules can generate those very quickly we can append them to words or stick them anywhere in a word list candidate so there's no point to really having numbers in our word lists except maybe when we're dealing with

emojis and you'll see why when ethan does his part there may be a good point to having your rules output some words with emojis and the golden word list rule that i've stressed before all those many years ago if a rule can do it your word list shouldn't have it okay because if your word list has all the things your rules could do instead of using the computational power of your gpu you're just using text files and hard disk io and it's pointless because you're shoving those plains into gpus and you're just doing a glorified comparison rather than getting work done on the processor of a gpu so try and keep the word list as simple as

possible let rules do all the work that needs to be done on that so for the first part the first technique we're going to look at today is something called transliteration when it comes to foreign word lists now i don't know if any of you have had experience with transliteration if you've ever had to translate into other languages you may have come across transliteration what is it so according to the dictionary it's defined as representing or spelling in the characters of another alphabet right so it's not translation it's transliteration but it can happen after translation okay so if i've got a word in one language i translate it to another language right and that language could be say it is

russian or arabic or greek or something without a standard latin character set that's all good and well and we can find many dictionaries on the internet with russian words with um greek words with arabic words but what most people do is that they're sitting in front of a keyboard that's got latin characters or english if you want to call it that but let's call it by the proper name which is latin characters so what do they do they think of the word in their home language but they type it out the way it sounds on the keyboard that's transliteration so let's look at an example if we take the word alphabet okay

that's what the translation of alphabet is in russian okay now a russian person entering a password on a website may think to themselves well let me take this word and rather type it as it sounds and so i'm not going to enter the russian characters so we could have a dictionary with a lot of russian words in it some nice unicode there you could brute force unicode characters if you're using hashcat or john yes you might come past this word at some point but there's a good chance and we see that a lot in the passwords we crack that the person didn't actually type it in that character set what they did in fact was

they used transliteration which is here on our next slide okay so they took how the word would sound in their language and they wrote it out in latin characters and then they add all the bells and whistles to it so we can have adding of numbers we can have special characters we can have case toggling we can do quite a lot right now if you were only working off of english this word wouldn't come past in your dictionary right because it's not an english word its base started off in russian and we see it sounds like the word alphabet and excuse my russian pronunciation but i'm dimitri the greek version not the

russian one so my russian's not good and alfavit could have been written like that and thus we would have missed it in our dictionaries if we were trying to crack these passwords even if you were brute forcing okay we're not assuming that was the password we're assuming it's going to be a complex one and something that's got to go through rules to actually be effective so how do we generate these words okay because we could do it manually by manually i mean you go take a russian word what does it sound like how is it pronounced okay let me write that in latin characters all right like brute forcing bcrypt with a large word

list your great grandchildren will be doing that work as well what about dictionary mappings not exactly easy because it's long and it's hard and the other thing is it may not understand the pronunciation of the word to actually give you an answer right so when i'm saying that word in another language like arabic greek russian or chinese how does it sound and how do i get that into english or from non-standard characters back into the latin characters so we leveraged a service and both google and microsoft offer the service the microsoft one is just a lot easier to use it's called microsoft's azure cognitive services and it's actually not designed for this it's designed for all the fancy things like

uh auto completion of words in your apps translating sentences and keeping the right intent in it but one of the features it does have and the feature that we're using is transliteration where it can take a russian word or a sentence give you the translation of that into how it sounds and then give you the english translation sorry the latin characters transliteration for that so if we come back to this slide right that word running through microsoft's cognitive services came out with that okay now something else to keep in mind is that these characters might not always be what a person will use what do i mean by that right maybe some

of you can pick it up but if i'm a russian person and i'm pronouncing alfavit and i'm now using an english or a latin keyboard to type alfavit i might not use an i i might use a y so it might be a l f a v y t okay we'll touch on that just now because that means some rules are required in these word lists to get them to be even more effective than just dumping out a bunch of words and using them so azure cognitive services just a warning about this okay like any service out there especially when you spin up your aws machines and you do something and you forget about it and

then you have to sell your house something similar might happen here okay there's a free tier in azure where one million characters can be transliterated for free after that you need to use a paid service and that's going to bill you per character so if you take the tool as it is and you shove a 25 gig word list in it well it's gonna be a bit expensive what did you find out yeah the hard way [Laughter] okay so it's going to get expensive what we've done and i'll get on to that on the next slide for you we've done some things to help you out and we'll continue putting them on our

git as we go forward but what i want to show you quickly if i can just escape from here is coming to our console here okay

okay i'm just going to show you the code here it's very easy to use going forward so what you do is you come to the bottom you'll get yourself a key and a subscription from microsoft's cognitive services it'll spin up an endpoint for you that you can use you simply add your location uh you add the endpoint using the translate endpoint this code will be online you can take it and work with it and then you go from there and when we get to looking at the tools i'll show you what the output for that is now what we did is we took the english dictionary and we shoved it through the service
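The speakers' script was not yet published at the time of the talk, so the following is only a minimal sketch of what such a call could look like, not their code. It assumes the public Translator v3 REST `translate` endpoint with a `toScript=Latn` parameter and the `Ocp-Apim-Subscription-Key` and `Ocp-Apim-Subscription-Region` headers; verify those names against the current Azure documentation before relying on them. The function names and the sample reply are hypothetical and hand-written for illustration.

```python
import json
import urllib.request

# Assumed global endpoint of the Translator v3 REST API.
ENDPOINT = "https://api.cognitive.microsofttranslator.com/translate"


def build_request(words, key, region, to_lang="ru"):
    """Build one POST asking for a translation plus its Latin-script transliteration."""
    url = f"{ENDPOINT}?api-version=3.0&from=en&to={to_lang}&toScript=Latn"
    body = json.dumps([{"Text": w} for w in words]).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,        # key from your own subscription
        "Ocp-Apim-Subscription-Region": region,  # the "location" of the resource
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def parse_response(payload):
    """Return (translated, transliterated) pairs from a decoded JSON reply."""
    pairs = []
    for item in payload:
        for tr in item.get("translations", []):
            latin = tr.get("transliteration", {}).get("text")
            pairs.append((tr["text"], latin))
    return pairs


def transliterate(words, key, region, to_lang="ru"):
    """Send the request (this spends billable characters) and parse the reply."""
    with urllib.request.urlopen(build_request(words, key, region, to_lang)) as resp:
        return parse_response(json.load(resp))


# Hand-written reply in the assumed response shape, NOT real service output.
sample = [{"translations": [{"text": "алфавит", "to": "ru",
                             "transliteration": {"script": "Latn", "text": "alfavit"}}]}]
print(parse_response(sample))
```

Keeping request building and response parsing separate from the network call means the parsing can be checked offline without spending any quota.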

okay and it's a two-stage process but thankfully microsoft cognitive services returns both in the same reply so we took english we said take the english dictionary and convert it to russian right then how would that russian be pronounced give us the transliteration in latin characters and that's when you get output like this now okay so it's the english dictionary and it's obviously the end of it i'm not showing you the whole thing it's the english dictionary and we've translated it and at the same time taken the pronunciation and transliterated that and now we have a whole bunch of nice base words to use when we're cracking our passwords to help you we've done three languages

through the engine i'll put those on github for you you can use them as base words and you can extend it further um obviously like i mentioned be careful because once you pass the free tier your credit card's going to start being billed for the output coming out of this it's not overly expensive i'm just saying just keep an eye on what you're doing and the reason i mentioned rules are so important is never ever give this a dictionary with a whole bunch of junk in it right because if you give azure airplane one two three azure is gonna say well this looks like a spelling mistake i'm sure you meant airplane right

so say you've got a thousand of those you've actually paid to translate the same word a thousand times because azure is going to either give an error or it's going to say i think you meant this and it's going to give you the word so you're actually doing duplicate work so make sure the input dictionaries into the cognitive services when you use these tools are very base dictionaries they have nothing extra added to them so that it can just give you the raw output of what it sounds like when translated into english after that there's some rules you can consider like i mentioned and we'll put these up on the github as well
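The advice about sending only base dictionaries to the paid service can be sketched as a small pre-processing step. This is an illustration under stated assumptions, not the speakers' tooling: the input is assumed to be English (ASCII letters only), the helper names are hypothetical, and the free-tier figure is the one quoted in the talk, which may not match current pricing.

```python
import re

FREE_TIER_CHARS = 1_000_000  # figure quoted in the talk; check current pricing


def base_words(lines):
    """Reduce raw wordlist lines to unique lowercase alphabetic base words."""
    seen = set()
    for line in lines:
        # Drop digits, punctuation and whitespace; keep ASCII letters only.
        word = re.sub(r"[^a-z]", "", line.strip().lower())
        if word and word not in seen:
            seen.add(word)
            yield word


def billed_characters(words):
    """Count the characters that would be submitted for translation."""
    return sum(len(w) for w in words)


raw = ["airplane123", "Airplane!", "airplane", "potato", "potato2022", "garden"]
words = list(base_words(raw))
print(words)
print(billed_characters(words), "of", FREE_TIER_CHARS, "free characters")
```

Counting characters before any request is sent gives a rough upper bound on what a run would consume, which is the kind of check the warning about billing suggests.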

okay because people are not microsoft's cognitive services right they think differently they get into habits so someone who is russian maybe and typing these words on an english keyboard the way they sound may not use an i where it should be they used a y instead maybe they didn't use a k they used a c instead so now you don't want word lists with this because like i mentioned you want your word list to remain efficient and fast but you want rules to plug into your password cracking offload it to your gpu let it do all the hard work here of figuring out all these iterations of what the person could have typed while they were busy with it
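To see what such substitution rules do to a transliterated base word, here is a small Python sketch that mimics hashcat's `sXY` rule function (replace every X with Y). It is only a simulation for inspecting candidates; in practice the rule file would be handed to hashcat so the GPU does this work. The helper names and the particular rule list are assumptions chosen to match the i/y and c/k habits described above.

```python
def apply_rule(word, rule):
    """Apply a rule line made only of hashcat-style 'sXY' functions."""
    for func in rule.split():
        if len(func) != 3 or func[0] != "s":
            raise ValueError(f"unsupported rule function: {func!r}")
        # sXY replaces every occurrence of X with Y.
        word = word.replace(func[1], func[2])
    return word


def candidates(word, rules):
    """Return the word plus each distinct variant the rules produce, in order."""
    out = [word]
    for rule in rules:
        variant = apply_rule(word, rule)
        if variant not in out:
            out.append(variant)
    return out


# One rule per line, as they would appear in a .rule file.
rules = ["siy", "syi", "skc", "sck"]
print(candidates("alfavit", rules))
```

The word list keeps a single entry, `alfavit`, and the variants exist only as rules, which is the division of labour the talk argues for.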

in case you forgot or you don't know okay this is a standard hashcat rule the s means replace x with y so replace a y with an i replace an i with a y replace a c with a k i'm sure you get the drift you can build on these we have done quite a lot of this so i'll publish those on git for you as well what we've done in the various languages is we've looked at what people could have used right so for example greek if you're transliterating a word in greek say drink in greek is poto right so poto could be p-o-t-o when written on a latin keyboard but greeks have the omicron and the

omega which looks like a w so the person could maybe have used the w instead of an o but the cognitive engine won't know that it'll give you p-o-t-o as the output and so these rules will then come in handy to fix those minor things where humans have changed it by using a character that could have been something else so that covered the first technique the second technique we're going to look at is keyboard walks with a difference when it comes to foreign languages and this is an interesting one because i didn't realize how often it actually gets used but it's used quite a lot not really on cell phones and tablets

but certainly on computer keyboards and keyboards where people type okay a normal keyboard walk this is not what we're talking about right i'm sure those of you in the know will know what a keyboard walk is you have a pattern you don't know the password you know the pattern on the keyboard right so i go and a password comes out of it okay and there's various tools that can do that hashcat's got a few others have a few right not complex there are many ways to do that but how does it tie into foreign word lists right if the person's used that well we're going to look at a keyboard now it's called the

perma keyboard it's a chinese keyboard one of the many chinese keyboards out there there's simplified ones there's uh non-simplified ones there's a whole lot of different ones but what we're going to do now is we're going to look at a keyboard walk based on the assumption that one of two things has happened either the person typed the word in chinese but left the keyboard set in english or typed the word in english and left the keyboard set in chinese right so what happens then is i'm thinking of the word password right so i'm a chinese person i speak english and speak chinese and i think let me type password in but i'm going to be clever as it were and what i'm going to

do is i'm not going to leave the keyboard in english i'm going to set the keyboard to chinese and then type the word password now if we do that and we press a p and an a and an s and an s and a w and o and r d that's what comes out in characters okay it means nothing you can't translate it it's not a word right it looks pretty random and if you came across that in the hashes you were trying to crack you may look at it and think but we're trying to translate it it's not working that's because the original word was actually thought of in a different

language but it was typed on a multi-language keyboard with it set in the other language so the person hit the right keys but the keyboard entered what each one of it in its chinese simplified is and that's what came out as characters and then they might add their fancy things like a few exclamation marks and or well we knew we know users it's not complicated chances are it was at august 22 or something but the fact is that this is pretty random from our point of view and the opposite could be true right the person could have thought of the word in chinese but typed it on this keyboard set to english in which case it'll look like

gibberish to you but there's actually words so what we've done is we built a tool that will also be on github and it's going to take a few of these keyboards and say okay give me the input language and i'll show you what would have been output if the person entered it in that language but set the keyboard to its actual language or the other way around and so you would get nice word lists filled with these random characters as it were even though they're not actually random characters so those were the two techniques i've covered for you now the transliteration which is how does it sound in that language but i'm typing it in latin

characters right very common so when i crack some hashes using those base words it does crack a few of the open stuff that you can find lying around on the internet remember to think about the target right so if it's a russian website where you've got hashes that you've legally obtained and you're trying to crack them okay you're not going to use chinese transliteration as much as you are going to use russian you can if you want to but i'm saying give it some thought as well when you're working that out and then on the keyboard side think about your targets as well because it could be that it's an arabic keyboard

right but the person thought in english but typed it in arabic mode on the keyboard and so we've got a bunch of arabic gibberish which actually makes sense if you look at what was typed when they walked the keyboard so up next we're going to introduce ethan and he's going to take us through the one with the emojis
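The wrong-layout idea reduces to a key-for-key character mapping. The talk demonstrates it with a Chinese keyboard and mentions Arabic; this sketch swaps in the standard Russian layout instead, purely because that key mapping is widely documented, so it is an illustration of the technique and not the speakers' tool. Only lowercase keys are mapped, and the function name is hypothetical.

```python
# Keys of a US QWERTY keyboard and the characters the same physical keys
# produce on the standard Russian layout, row by row.
QWERTY = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
RUSSIAN = "йцукенгшщзхъфывапролджэячсмитьбю"

EN_TO_RU = str.maketrans(QWERTY, RUSSIAN)
RU_TO_EN = str.maketrans(RUSSIAN, QWERTY)


def wrong_layout(word, table):
    """Return what the same keystrokes produce under the other layout."""
    return word.translate(table)


# An English word typed while the keyboard is set to Russian.
print(wrong_layout("password", EN_TO_RU))
# A Russian word ("пароль", password) typed while the keyboard is set to English.
print(wrong_layout("пароль", RU_TO_EN))
```

Running a base dictionary through both tables yields two extra word lists per language pair, and characters without a mapping pass through unchanged.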

[Music] the one with the emojis so why do we need to account for emojis in our password lists well for starters there's increasing support for it on our mobile phones we've had emojis for ages and you see now in windows 11 they've just released an emoji keyboard macs have also had it for a while so a lot of devices have support for emojis so it's no longer the case where you'd maybe make a password on your phone and then when you get to your desktop it wouldn't work because there's no emoji support there also our heavy use of our mobile phones everyone has a mobile phone and using emojis has become part of our lives we use it in our text messages

it's really become part of our language so it's only natural that it also becomes part of our passwords as well again a lot of accounts are only used on our mobile devices if we think about things like instagram tiktok uber they're all accounts that we only use on our mobile devices where we have emojis and then there are a lot of emojis over 3600 emojis in the unicode standard so this is a very very large character set that we need to take account of when we do our passwords perfect so what does our tool do well our tool works on the assumption that people would use emojis in phrases in their passwords you can generate a random list of the

most frequently used emojis and use that to try crack some passwords but what our tool does is it will take a phrase for example i love my dog hashcat and then the plaintext password of that could be i love my dog hashcat 2022 all together without spaces and then the emojified password of that would be i beating hearts my dog emoji hash emoji etc etc there were a couple challenges that we faced when making this tool first off there's a lot of emojis for example the word love could have well over 20 different emojis related to that specific word could have the heart emoji the blue hoten heart emoji heart eyes kissy face all those

different ones could portray the word love and same with things like dog so what we had to do is generate all those different combinations and someone might have only used one emoji in their password they might have used two they could have used three et cetera et cetera so with this tool we really try to generate all the different possible uh emojis with this or passwords with this well that does mean that you're gonna end up with a word list that's quite big if you don't have a really specific input file with just a couple words or phrases another challenge was the different skin tones if we think about just the thumbs up emoji or good emoji there are five

different other skin tones and we need to take account of that as well and how it works with emojis and things like skin color or color is there's actually two different unicode characters we have one unicode character for the thumbs up and then the next unicode character for the color brown and combined they generate the brown thumbs up emoji so we also generated that in our tool tool inputs so our tool can take in a variety of inputs you can take in something like a book or a wikipedia page about someone some sort of article and we're also using n-grams to do this and what an n-gram is is a collection of n number of words or

sample text with the highest probability of what the word will be next so for example i've got the wikipedia page of formula one driver max verstappen and in there we can see that after the word max the most common word that would come after that would be verstappen and then when he was born born 30 september and so on and so forth so you could feed it anything from a book or wikipedia page and generate passwords like that another thing you can take in is just a text file of passwords so you might have in your password list something like i love my dog hashcat 2022 you would have to have the numerical value in there

otherwise it won't get transferred into emoji and then the tool will just split that up into the different words in there so i love my dog and then it'll change all those words into emojis so options for the tool obviously just the infile option the file you want to take as input for it out file spaces you can choose if you want to have spaces in between the words or if you'd just like them all stuck together numbers if you'd only like to emojify the numbers in your password you can also choose to do that sub char substitutes all available characters instead of just the words so for a password like fox moon 19

there's uh an emoji for the letter o for x for m and what it'll do is it'll just generate fox moon with all those characters being emojis and also generate all the different combinations of that dash n this is for if you're using n-grams for something like a book or an article and then dash n is just the number of n-grams the depth that you want to go to so just for a quick demonstration
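The n-gram input mode can be approximated with a frequency count over word sequences. This is a simplified sketch, not the tool's implementation (the talk credits the Natural Language Toolkit, which is not used here): it ranks the most common n-word sequences in a text rather than modelling next-word probability, and the sample sentence and function names are made up for illustration.

```python
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive words in the text as a tuple."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def top_phrases(text, n, limit=3):
    """Return the most frequent n-word phrases, most common first."""
    counts = Counter(ngrams(text, n))
    return [" ".join(gram) for gram, _ in counts.most_common(limit)]


# Made-up sample text standing in for a book or article.
text = "max verstappen drives fast and max verstappen trains hard"
print(top_phrases(text, 2, limit=1))
```

The phrases that come out of this step would then be fed to the emoji substitution as candidate base phrases, with `n` playing the role of the depth option.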

escape okay cool

so just go to our tool python.name you can have our input file

i think it's just yeah and then our out file okay

so this would just change all of the words in that text file into an emoji so for just examples we can look at what our input was all right sorry

apologies it's very difficult to see up here so that would just be our example uh file i'm a fan of the red hot chili peppers i'll put that in there and then the outputs

did you see our out file

so there would be our output file of all the different emojis and all the different combinations where's it gone

of all the different combinations of that word and i can't do it and they'll go through all the different combinations of that just with the emojis in there cool handing back to dimitri [Music]
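The combination step shown in the demo can be sketched with a Cartesian product: each word either stays as text or is replaced by one of its emojis. The word-to-emoji table below is a tiny hypothetical stand-in, and the `spaces` argument only mirrors the option described earlier; neither is taken from the speakers' code.

```python
from itertools import product

# Hypothetical word-to-emoji table; a real mapping would be far larger.
EMOJI = {
    "love": ["\u2764\ufe0f", "\U0001F60D"],  # red heart, heart eyes
    "dog": ["\U0001F436"],                   # dog face
}


def emojify(phrase, spaces=False):
    """Yield every mix of plain words and emoji substitutes for the phrase."""
    # For each word the options are the word itself plus its known emojis.
    options = [[w] + EMOJI.get(w, []) for w in phrase.lower().split()]
    sep = " " if spaces else ""
    for combo in product(*options):
        yield sep.join(combo)


for candidate in emojify("i love my dog"):
    print(candidate)
```

The output size is the product of the option counts per word, which is why a broad input file with 20 or more emojis per word grows the list so quickly.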

okay just so you can see it a bit clearer okay so if you go back to the beginning okay so that's what it took with red hot chili peppers okay so obviously it's going to iterate as ethan mentioned through the different options of what could be red hot right um okay that's the mask okay as we go down we get to the next one okay

so as ethan mentioned it can get complex with emojis because different people use emojis to mean different things right okay like when i'm happy i might use an emoji with a smiley face someone else may think thumbs up means i'm okay so i'm happy so we have to iterate a lot based on intent as well which does make the file bigger but depending on what hash you've got it can get through them quite fast you can also use stdout on the tool and pipe it straight into hashcat so you don't even have to wait for the word list to generate either i'm just going to exit this quickly
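Streaming candidates instead of writing a file can be sketched as a function that encodes each candidate as UTF-8 and writes it to a binary stream. The function name is hypothetical, and the demo writes into an in-memory buffer so the sketch runs anywhere; a real script would pass `sys.stdout.buffer` as the stream.

```python
import io


def stream(candidates, out):
    """Write one UTF-8 encoded candidate per line to a binary stream."""
    count = 0
    for candidate in candidates:
        # Explicit UTF-8 keeps emoji intact regardless of the terminal locale.
        out.write(candidate.encode("utf-8") + b"\n")
        count += 1
    return count


# Demo against an in-memory buffer instead of the real standard output.
buffer = io.BytesIO()
written = stream(["ilovemydog", "i\u2764\ufe0fmydog"], buffer)
print(written, "candidates,", len(buffer.getvalue()), "bytes")
```

With the stream pointed at standard output, a hypothetical script `gen.py` could be used as `python3 gen.py | hashcat -m 0 hashes.txt`, since hashcat reads candidates from standard input when no wordlist is given in straight attack mode.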

okay so if we come back to our um

slides where we were okay so putting it all together so we've had a look at the one with the emojis we've had a look at the keyboard walks so looking at transliteration right if i have a file let's call it test1.txt okay and we put some words in there base words that we want in our dictionary so let's take airplane let's take potato uh someone give me some words nouns garden garden

good okay so what we'll do then is um we will call our tool right and we're going to say that the in file is test one okay we give it an out file test1.out okay and then we give it the language so what i want to do is i want to make this let's take arabic right ar so i'm asking the cognitive translator translate it to arabic how does it sound if i speak it in arabic and give me that in latin characters and hopefully if it works we should get some output here okay so there we have some output i'm not sure if it's big enough to see okay so what it did is it translated

aeroplane to tyra my arabic is also not good right so if i'm pronouncing it wrong forgive me but so it took the word translated it and then gave you what it would sound like again the tool will be on github and you know take it and change it make it better get it to do what you want it to do there's no threading for example the reason i didn't put threading in is because i was worried that might burn my credit card even more than it has already but um yeah so just be careful because if it goes wide microsoft's going to be smiling at you with the amount of what you're

doing but like i mentioned we will over time be translating base dictionaries for you into these uh so just visit the github page every so often um which like if i go back to here um so it's not there yet don't visit it yet we haven't copied the stuff over but uh from this evening or tomorrow if you visit it i will tweet it out as well you can then visit it and we'll start putting this stuff up there and you're welcome to fork all the tools and create your own change them as you want to and do that so that's how we covered foreign word lists i'd like to thank everyone also uh some of the stuff we've done is

not new it has been studied before we're just changing the way we're adapting it so we've got some people to thank for research and stuff that they've done as well um wordninja emoji translate the natural language toolkit hashcat of course um those people whose names i don't want to destroy and then uh yeah for microsoft azure cognitive services i'm thanking them for taking my money thank you very much but the service is good and it works well like i mentioned there is a google one as well if you want to play with that one we're not using it in our tool but you can also play with the google one give it a try see

what it's like and you can build off that thank you very much [Applause] for dimitri and ethan sure while they're asking i'll put it back up i have a question uh in my language we have a danish character called ø and if you try to write that in english often danish people will type o and e so do you take that into consideration that one character could become more characters yes so what we're doing uh i'll quickly come here to this one okay it will fall under here um with rules okay we didn't do danish rules but thanks for the info we'll add those as well you know you can also yeah i mean there's

nothing to stop you doing it in fact it's better if the person who speaks the language does it because you know like i know in greek what we often transliterate to right i might not know in in danish so but do it under rules so that you don't have to cater for it in all your word list output someone wanted this one i think just going to add the twitter account as well while you're asking questions so that we can also i have a question for you uh yeah can you say anything about any success rates you've had with this i mean the tool is awesome the concept is awesome but have you been able to crack like one

more hash or are we talking uh what are we talking about how you know how much have you improved your uh password cracking capability with this so we did crack a few hashes on some publicly available lists um we didn't try to target them specifically so the word lists we created were more for the talk than for actual targeted lists out there we tried them on a few from the hashkiller website but again it's too vast to actually know what they could have been they were uncracked hashes that some of the password crackers have tried and didn't get and a few cracked out of that

where i would see it working a lot better is in a corporate environment so if you get ntlm hashes especially in corporate environments where they've got the multiple keyboards pretty sure you'll see quite a bit of that and you'll see it if you can get a hold of hashes well maybe add the disclaimer if you can legally get hold of hashes from websites with mobile registration you'll come across a lot of emojis on there like ethan mentioned although the keyboards are available in windows and mac i don't think the uptake is as high as when they're creating them on the mobile devices your example of the keyboard walks is something that i actually saw

when i was cracking passwords illegally in ukraine like 15 years ago so people were doing that back then they thought of something in russian or ukrainian and they sort of looked at the cyrillic characters but they actually typed in what became english characters into the system and to me it was complete nonsense until i realized what they were actually doing more questions

i i think there's a question probably more for ethan some of the conversions to emojis that you had ended up being very short like i love my dog could be i heart dog do you have and i'm wondering if uh websites and so forth are uh counting the minimum password length wrong you know you'd if it's supposed to be eight characters do they are they counting you know each emoji is like four or something like that because that's that's its length in unicode uh it well it depends so a lot of the emojis are just a single character a single unicode character but then again some of them with like different colors and stuff it's two it's two characters
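The length question can be checked directly, because one emoji has different lengths depending on what is counted. This sketch measures a thumbs up with and without a skin tone modifier in code points, UTF-8 bytes and UTF-16 code units (the last being what some platforms report as string length), and builds the skin tone variants from the five Unicode modifier characters. How any given website counts is not something this code can tell you.

```python
THUMBS_UP = "\U0001F44D"
# The five skin tone modifier characters, U+1F3FB through U+1F3FF.
SKIN_TONES = [chr(cp) for cp in range(0x1F3FB, 0x1F400)]


def lengths(text):
    """Measure one string in three different units."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
    }


def skin_tone_variants(base):
    """Return the base emoji plus one variant per skin tone modifier."""
    return [base] + [base + tone for tone in SKIN_TONES]


print(lengths(THUMBS_UP))
print(lengths(THUMBS_UP + SKIN_TONES[3]))
print(len(skin_tone_variants(THUMBS_UP)))
```

A length check that counts bytes would accept a two-emoji password as eight characters, while one that counts code points would see two.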

right you'll see it as two characters but but even the emojis that are a single character it takes multiple bytes to represent that so it might be a four byte long unicode character to represent that emoji yes but it will still be only seen as one character okay yeah i i hope hoping that they're seeing it as one rather than four otherwise you're going to get some really probably easy to crack two character uh passwords yes cool sorry just to add to that so what i think a lot of websites are doing as well is that they're taking the input if it's unicode um they're then catering for okay it's unicode so don't look at the actual

unicode itself but work out how many characters i've got in unicode rather than how many plain text characters with the actual unicode written out as u+1 or whatever the hex value is for that okay so thank you again dimitri and ethan great talk now there will be a break until five o'clock uh some changes are coming jeremi gosney couldn't come and he had to cancel his talk and magic happened so he is now at a hotel and hopefully he will do his talk passwords but make it uh nihilism at five o'clock i will gladly skip my talk for listening to him so i really hope that you will be back at five o'clock and i

will go look for jeremi right now thank you