
Crafting tailored wordlists with Wordsmith - Sanjiv Kawa, Tom Porter

BSides Las Vegas · 47:07 · Published 2016-08
About this talk
Crafting tailored wordlists with Wordsmith - Sanjiv Kawa, Tom Porter Passwords BSidesLV 2016 - Tuscany Hotel - Aug 03, 2016
Transcript [en]

Crafting tailored wordlists with Wordsmith, Sanjiv and Tom from Payment Software Company, please go ahead. Thank you. Yeah, thanks guys for coming out. There are a lot of cool talks happening in this 10 a.m. time slot, so we're thankful you came to this one, and we're sure some people had a late night last night, so thanks for making it to this one. Some quick formalities: Tom's the guy with the beard, I'm the Canadian, don't hold it against me. We're both pen testers with PSC. PSC specializes in PCI assessments, and we also do pen testing in non-PCI contexts. Our day-to-day involves going through large enterprise organizations

and going through various network segments trying to find cardholder data. We're also looking for pen testers, so if you know any, or if you're interested in pen testing, come and see Tom and me after, or Joe over here in the front, and we'll be happy to talk to you. So before we jump in, a quick primer: what's Wordsmith? Well, it's basically a tool that can generate dictionaries. The only thing we're doing differently is that we're generating dictionaries based on US states, specifically geolocation data. Geolocation data basically boils down to cities, landmarks, zip codes, area codes, towns, and that sort of thing. We'll get into the exact

sorts of data sets that we're collecting, and we'll go into some statistics a bit later, but we're taking these wordlists and cracking against large hash sets, or hash sets in general, to identify what sorts of passwords people have and whether they're introducing geo-based passwords into their phrases. So yeah, we're going to go through a quick primer here: the basic authentication process, the difference between passwords and hashes, and dictionary attacks. We've timed it; it should take about three minutes, about eight slides in total. For those of you who already know about password attacks and dictionary attacks and hashes and that sort of thing, there's going to be

an image on the next slide, and if you can tweet us the hash type that's in the image, we've got some swag that we're giving away: Case Logic backpacks, I think phone speaker amplifiers, and a single selfie stick for someone who really wants a selfie stick, I guess. Those are our Twitter handles; I think they're also in your brochures. Or go and check out Wordsmith; I just made the repo public, so you can find it there. And just as a quick show of hands, how many pen testers are in this room? Does anyone do pen testing? A couple of guys, great. So if you've ever

done any sort of man-in-the-middle attack on your network, and I guess if you're on a blue team, if you've ever done a man-in-the-middle attack on your network, this hash type should look pretty familiar to you. So yeah, here it is. You've got a couple of seconds to take a look at it and send us a tweet at our Twitter handles, and we're happy to give away some swag after. For now we'll head back to the primer, and Tom is going to walk you through it. Thanks, Sanjiv. So let's talk about a simple authentication process. This is Bob, and the extent of my Microsoft Paint skills.

Despite what Bob's prohibition-style hat might suggest, Bob is a user in a Windows environment, and this is Bob logging into a Windows host with a username of Bob. This might be locally on a workstation, in order to unlock it, or this might be remotely via something like RDP. We've taken the liberty here of unmasking the password field, so you can see that Bob's password is password123. On submit, or when Bob clicks Enter, that input of password123 is put into a one-way hash function, and the output is a fixed-length character string, which you see represented below. That's what we call a hash, and what's important to note here is that this is a one-way

function: we can't reverse the process by putting a hash into the hashing function and retrieving the original clear-text password. So after we've hashed the password, we put together Bob's username and password hash and send it to the authentication server. Locally this might be a SAM database; if you're joined to a domain, this might be Active Directory's NTDS.dit on the domain controllers. This back-end database, and this is a bit of an oversimplification, holds a listing of all the users and their password hashes, so we're not storing passwords in clear text here. From there we basically just do a lookup of Bob's supplied credentials: we find the record for Bob.
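As an aside, that one-way property is easy to see in code. A minimal sketch, using SHA-256 as a stand-in, since the NT scheme Windows actually uses is MD4 over the UTF-16LE password and MD4 isn't always exposed by standard libraries:

```python
import hashlib

# Hashing is one-way: cheap to compute, no direct way to invert.
password = "password123"
digest = hashlib.sha256(password.encode("utf-8")).hexdigest()

print(len(digest))   # fixed-length output: 64 hex characters for SHA-256
print(digest[:16])   # looks nothing like the input
```

The output length is the same no matter the input, and there is no function that takes the digest back to password123; cracking has to work by guessing, as the next section shows.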

We match the supplied password hash with the stored one: if it's correct, we allow the login; if it's incorrect, we bump up the failed-login count and deny the login. So how do we traverse this hashing process, how do we convert a hash back to its original string? The answer is there's no direct way, but what we do have are a very particular set of words, words that make password cracking a nightmare for hashes like these, and in particular we use a dictionary attack. So what are dictionaries? They're simply just large lists of words, usually grouped together by some type of theme. They might come from password breaches like LinkedIn, Yahoo, or Adobe. There are also

great wordlists out on the internet that you can find for free, like RockYou or Top 10K, and there are even some paid ones like Uniqpass; despite its price tag, it's a password list that any pen tester or auditor should have in their toolkit. So, in order to carry out a dictionary attack, first we need a few prereqs. First is a solid dictionary, a good wordlist. The second thing you need to know is the hash type: in this case, with Bob, we're using NT hashes, or NTLM; if you're authenticating against a Unix or Linux type server, it's going to be some variation of MD5 or SHA-512, usually with a salt. The third thing that we need is a list of password hashes, and

these are usually exfiltrated from compromised systems, maybe a local Windows workstation or an Active Directory domain controller, and these are what we're doing the lookups against. The steps for actually carrying out a dictionary attack boil down to a three-step process we call the guess-encrypt-compare cycle. Our guesses are words that we pluck from the wordlist, iterating through them one by one. From there, we take the input word and put it into our encryption algorithm, in this case an NT hashing algorithm, and it outputs a fixed-length string. Then we take that hash and do a lookup against our list of obtained password hashes. If they match, we have a hit.
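That guess-encrypt-compare loop is only a few lines of code. A toy version, with SHA-256 again standing in for the NT/MD4 algorithm and a made-up three-word list:

```python
import hashlib

def nt_stand_in(word: str) -> str:
    """Stand-in for the NT hash (the real one is MD4 over UTF-16LE)."""
    return hashlib.sha256(word.encode("utf-8")).hexdigest()

# Password hashes "exfiltrated" from a compromised host (built here for the demo).
obtained = {nt_stand_in("FremontStreet"): "bob"}

wordlist = ["password123", "Sacramento", "FremontStreet"]

cracked = {}
for guess in wordlist:          # guess: pluck a word from the list
    h = nt_stand_in(guess)      # encrypt: hash the candidate
    if h in obtained:           # compare: look it up in the obtained hashes
        cracked[obtained[h]] = guess

print(cracked)  # {'bob': 'FremontStreet'}
```

A real cracker like Hashcat does exactly this, just billions of times per second on a GPU.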

A hit means we can map that hash back to the original word that we guessed, and we have our clear-text password. Sanjiv, onward to Wordsmith. Yeah, so as we mentioned briefly at the beginning of the presentation, Wordsmith is just a wordlist generation tool for US states and geolocation-based data. So what kind of geo data is in a wordlist? Well, we've got things like cities and towns. We also have landmarks; in Nevada you're going to have Area 51, things like the Hoover Dam, that sort of thing. We've got streets and roads, we have zip codes, sports teams, colleges, common names, and area codes. Now, why geolocation data? Well, it's really

interesting; I guess it's kind of a marriage between curiosity, password analytics, and just general human behavior. I remember I was doing an internal penetration test for a client in a really small state, and as part of my post-exploitation process I tend to go from system to system and scrape credentials out of memory, just using Kiwi or Mimikatz or something like that. That usually enables me to collect a large number of passwords, to then move into another network segment or access applications, which unlocks greater depth into that environment. Now, collecting all these passwords, I realized a common trend for several of these users, and that's that I couldn't crack

these, because they were specific geolocation-based passwords that weren't going to exist in any sort of password list that's currently out there: things like sports team names or colleges or other things related to geolocation. So I thought to myself, it would be pretty neat if someone put together a wordlist generation tool, and that kind of transformed into: well, we'll just put together a wordlist generation tool for geolocation data. And as we'll get into with some statistics later, we found that we've limited some guess-encrypt-compare cycles and been able to actually turn this into something quite useful. So I should probably mention, where's all this data

coming from? Well, Wikipedia and the US Census have a ton of this data, and it's freely available to the public. All we've done is pulled it, scraped it, and put it into the nice little phrases and words that appear in wordlists. OpenStreetMap is another good source as well. We've also had to put together a collection of data sets for area codes, because that was a little harder to parse using our parsing engine and required a bit heavier parsing, so we actually have some custom data sets that we made as well. So Tom is going to talk to you about how Wordsmith works, and then we're going to jump into a demo. So yeah, take

it away. Cool. So the GitHub repo is live now. When you do your initial git pull you'll see these files. Listed there on the right you see the actual Wordsmith Ruby file. Next to it there's the sources YAML file, which is basically just a simple configuration file for all the internet sources we're pulling data down from; we broke it out like that to hopefully make it a little easier, a modular design that's open to extension and makes managing our internet sources easier. Next to that you see a data.tar.gz, which is basically just a compressed data archive where we've already pre-scraped all of the data that we're using for Wordsmith and

compressed it. You'll also find a Gemfile, just to make installation a little bit simpler, and there's a README in the repo that walks you through the dependencies and installation. So when you run Wordsmith for the first time, it's going to do a couple of checks for some of the files it needs; if it doesn't see them, it will unpack that data.tar.gz file and expose it in the current working directory, in a subdirectory called data. That data directory is mostly categorized by state, with the exception of some of the custom data that we've had to massage into place. At the top level you'll see a directory for the

area codes; names, which we've pulled from the US Census, so first names, last names, baby names; sports, which are mostly the big-four sports in each state at this point; and then the states themselves. Below that you see an example of the kinds of files you'd find in the California directory: cities.html, colleges.html, landmarks, roads, zips. These are words very specific to that state. If you notice the .html extension, these are the actual HTML source files that we've pulled down from our internet sources, and the reason we've done it like this is because we've added an update option to Wordsmith, a -u flag. So sometime down

the road, if you'd like to update your data manually, you specify the update flag and it will actually go out to all the sources and update your local data repository. To parse this data we're using gems like Nokogiri and Spidr, and we do all the lookups offline, locally, just for speed and performance. So a wordlist that's been generated by Wordsmith kind of looks like this. I'm using an example that Sanjiv went through earlier, with roads from Nevada, in particular Fremont Street. The word as it comes out of Wordsmith looks like that: there's a capital F, there's a space in there, there's a period at the end. So we add in some very basic

mangling for words: we can split on spaces, which breaks Fremont Street out into two separate words; we can remove special characters; we can remove spaces. There are also options to convert all the words to lowercase, if that's your preference, and you can specify a minimum character length. So let's say you've compromised a domain where you know the password policy has a minimum character length of eight: you can specify the minimum-length flag here, and it will truncate all words that are not at least eight characters long. Now Sanjiv will take you through the demo. I've seen some really bad things happen with live demos in the past, so let's hope this one goes well.
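Those mangling options are plain string operations. A sketch that mimics the behavior described (this is an illustration, not Wordsmith's actual Ruby code):

```python
def mangle(word: str, min_len: int = 0, lowercase: bool = False) -> list[str]:
    """Basic mangles: keep the original, split on spaces, concatenate,
    strip non-alphanumerics, then filter by a minimum length."""
    variants = {word}
    variants.update(word.split())        # "Fremont St." -> "Fremont", "St."
    variants.add(word.replace(" ", ""))  # "FremontSt."
    variants.add("".join(c for c in word if c.isalnum() or c == " "))  # drop specials
    if lowercase:
        variants = {v.lower() for v in variants}
    return sorted(v for v in variants if len(v) >= min_len)

print(mangle("Fremont St."))
# ['Fremont', 'Fremont St', 'Fremont St.', 'FremontSt.', 'St.']
print(mangle("Fremont St.", min_len=8))  # short variants filtered out
```

With a minimum length of eight, the short variants like "Fremont" and "St." never make it into the list, matching the password-policy filtering described above.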

All right, is that text okay? Can everyone see that? There you go, let's make it bigger. Cool. Okay, so as Tom mentioned, these are the initial git-pull files, and if we just go ahead and run Wordsmith for the first time, you'll see all these files get unpacked. There's also a warning message here: it says CeWL is not found in the PATH. That's because I'm running Wordsmith on my OS X system and I don't have CeWL installed, but if you're running this on Kali or something, it's going to pick it up in your PATH variable and you'll be able to use CeWL. Now, the purpose of CeWL is

that we've integrated support for things like domains and input files. For those of you who are unfamiliar with CeWL: basically, if you specify a domain name, let's say client.com or facebook.com, it will go to that domain and look for unique words and scrape them from the application. The default CeWL settings, I think, stay within the scope of that domain name; it only goes to a certain recursive depth and follows any hyperlinks that take you to other links in that scope. It'll pull the client's name, it'll pull other unique words, and it strips out common words like "of", "the", "any", conjunctions, things like that. And you can also have

an infile where you specify various client domain names, and all of this is just to better populate the wordlist. So we've integrated support for CeWL, but you don't need to use it, because we also have some other options. Now, as we mentioned, from a top-down approach it all starts from a state, and typically when I generate a Wordsmith wordlist I'll use the -a (all) option. All is going to give me cities, colleges, landmarks, phone numbers, roads, teams, zip codes, and also names, names being things like common last names and baby names. You have no idea how many times I see a first name or a baby name used as part of a password.
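For readers who haven't used CeWL, the core idea (pull the unique words out of a page and drop the short or common ones) looks roughly like this. A simplified offline sketch, not CeWL itself, with a made-up page and a tiny illustrative stoplist:

```python
import re

STOPWORDS = {"the", "of", "and", "any", "a", "to", "in"}  # tiny illustrative stoplist

def unique_words(html: str, min_len: int = 4) -> list[str]:
    """Strip tags, split into words, drop stopwords and short words."""
    text = re.sub(r"<[^>]+>", " ", html)      # crude tag removal
    words = re.findall(r"[A-Za-z]+", text)
    keep = {w for w in words if len(w) >= min_len and w.lower() not in STOPWORDS}
    return sorted(keep)

page = "<html><body><h1>Acme Payments</h1><p>The Acme team of Nevada</p></body></html>"
print(unique_words(page))  # ['Acme', 'Nevada', 'Payments', 'team']
```

The real CeWL additionally spiders links to a configurable depth and stays in scope; this sketch only shows the word-extraction step.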

So let's run through an example. Let's set the state to California and look at some common sports teams there. As you can see, these are all the sports teams in California, and that's just doing a basic lookup on the HTML files that we've already pre-pulled. Now, these are great, but we can also mangle them and get every single permutation of these words. As you can see here, we've got "Sacramento Kings", "SacramentoKings", "Sacramento", and somewhere up here there will probably be "Kings" as well, and there's only one instance of it because there are probably some other teams here that have Kings in their name, but we

also do a sort and unique, so you don't get duplicate words and things like that. What's also pretty neat is that we do things like zip codes, so we've got every single zip code in Nevada, or landmarks: if we set the state to DC we can look at landmarks, and we'd tend to see things like the White House or the Lafayette Building, things like that. And if we set the state to, say, Massachusetts, we can look at some of the colleges that exist there; we'd probably see Harvard or MIT at some point in here. So, as I mentioned before, these are some of the options, the singular options that you

can set per state. We can also do multi-state, for example CA and NV: grab me all the area codes for those two states. Now, this is pretty verbose output, so I guess the inverse of that would be quiet output. So let's set the state to California; we're going to grab everything we possibly can, and we're not going to have it as verbose as this. You can start seeing some things here, how many landmarks there are, how many zip codes there are, but this isn't really useful because we're not getting any words. So we can go ahead and output this to a file, like california.txt, and it will collect all this data and stick it

into a wordlist for you. But as we mentioned before, these are just the actual words themselves; they aren't the mangled versions. So you can specify the -m flag, change that to california-mangled, and now you can see that this will take some time, especially for the roads, because there are 250,000 roads that we now have to mangle, which almost doubles the amount. Like Tom mentioned earlier, we've split on spaces, we've concatenated, we've stripped symbols, and things like that. Any other options that anyone wants to see from this help area, or a particular college someone went to, a state, to see if it shows up? I'm wondering

how much overlap there is, and how much benefit there is in restricting by state. Sure, so we also have an option built in here for all, so you can churn through every single state and create a mega wordlist for everything as well; that's totally an option. As you can see, it'll just start spitting out everything as an array. And that's with the quiet option; if I remove the quiet, it's just going to keep on churning like that. Yeah, sorry, there's another question?

Yeah, it's a great question. He asked: how do you account for local businesses? So, the -d flag, the CeWL integration: let's say you're testing a particular client; there's probably going to be some password variation in there that's the client's name plus 123, and that CeWL integration will scrape the client's web application, pick out the client's name, and put it into this wordlist for you as well. I don't actually have internet connectivity here, because we're at BSides and I don't want to connect to the Wi-Fi, so I can't show you the CeWL integration aspects of it, but yeah, that's basically what our

Wordsmith demo is. What's really interesting are some statistics that we're going to show you guys, and I think you've heard me talk enough, so I'll let Tom kick that off. Cool.

Yeah, so we wanted to measure how effective geolocation-based wordlists were, and to see this we needed a couple of tests. Some prereqs for the tests: first we built a hash-cracking rig locally in our shop, and we got our hands on some real NT hashes. We grabbed hashes from actual real-world internal penetration tests: from clients in Massachusetts we pulled over 400 hashes, Wisconsin about 2,000, and New York about 500 hashes. As for the hash-cracking rig itself, our weapon of choice for cracking NT hashes is Hashcat. The hardware is fairly modest, just an NVIDIA GRID K520, but even with that we could get about 3 billion guess-encrypt-compare cycles per second, so we're churning through passwords fairly quickly.

And just last week Sanjiv put a post on his blog, for those interested in building their own hash-cracking rig; it goes through the process of doing so in Amazon's AWS. It takes you through the steps of spinning up an EC2 instance and getting Hashcat installed and configured, ready to run, so you can start cracking hashes. So, the test cases that we're going to go through: first we're going to crack hashes from each of those states using some basic wordlists and some rules, and then we're going to supplement that with wordlists generated by Wordsmith. The syntax that we used to

generate those wordlists in Wordsmith is shown at the bottom there. The basic way to read it: the -s flag is for Wisconsin; with -a we're grabbing all categories, so cities, colleges, roads, etc.; we're going to do some basic mangling on those words; and we're going to output them to a text file. In this case Wisconsin generated about 112,000 words, Massachusetts was around 82,000, and New York was 158,000. So, for the input parameters of this cracking session, think back to that guess-encrypt-compare cycle. Our input, what we're guessing from, is those wordlists: we're using Top 10K, which is 10,000 words; we're using RockYou, which is over 14 million; and then we're using the wordlists generated by

Wordsmith for each corresponding state, so Wisconsin, Massachusetts, and New York. And we chose the d3ad0ne rule set just because it's one of our favorites; it's got about 57,000 rules for doing some advanced word mangling. For the encrypt phase, the algorithm choice is NT, because we're trying to crack passwords exfiltrated from a domain controller here, and NT is based on MD4. And for the compare, we're doing lookups against the obtained password hashes that we got from each of those domains in each of the three states. Sanjiv will discuss the results. Yeah, the results are in. So, Wisconsin: 2,011 NTLM hashes, which can be translated into 2,011 Active

Directory user accounts, which can further be translated into 2,011 employees, although that might not necessarily be true because there might be some accounts being shared, so we'll just say 2,011 Active Directory user accounts. Now, as Tom previously mentioned, the Top 10K wordlist is just a collection of the top ten thousand passwords, things like password, love, god; anyone who's seen Hackers will get that reference. But yeah, we're taking this 10,000-word list and injecting those individual words into the d3ad0ne rule set, which is 58,000 rules. These rules can prepend symbols, append numbers, lowercase words, camel-case words, so for every single word you're doing 58,000 different permutations of that word based on this rule set.
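A rule set like d3ad0ne is just tens of thousands of mechanical transforms applied to every base word. A miniature version with a handful of "rules" (the real set has tens of thousands, expressed in Hashcat's own rule language rather than Python):

```python
# Each "rule" is a function from a base word to a candidate password.
rules = [
    lambda w: w,               # pass-through
    lambda w: w.lower(),       # lowercase
    lambda w: w.capitalize(),  # capitalize first letter
    lambda w: w + "123",       # append digits
    lambda w: "!" + w,         # prepend a symbol
]

wordlist = ["packers", "Boston"]
candidates = {rule(w) for w in wordlist for rule in rules}
print(sorted(candidates))
# With 58,000 rules and a 10,000-word list you get up to 580 million
# candidates, still only seconds of work at ~3 billion guesses per second.
```

Duplicates collapse (lowercasing "packers" gives "packers" again), which is why the candidate set is smaller than words times rules.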

So yeah, the 10,000 top passwords took about two seconds to run against these rules and uncovered 237 of this organization's passwords, which means that 237 Active Directory user accounts had a password from the top 10,000-word list as the root of their password string. RockYou uncovered another 1,094 passwords, which brings us to a sixty-six percent cracking success across all of these hashes. And now, our Wisconsin-generated wordlist took 12 seconds to run and uncovered another eleven percent, 229 passwords. What's key here is that this Wisconsin wordlist solely consists of geo-based passwords, so

at this point we've cracked sixty-six percent of the passwords, but the additional 229 passwords are all things like cities, sports teams, landmarks, zip codes, and things to that effect. There's a question over there: wouldn't there be duplicates? This is a collective cracking session; basically, your hash-cracking pot is already populated with sixty-six percent of these passwords at this point, so it's a collective addition. So yeah, we uncovered another, sorry, eleven percent of passwords that were all geocentric. And if I remember correctly, Wisconsin had some really common passwords like Green Bay Packers, plus first names and baby names as well. Massachusetts had a smaller hash set,

about 400 Active Directory user accounts. For Massachusetts, Top 10K with the 58,000 rules ran in about a second and recovered 52 passwords, which is about an eighth, and that's really surprising because it goes to show that some organizations have really weak password complexity rules and enforcement. RockYou recovered a staggering sixty-five percent in 24 minutes with the d3ad0ne rule set, and our Wordsmith-generated wordlist recovered another 56; again, these are all geo-based, sorry, geolocation-based words, found in about 12 seconds. And in Massachusetts, I mean, people always use sports team names like Red Sox and things like that, but what's really interesting is you'll see a lot of city

names as well, like Boston, Boston Marathon, Cambridge, Harvard, and Fenway, other landmarks as well. New York: 552 hashes, so 552 Active Directory user accounts. What's really surprising with New York is that zero were recovered with Top 10K, which we can attribute to several things. Either the Active Directory domain controller has a third-party plugin, so this New York organization has imported a list of known compromised or bad passwords into Active Directory through some sort of third-party module, which technically prevents users from creating bad passwords; or, as a non-technical control, they just have great security awareness programs, or long and complex password requirements, things like that. That being said, the RockYou wordlist

took about 26 minutes to run and uncovered about 220 passwords, and our New York Wordsmith-generated wordlist recovered an additional 59. As you can imagine, some of the popular passwords were landmarks, like Empire or, I think I have some examples, yeah: Empire, Broadway. There was also one user in particular who had the state abbreviation NY, then five numbers, which were a zip code, and then a symbol as part of his password. So yeah, Tom is going to summarize that last segment. Cool, so some of the conclusions of this. Did you try the New York hashes against the Wisconsin list? No, we didn't,

because that would be very useful, to see whether you're actually getting value from paying attention to geography, as opposed to just what's on these lists. Absolutely. So you'd need to do that cross-wise. Yeah, and to further extrapolate: we could also run all states in general against that New York hash set as well, but the real takeaway here was just for that particular state and that guess-encrypt-compare cycle. For one state it took 22 seconds, but for all states it might take an hour; who knows, we haven't actually tried that yet. That's a great question. So, some of the conclusions from this testing: we've got

a little bit of confirmation bias here, in that, and we'll get into the psychology of how users choose passwords, but we know that users like to choose passwords that are near and dear to them. They choose passwords from things that they know: the street they grew up on, the name of their child. And that's what we're seeing reflected here in our results. With that, there's a bit of a time/CPU-cycle trade-off: instead of using a blanket wordlist and looking for low-hanging fruit, we're spending a little extra time up front to craft a more tailored list, and spending less time, fewer CPU cycles, crunching those less pertinent words.
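That trade-off is easy to put rough numbers on. Back of the envelope, using the talk's figures (this ignores attack setup and other overhead, which is why the observed times in the talk run longer):

```python
# Back-of-envelope keyspace math with the figures from the talk:
# a ~57,000-rule set and ~3 billion NT guesses/sec on the GRID K520.
rate = 3_000_000_000   # guess-encrypt-compare cycles per second
rules = 57_000         # d3ad0ne-sized rule set

for name, words in [("Top 10K", 10_000),
                    ("RockYou", 14_000_000),
                    ("Wordsmith (WI)", 112_000)]:
    guesses = words * rules
    print(f"{name}: {guesses:,} candidates, ~{guesses / rate:.1f} s of raw GPU time")
```

The tailored 112,000-word list costs only a few seconds of extra GPU time, while RockYou costs minutes; the argument in the talk is that those few seconds buy an extra eleven percent of cracked accounts.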

It's a small sample size here, but in these cases we cracked at least an additional eleven percent of passwords in a reasonable amount of time; I think that speaks a little bit to the relevance of the generated passwords. So, next steps for Wordsmith, where we see this going: we're always thinking about data. We have ideas for more; it's difficult to actually marry an idea for data with good sources for it, but we'd like to expand on, say, the sports. We've seen users who love to have their favorite athlete or favorite player as the base word of a password, maybe team mascots or names of stadiums. You could

include famous people like politicians, actresses, or actors; state symbols, things that are very relevant to a state, such as the motto, the state song, the state flower. And we've gotten recommendations from the community too: Larry Pesce recommended looking at regional food, cuisine, or agriculture. From the design, the code-design, perspective, it is modular; we'd like it to be even more so, to be open to extension. We think this framework could be extended to include not only states but also provinces or territories, or even other countries, and we could even change the granularity of how we're looking at this. When Sanjiv and I were concepting Wordsmith, we were thinking about what scale do we want to start

with: do we go, you know, as macro as continent, or do we drill down to country, state, city, road, address, actual geo-coordinates? So maybe a future version could, instead of a user inputting a state, have them input an address or a pair of coordinates and specify a default radius of, say, fifty miles: give me all the words that Wordsmith can generate in a 50-mile radius. Sanjiv and I are both believers in free and open-source software; we believe that everyone should have access to all the source code at all times. We also believe that we're not the smartest people in this room, so if you guys have any ideas for data or features, or if you

have experience with looking at or querying APIs for geocentric-type data, we would love to talk to you. Please send us a pull request, or hit us up on Twitter; the repo is listed there. We'd love to share this with you. So with that, this is our contact information. Also, if you replied to Sanjiv's hash challenge earlier via Twitter, feel free to hook up with us; we'll be at the back of the room in the Passwords track, and just bring some verification that it's actually you on the other side of the tweet. But with that, we thank you for coming out. Yeah, and the floor is open for questions. Thanks guys. Well, I'm pretty sure there are questions for this, and I have some myself, tons

of them, actually. But as an example, I just wanted to tell you that in the UK there's one government organization that keeps tweeting, again and again and again, that a safe and, you know, good password that is also easy to remember is made up of three easy words; that's what they keep saying all the time, three easy words. Now, on December 4, 2015, Hashcat put out a tweet saying "important announcement", and there was a hash value and a few short words. Solar Designer replied to that tweet, saying the hash, if you can crack it, is "hashcat open source". So that's the way Hashcat announced that Hashcat was going open

source, and Solar Designer cracked "hashcat open source": that's a three-word passphrase with spaces in between the words. So Jeremi Gosney asked about it, because Solar said that he cracked this by doing a ten-line focused wordlist, so ten words he put into his wordlist, and then he cracked a three-word passphrase. And the words that he put in were things like "hashcat", "is", "open source", with and without a space in between, "sourced", "GPL", "under", "license", and he cracked it. So that's pretty much, you know, that was like, dude, yes, that's pretty good. Now I'm going to kick off one of my first questions. First of all, have you been

looking into the simple fact that there can be, as an example, physical geographical locations that consist of more than one name, like you have a space in there, the "Sea of" something, Sea of Oceans or whatever you want to call it? Does Wordsmith today actually take that into account, or will you just say that anything with a space is two different words? No, we take that into account: if it does have a space, we keep the original string, we split on that space, and we concatenate that string as well, so we get the common permutations of that particular word, and it also has that word in the wordlist as well, on

its own, at the singular level. Now, we didn't want to do too many permutations, because we figured the Hashcat rules that people would use afterwards would do that for them; we didn't want to inflate the wordlist or make it too big, because Hashcat rules would take care of that in the cracking process. Okay. And to keep the keys unique: for instance, when you input a state like North Carolina, which has a space, or District of Columbia, with two spaces, we sub out the spaces; we basically URL-encode them, just to make sure we keep the keys unique. Yeah, okay. So, in

Okay, so when doing PasswordsCon I have had Sébastien Raveau do two talks at different times about generating wordlists from different sources, including Wikipedia. His talks are essentially about how he created the Wikipedia wordlist and the issues of identifying what's a password, what's a passphrase, and what's just random garbage in there. I highly recommend his talk from Cambridge last year, because he could actually also prove that Han Solo is mentioned in the Bible, which is kind of cool. It just depends on how you break up all those spaces and so on, but Han Solo is actually mentioned in the Bible. So, questions. First, thanks for a really

good talk. It's as much a comment as a question: I think you have a really powerful tool, and with these data sets I hope you can do some more studies on it. In particular, there was a discussion in one of the earlier talks about whether blacklist dictionaries in password generation are dangerous, because when you reject people's passwords they just add a 1 or a 123 to them, and you have a place to actually run an empirical test there with the database, saying, okay, go back to that top-10k wordlist and apply some

software that does some munging on it, and see how many users, having been rejected, in fact just did a simple transformation. So I encourage you to do more. Oh great, if you want to send us your organization's hashes, that would be great. Well, you actually already suggested that to the guy you invited, I swear. So yeah, you guys have a wordlist at least that you provide. So, to ask kind of a different question: when evaluating an individual word within the context of a dictionary or hash, is there any metric that can be generated for understanding how likely it is to appear in wordlists globally? This is, for example, a password like "god", "iloveyou", "123", etc.

We are commonly told that yes, this appears in many, many dictionaries, but we don't have a metric or score to tell us how likely it is to appear, and how dangerous it is within the context of a password. Is there any way we can generate that? Is there any Bayesian or probabilistic analysis that has been done on corpora of words that could tell us, given a word, how likely it is to be cracked by one of these tools? To my knowledge, in terms of the community, there hasn't been any sort of collective analysis on every single breached wordlist that's out there, as well as a collection of words, to identify, I guess,

the singularity or the commonality of a single phrase across all these lists. However, that being said, in our penetration testing reports we use a tool called Pipal, and that shows, in terms of percentile, how many users are using a given word or root word in the context of all the user accounts or hashes that we have recovered. So the only thing I can think of is marrying Pipal with all those common wordlists out there, but that would take a lot of computing power, so it's almost a project in its own right. That's a great idea. Great talk.
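A rough sketch of the kind of commonality score being discussed, assuming you have loaded several breach wordlists into memory (the toy lists below are stand-ins, not real data):

```python
def commonality(word, wordlists):
    """Fraction of breach wordlists that contain `word`.
    `wordlists` is a list of sets of lowercase words."""
    word = word.lower()
    hits = sum(1 for wl in wordlists if word in wl)
    return hits / len(wordlists)

# Toy stand-ins for e.g. rockyou, a top-10k list, and a breach dump:
lists = [
    {"password", "iloveyou", "123456"},
    {"password", "letmein"},
    {"dragon", "password", "iloveyou"},
]
print(commonality("password", lists))   # present in every list
print(commonality("iloveyou", lists))   # present in two of three
```

Running something like this across every public breach corpus is exactly the "project in its own right" mentioned above: the scoring is trivial, the data wrangling is not.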

Any more questions? I was wondering, on the hash set that you were working on, did you do any combination of the wordlists? Did you combine the wordlists generated from Wordsmith with the top ten thousand or with rockyou? I know that there's a function in Hashcat for doing that. Yeah, it's separate from the mangling rules, right? No, we didn't combine wordlists; we were more interested in the chunking aspect: this is what the top 10k recovered, this is what rockyou recovered, this is what we recovered after that entire potfile had been populated. That's great. Yeah, so if we could combine those wordlists together, that'd

be great, so this may be something else that we could have done. When you started out doing this, I mean, obviously there are massive amounts of data available for the geolocation part of this, but when you started out, did you start making Wordsmith because the data was available and you saw you could easily use it, or did you actually have a theory, or did you eventually prove, that lots of people actually use locations as part of their passwords and so you needed this input? What was the reason behind starting it? So, part of it was: as we compromised Active Directory

domains, we dumped password hashes and started cracking. One of the things we do after churning through rockyou and the top 10k is generate our own custom wordlists based off root words, for instance the company name, and we do some of the common translations for that, converting o's to zeros, a's to at signs, etc., and also some of the local names, like the street that the company is on or the address where the building resides. We noticed we started getting hit after hit after hit using that pattern.
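The "common translations" described here can be sketched like this; the substitution table and function name are illustrative, not Wordsmith's actual implementation:

```python
from itertools import product

# Common leetspeak substitutions: o->0, a->@, e->3, s->$.
SUBS = {"o": ["o", "0"], "a": ["a", "@"], "e": ["e", "3"], "s": ["s", "$"]}

def leet_variants(word):
    """Return every combination of the character substitutions
    applied to `word`."""
    pools = [SUBS.get(c.lower(), [c]) for c in word]
    return {"".join(p) for p in product(*pools)}

for v in sorted(leet_variants("acme")):
    print(v)
```

Note the output grows exponentially with the number of substitutable characters, which is one reason to cap permutations in the list and leave the rest to cracking rules.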

So then Wordsmith came about as a way to kind of automate and weaponize that process. And as a second part to that, we just thought it'd be kind of cool to do this, because no one else has really done it before. Going back to the inception of Wordsmith, as I was scraping credentials out of memory I would see that people had been using these phrases in their passwords, so that was another catalyst for this tool. You can send us an email later on. I'm more interested in getting your feedback. For everybody, this will come up in your review, don't worry; this is Joe, he's our boss. Yeah, it needs

to be okay. During the course of your testing and analysis, other than the one Hashcat rule set, did you find any others that were more efficient, that extended what you were doing on top of the targeted list, and what would those rules be for the rest of us? So we used three different rule sets, and we found that two of them exist in the Hob0Rules collection: we used one rule file which contains about five thousand rules, and we used hob064, which contains the top 64 rules from the d3adhob0 rule set. We do also have metrics on that, which I can post out to Twitter or to the GitHub repo.
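For readers unfamiliar with rule sets like hob064: each rule is a small transformation applied to every base word. A tiny Python imitation of a few classic Hashcat rule functions (`:` no-op, `c` capitalize, `$N` append), not Hashcat itself:

```python
# Each rule maps a base word to a candidate password, mimicking
# Hashcat rule functions ':' (no-op), 'c' (capitalize), '$1' (append '1').
rules = [
    lambda w: w,                        # :
    lambda w: w.capitalize(),           # c
    lambda w: w + "1",                  # $1
    lambda w: w.capitalize() + "2016",  # c $2 $0 $1 $6
]

def apply_rules(wordlist, rules):
    """Expand every base word through every rule, in order."""
    return [rule(w) for w in wordlist for rule in rules]

print(apply_rules(["vegas"], rules))
```

With 58,000 rules, a modest geo-based wordlist expands into an enormous but still highly targeted candidate set, which is why the authors keep the base list lean.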

But the d3adhob0 rule set was just a collective 58,000 rules that sort of encompassed all of this. I'd just like to say that it looks incredibly useful, and I'd like to encourage you to make it work in other countries, particularly the UK. We would love for you to help us do that. Yeah, if you want to run it against, say, the UK, we'll put some pointers out on how to tweak it for the UK or for your country. Yeah, that's a really good point. I was in the Netherlands and also in the UK recently doing a pen test, and I kind of showed this to some of the

clients I was working with, and just told them to check out the talk when it was available. They mentioned the same thing, and as Tom and I were speaking about that, that's why we extrapolated and made it modular, so we hope to build on that and hopefully there could be a UK-based implementation in Wordsmith too. I don't know if this was covered, but on post-processing: if you want to say, show me all passwords that have a capital and are length 12 and all that stuff, is that covered in this version of Wordsmith? So there is a feature for that. Yeah, there is a feature to

specify minimum character lengths, and there is a feature to convert to lowercase; by default it's going to come out capitalized, given how we pull it out of the HTML sources, but you can massage the data to a degree. So yeah, here we set the state to California; Z is for the zip codes, and K is for the character length. Going back to password policies, if your organization has a minimum of seven, we could specify seven here. We all know that zip codes have five characters; they're strings, so if we specify a minimum length of three, four, or five, we're still going to get all the zip codes.
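A sketch of that minimum-length filtering; the function name is hypothetical, not Wordsmith's actual code:

```python
def filter_min_length(words, k):
    """Drop every candidate shorter than k characters, mirroring
    the minimum-length option described above."""
    return [w for w in words if len(w) >= k]

zips = ["90210", "94105", "89109"]   # zip codes are 5-character strings
print(filter_min_length(zips, 5))    # every zip code survives
print(filter_min_length(zips, 6))    # none survive
```

Matching the filter to the target's password policy keeps the list free of candidates that could never be valid passwords anyway.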

But if we set it to six, we're not going to get any, and we're just drilling down on zip codes here. If we went for all, which is every single option available in Wordsmith, and set it for six, you'll see that it will churn through everything: every single road, city, landmark, zip code, area code, whatever falls under that six-character specification will be removed from the wordlist. I was just curious, since you mentioned sports teams, and I've definitely seen a lot of users use sports teams: if you used CeWL on, say, the FIFA website, would it be able to scrape those team names, or would we have to manually enter

them? Because soccer is huge, at least in the Bay Area. So absolutely, give me 10 minutes after the talk and I'll answer that question for you, because we'll do it together: we'll type in fifa.com and we'll see if it gets some team names. Yeah. One of the things I've been telling people again and again at Passwords events is that if you are a student looking for an assignment, something to do, or if you are a security researcher and you for some magical reason actually have spare time, please feel free to contact me and I will make your life a living hell for the next ten years; we have work to do. You know, I

have lots of things that I would like to see researched, and one of the things I'm still looking for, which is also the reason why I asked why you initially started doing this, one of the things I've been interested in, and I'm kind of asking around if it could actually be done at all: I would like to see somebody make a tool, or some kind of thing, to put the base words of passwords that we can find in leaks into different categories. So, as an example, a house or a car is a physical object, while a dream or anger is not something physical. So my question

is: could we analyze password leaks, or passwords from pen tests, and categorize them? Because I'm interested in looking into what kinds of categories of words people actually use when they create passwords. One of the things I have done is analyze passwords based on gender and hair color. I have done that; I have statistics on that. Women with red hair have the best passwords, and guys that look like Unix users have the absolute worst passwords. I have evidence of that. But I'm very curious about this, so if somebody is interested in having work to do, give me a call. And with that, I

will say that we are going to take a 10-minute break before we move on to the next speaker. So again, thanks to Sanjiv and Tom.