
G1234! - The Effect of Constraints on the Number of Viable Permutations of Passwords - Randy Abrams

BSides Las Vegas, 42:57, 35 views, published 2018-09
About this talk
The Effect of Constraints on the Number of Viable Permutations of Passwords - Randy Abrams & Briana Butler Ground1234! BSidesLV 2018 - Tuscany Hotel - Aug 07, 2018

This is Randy Abrams and Briana Butler presenting the effects of password constraints on password viability. Without further ado, please take it away. So before we get rolling, thank you. Before we get rolling, please feel free to take photos. We're definitely okay with that. I know that was in the conduct code. So there we go. And here we go. Ten, nine, eight, ignition sequence start. Six, five, four, three,

How many of you remember adjusting the rabbit ears on your black and white TV so you could get a better picture of the Apollo 12 launch? Yeah, me either. You actually adjusted the rabbit ears for the launch or just remember adjusting rabbit ears? Yeah.

So that countdown was from the actual Apollo 12 launch. You might notice that there's only a limited set of symbols on this screen. I deliberately did not place the entire symbol set there. And this will be important. All right. So I'm Briana Butler, an engineering data analyst at Webroot. I spend my days looking at data, understanding where it comes from, how it flows through our various systems, and ideally gleaning some insight from that data. And I'm very excited to be here to present our analysis on the password constraints. And thank you for having us. I would also like to send a thank you out to our intern, Chad. He helped a lot with our analysis and writing a lot of the code for what we're going to present. I'm Randy

Abrams. I'm going to be Briana's lovely assistant today. Another shout out to Chad real quick. He was never intended to be a presenter. We just felt his contributions were such that he gets on there too. I'm a senior security analyst at Webroot. Prior to joining Webroot, I worked at NSS Labs where I did analysis of anti-malware testing. Before that, I was the director of technical education at ESET. And prior to that, I designed and administered the processes that Microsoft used to prevent the release of infected software. As an analyst, I like to present a unique analysis, a view that no one else or few people have actually given. Not the standard hype. I've got personalized hype. So with that,

we're going to get going. All right. So I'd like to begin with a few questions by a show of hands. Do any of your employers enforce any password requirements or complexity? I'm sure a lot. Yes. How about your bank, your brokerage firm, ISP, email, social networks? Right. Now for this one, go ahead and keep your hands down if the answer's yes. Unless it's yes. Do you only use your employer's minimum length requirements? Huh, no hands out. Interesting, all right. And lastly, do any of you use symbol-only passwords that are eight characters or longer? No, all right. Well, think about that again, because we're going to come back to that, sort of as Randy mentioned. All right, so we all know from

the Hitchhiker's Guide to the Galaxy that the meaning of life is 42. Well, we're here to present that the meaning of passwords is 629,551. Now, how do we come to this number? I will tell you. We began with about 80 years of dates and 20 different formats of those dates. And that came to 584,000. We then added 500 names. I'm sure none of you have used those in a password. Next, we added 20,000 words, or what we might think are clever words. Interestingly, in some of our analysis, we found "entropy" was used, which we thought was a clever word for passwords. Then we added 50 numeric passwords, followed by about 25,000

typos, which I'm sure all of us at one point or another have typed into a login: the wrong password with a typo, and then we've reconfirmed that again with another typo, only to find that we are then asked again, and we entered the correct one, and we've now forced the password reset. We didn't know if we should include this because really it's just 25,000 password resets. Many don't last long enough to actually be correct. And then we have one more. All right, I'm gonna give you guys a hint on this one. Listen close. You may not be able to hear it. Anyone? No? One more time. No? Alright, well, it is NCC-1701.

It's only logical, folks. And that is how we came to the meaning of passwords. And that was Klingon for don't be silly. In case you were curious. Alright.
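As an editorial aside, the tally behind 629,551 can be checked with a few lines of Python. This is a minimal sketch using the rounded figures quoted in the talk; the 365 days per year (no leap days) is an assumption chosen because it reproduces the quoted 584,000, and the dictionary labels are illustrative.

```python
# Rounded component counts as quoted in the talk.
# Assumption: "80 years of dates" means 80 * 365 days, ignoring leap days.
components = {
    "dates (80 years x 365 days x 20 formats)": 80 * 365 * 20,
    "names": 500,
    "clever words": 20_000,
    "numeric passwords": 50,
    "typos": 25_000,
    "NCC-1701": 1,
}

total = sum(components.values())

for label, count in components.items():
    print(f"{count:>9,}  {label}")
print(f"{total:>9,}  total")
```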

So this presentation is really about probability attacks versus possibility attacks.

The genesis of this was kind of a perfect storm for me. I'm always frustrated that I can't use a great passphrase, yet I can use something stupid like cat one dog two exclamation point. And it frustrates me. Also at the same time, I was looking into passphrase token attacks, linguistic passphrase token attacks, and I wrote a blog about why enterprises don't care about the top 10,000 most frequently used passwords, and it's because they don't meet the constraints. They don't have length and complexity. So why should you care? They're not possible in that environment. So we're going to look at a probability attack. It's the dictionary attack. Why do you start with a dictionary attack? Because it's probably going

to nail the password. And when that dictionary attack runs out, you might turn to less probable attacks and then you turn to brute force, which actually starts testing the impossible. Another kind of attack. Are you familiar with passphrase token attacks? If not, just let me know. I've got a very brief. OK, so a password brute force attack, you check one character at a time. For a passphrase token attack, you treat each word as a character. Now, a lot of people get the math wrong. They think that 95 to the fourth, let's say, for a four-character password, translates to the same thing for a passphrase. And the problem with that is there are actually over a million words in the

English language. So in a four-word passphrase, each word has over a million possibilities: a million to the fourth. Now, in order to reduce the probability, nobody uses a million words. Nobody has that kind of vocabulary. However, research I've seen indicates that about 5,000 words are in the average person's spoken vocabulary, and 10,000 in the written vocabulary. So you narrow your dictionary down to about 5,000 words perhaps, and now instead of having a million to the fourth, you have 5,000 to the fourth, which is still a huge number, but you've made a big dent because you're using a probability attack. Now the linguistic passphrase token attack is rather esoteric, but the idea behind it is let's narrow down the probabilities some more by

looking at how words are put together. So a verb frequently follows a noun. He ran. But you can also go beyond linguistic parts of speech to common combinations of words. Like if I say "oh," then "my" is a probable follow-up word. "Oh, my" is very common. Now I could say "ship," but I'm probably not gonna say "oh, ship."
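To put the token-attack arithmetic side by side, here is a small sketch in Python. The vocabulary sizes are the speaker's round figures, not measured values, and the constant and function names are illustrative.

```python
CHARSET = 95            # printable ASCII characters
ALL_WORDS = 1_000_000   # "over a million words" (speaker's round figure)
SPOKEN_VOCAB = 5_000    # average spoken vocabulary cited in the talk


def search_space(alphabet_size: int, length: int) -> int:
    """Number of sequences of `length` tokens drawn from `alphabet_size` choices."""
    return alphabet_size ** length


char_space = search_space(CHARSET, 4)             # four characters
full_phrase_space = search_space(ALL_WORDS, 4)    # four words, every English word
reduced_phrase_space = search_space(SPOKEN_VOCAB, 4)  # four words, 5,000-word dictionary

print(f"4 characters:           {char_space:,}")
print(f"4 words (1M dictionary): {full_phrase_space:.2e}")
print(f"4 words (5k dictionary): {reduced_phrase_space:.2e}")
```

Even the reduced 5,000-word dictionary leaves a space millions of times larger than four random characters, which is the point about treating each word as a token.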

Now password constraints will differentiate between the possible and the impossible. As you may know, Twitter has a couple hundred banned words. So now you've got 95 to the eighth for an eight-character password: permutations of characters minus 200 permutations of characters. But this stuff gets a lot better. All right, so with that, I'd like to introduce a few of the questions we were attempting to answer with our analysis. First, is there really an effect of imposing constraints on passwords? Does it really matter? And one of the first questions is how many passwords do constraints eliminate? And then how hard could it be to get an exact count? We would look at a Monte Carlo simulation to

get an approximation with a margin of error. We could also look at the inclusion-exclusion principle and we would get an exact count, but now, inclusion-exclusion, who hasn't used this lovely phrase in a love letter? It's really good with a small number of variables, but as that increases

it gets quite complex. As those variables increase, this kind of becomes your brain on inclusion-exclusion, and so we went forward with the Monte Carlo simulation. So what it really came down to for the difference between these algorithms was runtime. Inclusion-exclusion would run in exponential time, represented by big O of 2 to the k times n, with big O representing sort of the limitations or the performance of the algorithm. That is, at most, inclusion-exclusion would run in 2 to the k times n steps, k representing the number of variables, or the combinations of the constraints, and n being the number of password iterations. In this case, we iterated through 10,000 passwords

and it took nearly 15 seconds, which may not seem like a large number, but as you get into larger variables and you're looking at computation resources, that's something you'd want to consider. Whereas the Monte Carlo would run in linear time, big O of n, so at most it would run in n steps, and it ran through the same number of iterations 130 times faster than the inclusion-exclusion. So of course we kind of moved forward with the Monte Carlo just in the thought of time and processing power, and we wanted it to finish before Halley's Comet came back in 2061.
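For readers who want to see the inclusion-exclusion idea concretely, the following is a minimal closed-form sketch in Python. It is not the speakers' script: it counts directly over the 2^k subsets of required character classes rather than iterating over passwords, and it assumes the 95-character printable ASCII set split into 26 lowercase, 26 uppercase, 10 digits, and 33 symbols (space included). All names are illustrative.

```python
from itertools import combinations

# Assumed split of the 95 printable ASCII characters.
CLASS_SIZES = {"lower": 26, "upper": 26, "digit": 10, "symbol": 33}
ALPHABET = sum(CLASS_SIZES.values())  # 95


def count_viable(length, required):
    """Exact count of strings of `length` that contain at least one
    character from every class named in `required`."""
    total = 0
    # Inclusion-exclusion: alternate signs over every subset of classes
    # that is entirely missing from the string.
    for r in range(len(required) + 1):
        for excluded in combinations(required, r):
            remaining = ALPHABET - sum(CLASS_SIZES[c] for c in excluded)
            total += (-1) ** r * remaining ** length
    return total


def eliminated_fraction(length, required):
    """Share of all `length`-character strings ruled out by the constraints."""
    return 1 - count_viable(length, required) / ALPHABET ** length


for required in (["symbol"], ["lower", "symbol"], ["lower", "upper", "symbol"]):
    print(required, f"{eliminated_fraction(8, required):.1%}")
```

The loop visits 2^k subsets for k required classes, which is where the exponential factor comes from; with only four character classes that is 16 terms, so the cost grows quickly only as more kinds of constraints are added.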

All right, now I'd like to walk through a little bit of this Python script. I'm not gonna go line by line or anything, and if you'd like to follow up or have any questions, please feel free to email us. We're happy to talk through it. So we began with importing our Python libraries, and then we had a function, makeRanPass, that generated our random passwords. Then we identified our variables, assigning the characters from our ASCII list. From there, we were able to define the character lengths in the pass length, and how many passwords we would like to create. From there we had a for loop that iterated through these constraints basically, you know, these conditionals were the constraints. So, and we were able to vary

that by hashing out what we didn't want to include to get the combinations.

Here, with the print statements, we basically printed how many passwords were rejected due to these constraints, and the standard deviations. And this is an example of one of the outputs, where we required eight-character passwords, a million of them. And what we saw here kind of goes back to my earlier point: with inclusion-exclusion, the lowercase and uppercase numbers would have been the same. Here you see a bit of deviation due to that margin of error. Yeah.
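The script itself is not reproduced in the transcript, so the following is only a sketch of what a Monte Carlo estimate along the lines described could look like. The function name `make_ran_pass` is modeled on the `makeRanPass` mentioned in the talk; everything else (the constraint table, the seed, the standard-error formula for a proportion) is an assumption made for illustration.

```python
import random
import string

# Assumed 95-character set: letters, digits, punctuation, and space.
SYMBOLS = string.punctuation + " "
CHARSET = string.ascii_lowercase + string.ascii_uppercase + string.digits + SYMBOLS

# Each constraint is a predicate a password must satisfy.
CONSTRAINTS = {
    "lower": lambda p: any(c.islower() for c in p),
    "upper": lambda p: any(c.isupper() for c in p),
    "digit": lambda p: any(c.isdigit() for c in p),
    "symbol": lambda p: any(c in SYMBOLS for c in p),
}


def make_ran_pass(length, rng):
    """Generate one uniformly random password of the given length."""
    return "".join(rng.choice(CHARSET) for _ in range(length))


def estimate_rejected(length, trials, required, seed=0):
    """Estimate the fraction of random passwords rejected by the
    required constraints, with the standard error of that estimate."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        password = make_ran_pass(length, rng)
        if not all(CONSTRAINTS[name](password) for name in required):
            rejected += 1
    fraction = rejected / trials
    std_error = (fraction * (1 - fraction) / trials) ** 0.5
    return fraction, std_error


for required in (["lower"], ["upper"], ["digit"]):
    fraction, err = estimate_rejected(8, 50_000, required)
    print(required, f"{fraction:.2%} rejected (+/- {err:.2%})")
```

Because the estimate is sampled, the lowercase and uppercase figures will usually differ slightly even though their exact values are identical, which matches the deviation described above.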

So the idea of an infinite number of passwords is a myth. That would mean that it's completely unconstrained and that is not actually possible for us. So in this finite world, there are limitations, constraints. You don't have enough space in the password input field, storage space for that password, and that's a constraint. Life is another constraint. You can spend your whole life typing in a password, and with your dying breath, you'll realize you never confirmed it. So you wait for another go around with the reincarnation and spend your whole life typing in this password. And with your dying breath, you realize it didn't take because there was a typo, and so on, and your dad never got to use it.

And that's really, really a bummer, I think. So it turns out that if n is 1, with x being the number of characters, 95, then yes, x to the n is the maximum number of permutations.

OK, we're back. But if n is greater than 1, that's not true. The length constraint: if I say that you can only use eight characters, or if I say you have to use at least eight characters, then from that 95 to the eighth, 95 to the seventh permutations of characters are not possible. They're impossible. Go ahead and brute force them. You're spinning your wheels. Choosing a character set, actually the act of creating a password, imposes constraints. You have to choose a character to start with, and the choice of character sets is going to introduce some constraints. So going forward, when I talk about constraints, I'm not saying that you can only use a symbol or you can only use a number. I'm saying you must use that, but you

can use any of the other character sets. So symbols are the least impactful of the constraints. But as soon as you say your password has to have a symbol, with no other constraints, for an eight-character password, you've eliminated 3.3% of the maximum number of character permutations. So 3% fewer are actually viable passwords. And as a result, Piglet is the first casualty.

"Piglet was one cute little dude" is my favorite example for teaching people how to create passphrases. "Piglet was one cute little dude" is very memorable. Because it has more than four words, the power of exponentiation means that a passphrase token attack is very unlikely to succeed. But because of corporate constraints, and because there's not a symbol in there, Piglet is not kosher.

The lowercase constraint: this gets interesting. When you go and say it has to have a lowercase letter in it, you've eliminated 7.8% of those 95 to the 8th character permutations. 7.8% cannot form a password. If you try to crack those permutations, you're spinning your wheels again. Of course uppercase is identical in impact because the character sets are identical in count. Okay, where it gets really interesting is when you combine the constraints. Now you can see with an 8-character password, just the lowercase and symbol constraints eliminate 1 out of 10 possible password permutations from this character permutation set. And at 16 characters, you broke the 1%

barrier. And when you get to lowercase, uppercase, symbol, now you're eliminating almost one in five potential passwords. So one in five brute-force attempts, theoretically, is going to be trying something that could never, ever work. Now you remember the countdown? 9, 8, 7, 6, 5, 4, 3, 2, 1, 0.

10 numbers. Numbers are one of the worst things, or one of the bad things, that you can do to a password. I shouldn't say worst because humans know how to do worse. But the moment that you say your password has to have a number, 40% of all character permutations for the given password, for the eight-character password, and it goes down with length, are no longer possible. That means only six out of ten brute-force attack attempts are actually trying something that could exist. And so we tell people you have to have uppercase, you have to have lowercase, you have to have symbols, you have to have numbers. Looking at that chart, do we really need to make it complex? Do we actually have to say you have

a number in it? I kind of question that. Okay.
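The single-constraint percentages quoted above follow from one formula: a password with no character from a required class must be built entirely from the other characters, so the eliminated share is ((95 - class size) / 95) raised to the password length. The sketch below assumes the same 26/26/10/33 split of the 95 printable ASCII characters; small differences from the quoted figures are expected, since the talk's numbers came from a simulation with a margin of error.

```python
# Assumed split of the 95 printable ASCII characters.
CLASS_SIZES = {"lower": 26, "upper": 26, "digit": 10, "symbol": 33}
ALPHABET = sum(CLASS_SIZES.values())  # 95


def eliminated_by(class_name, length):
    """Share of all strings of `length` that contain no character from
    `class_name`, and so cannot satisfy a 'must include' rule for it."""
    return ((ALPHABET - CLASS_SIZES[class_name]) / ALPHABET) ** length


for length in (8, 12, 16):
    row = ", ".join(
        f"{name} {eliminated_by(name, length):.1%}" for name in CLASS_SIZES
    )
    print(f"length {length}: {row}")
```

The digit requirement dominates because digits are the smallest class: leaving out only 10 of 95 characters still allows a large share of strings, so a large share fails the rule. The effect shrinks as passwords get longer.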

After the zero came that set of symbols. And that was a constrained set of symbols. Have you ever had to enter a password at a website or an application where it says you have to have a symbol, but you can only choose from this small set? Well, if you limit it to eight symbols, what have you done? You've taken this 33-character set and reduced it to eight characters. So right off the bat, what you've done is taken it from 6,634 trillion and dropped it down to a measly 576 trillion, which is roughly the national debt.
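The two figures can be reproduced directly. This sketch assumes the full set is 95 characters and the restricted set keeps 26 lowercase, 26 uppercase, 10 digits, and only 8 symbols, for 70 characters; the variable names are illustrative.

```python
FULL_ALPHABET = 26 + 26 + 10 + 33       # 95 printable ASCII characters
RESTRICTED_ALPHABET = 26 + 26 + 10 + 8  # only 8 symbols allowed -> 70
LENGTH = 8

full_space = FULL_ALPHABET ** LENGTH
restricted_space = RESTRICTED_ALPHABET ** LENGTH

print(f"full set:       {full_space:,}")
print(f"restricted set: {restricted_space:,}")
print(f"remaining share: {restricted_space / full_space:.1%}")
```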

So what happens? Now, four out of ten of those fully constrained passwords are viable as passwords. As a brute force attack, 6 out of 10 password permutations, character permutations you're attempting aren't even passwords. And so if you understand this and you're able to manipulate it, then your probability attack grows in the probability that it can succeed before Halley's Comet returns. All right, now what password talk would be complete without a nod to entropy? Before I sort of dive into this, I'd like to give Randy's highly technical definition of entropy. It is a mathematical equation that proves humans are not nearly as clever as we think we are, especially when it comes to password creation. Now, technically, entropy here is a measure of randomness.

It's a measure of the strength of our passwords, and it's measured in bits. You should keep center column there. Here we were using entropy, or actually let me explain the equation here. We take log base two of the number of characters the password is composed of, or potentially composed of, raised to the power of the length of the password. So for this case I was looking at the entropy for an eight-character password using the full ASCII set. So it was log base two of 95 to the eighth. And I was comparing it after taking that full number, taking the percentages we looked at for a handful of those constraints, and applying entropy to what was left. And we really didn't see much of a change

in the entropy. You know, from an 8-character password it kind of went to the entropy of a 7.8-character one. So nothing huge and, you know, no big changes. And as we looked at longer passwords, we saw an even lesser impact. You know, really what that said is length matters. Length is huge when developing a strong password. And, you know, entropy really doesn't look at the human factor. Humans are not able to make these maximum-entropy passwords, and if we could, it'd probably be a lot better for everyone. Now, when comparing the maximum entropies, we decided that entropy is really more of an esoteric afterthought and that human predictability and the way we make passwords is something to be contended with. And

this is just sort of reemphasizing that. Randy will sort of talk about this password dump that we used in the next few slides, but I'm going to talk about it a little bit here. So on the left here are the most frequently used symbols in the password set. And the top four actually represent about 75% of the symbols used. All 10 of these represent about 95%. You know, that's 10 symbols representing 95% out of this million-password dump, where there are actually 33 available symbols. And when you look at a probability attack, there are 23 symbols you could probably leave out and you would likely still get the symbol that was being used. On the right side is the frequency of symbols out

of all the characters in the password dump. And all told, these 10 symbols represent only about 2% to 3% of the characters used out of all the characters available. Now I know three consecutive slides on entropy might decrease the entropy of our slide deck, but to really drive home the point that humans make low-entropy passwords, we looked at the composition of 8 to 11 character passwords. And if you can see here, about 44% are composed of just lower cases and numbers, and 31 here are composed of just lower case numbers. And the top four in this list, out of 12 possible constraints and combinations of those constraints, represent about, I believe, 93%.

So, you know, we're really sticking to some very, you know, small constraint factors here, and we continue to make low-entropy passwords with that. Here, symbol-only passwords represent a very small percent.
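The entropy comparison described a few paragraphs earlier can be sketched as follows. The maximum entropy of an eight-character password over 95 characters is log2(95^8); removing a share of the permutations subtracts log2 of the surviving share. The 60% eliminated figure below is taken from the "six out of ten" remark earlier in the talk and is used here only as an illustrative input; the function names are assumptions.

```python
import math


def entropy_bits(alphabet_size, length):
    """Maximum entropy, in bits, of a uniformly random string."""
    return length * math.log2(alphabet_size)


def constrained_entropy_bits(alphabet_size, length, eliminated_fraction):
    """Entropy left after removing a share of the possible strings."""
    return entropy_bits(alphabet_size, length) + math.log2(1 - eliminated_fraction)


def equivalent_length(bits, alphabet_size):
    """Length of an unconstrained password with the same entropy."""
    return bits / math.log2(alphabet_size)


full_bits = entropy_bits(95, 8)
left_bits = constrained_entropy_bits(95, 8, 0.60)  # illustrative 60% eliminated
equiv = equivalent_length(left_bits, 95)

print(f"unconstrained: {full_bits:.2f} bits")
print(f"constrained:   {left_bits:.2f} bits")
print(f"equivalent to an unconstrained password of about {equiv:.1f} characters")
```

Even with more than half of the permutations removed, the loss is only a little over one bit, which is why the entropy figure barely moves while the share of wasted brute-force guesses is large.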

So when I was thinking about this, I had a hypothesis that the nature of password composition probably changes with length constraints. And I wondered, is there a likelihood that the first character is going to be different if it's a 12 character password than the first character of an 8 character password? I also wondered, could there be a useful correlation between the length of the password and the predictability of the last character. And then what about the correlation between first and last? I wondered about a whole bunch of other things too, but I had to finish the presentation. And so I found it really interesting. We got these passwords from a million password dump. I don't know if you've heard

of Mark Burnett. He's a noted password expert. He actually dumped 10 million usernames and passwords and wrote an article about why the FBI should not arrest him. But this was back in 2015, and I can only find a dump of a million of those. But these million actually have some attribution. We know that some of them came from LinkedIn and some came from RockYou, and there are other companies that have attribution. And so I was hoping that I would get a good set of constrained passwords. It wasn't exactly what we wanted, but let's look at what we found. None of you have long symbol passwords. We found that nothing over 12 characters was symbol only, and very few actually in the whole set, much less 8 to 11, had

just symbols. And that was not unexpected. However, an interesting thing as a probability attack defense: if your password is all symbols, it's probably not going to get attacked first. Someone's actually doing it smart. And there's another reason. Numbers only: no surprise, the shorter passwords are numbers only. About the time you get past the telephone number, the numbers-only passwords tend to die off. The numbers and symbols, that was the yes! The no part was, this is a really small sample set. There weren't a lot of these long ones in the million-password dump. However, I found it really interesting and I wondered: okay, let's say that it isn't a problem with statistical significance, such a small set. Let's say that there's a method to the

madness. And I started thinking about it and I realized as a strategy, this is genius. And the reason is because of the ASCII table. If you look at the ASCII table, there are about 15, 16 symbols that come first. And next you have numbers. And next you have symbols. And next you have uppercase letters, followed by more symbols, followed by lowercase letters and more symbols. The symbols in the ASCII table are distributed throughout the entire ASCII table. If you're going to do a brute force attack, you're not going to hit all symbols until late into the attack. And if you're doing a decrementing attack, you run into the same issue. If you split the numbers in half, run two computers doing half and half, they'll never get it. Which

it's like, huh, that's really smart. I'm going to have to start choosing symbols a little bit more carefully. But how do you remember a 16 character password that's numbers and symbols only? And it turns out
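The layout of the ASCII table described above can be checked with a short Python sketch. It walks the 95 printable characters (codes 32 through 126, so the space counts as a symbol here, which is an assumption) and groups consecutive characters by class. The helper name `char_class` is illustrative.

```python
from itertools import groupby


def char_class(ch):
    """Classify one printable ASCII character."""
    if ch.isdigit():
        return "digit"
    if ch.isupper():
        return "upper"
    if ch.islower():
        return "lower"
    return "symbol"


# Printable ASCII runs from code 32 (space) to 126 (tilde).
printable = [chr(code) for code in range(32, 127)]

# Collapse consecutive characters of the same class into (class, count) runs.
runs = [(name, len(list(group))) for name, group in groupby(printable, key=char_class)]

for name, count in runs:
    print(f"{name:<6} x {count}")
```

The symbols come out as four separate runs interleaved with the digits and letters, so an incrementing (or decrementing) brute-force sweep in ASCII order does not cover every symbol until it has crossed nearly the whole table.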

Well, no. You just have to remember that password for the password manager. The whole point of a password manager is you don't have to remember them. And actually this could be a great technique for the master password for a password manager. So consider this. Open bracket, open paren, 95 caret, which is the exponentiation symbol, 12, close paren, minus, open paren, 95 caret 11, close paren, close bracket. 95 to the 12th minus 95 to the 11th. Sound a bit familiar to any of you? That's so easy to remember. Just come up with a formula that means something to you or that you can easily memorize. You've got your symbols, and the number set's easier to remember because it's only 10 versus 26 letters, so it reduces complexity for you to remember. Your ASCII characters are spread out. And it's just like, oh, wow, if someone's doing that on purpose, that is so clever. They're actually one of the ones that are the exception to the rule that proves that entropy is mathematical evidence that people aren't so clever. That one is. Looking at the rest of the results, I didn't

really see much that could be used very effectively for a probability attack. It was a little anomalous that the 12 to 15 character passwords tended to have lowercase and uppercase only. I'm a bit skeptical about how well that would hold up. I think the numbers and symbols, the percentage would go down but not as much as lowercase, uppercase. First character analysis was really fun and we were blown away. W came out on top. And of course you expect number one to come out on top, one, two, three, four, and so on. And this was across all million passwords. If you looked at the shorter ones, then you see that number one comes in number one. And how did W

come in number two? I mean, that was baffling to me. And another thing: if you look at these, nine out of the 10 characters used in the short passwords were the same as in the entire million-password dump in terms of top 10 frequency. But S is the only one that maintains the number four position in both of these sets. And then you have one character that's present in one set but not present in the other. Now these long ones, as I said, these are what interest me. 12 characters and longer, and this blew me away. Only the 12 character or longer passwords had a very high probability of capital letters as a first letter. And it kind of makes sense, because if you're going with these long

passwords, you might be using a passphrase. And as you saw with "Piglet was one cute little dude," the first letter in Piglet was a capital P. It's how we form sentences. So that part may be not so surprising, but again, uppercase and lowercase, four out of five are the same for uppercase and lowercase. And if you look at the all-passwords frequency, you've got numbers in there and you've got no uppercase. So potentially this can help in a probability attack against your longer passwords, which are the ones that are a lot harder to brute force. So the first ASCII value in these longer passwords, I wonder how predictable is that? And it's pretty surprising that the first four of the top ten first characters

used account for 61% of those longer passwords in the password dump. And in fact the top ten account for 80%. And it's like, huh, that can probably be used to aid a probability attack. And if you look at the last ASCII value, the most common ones are numbers, except the period snuck in there. And again, if you have a constrained password, you have to have a symbol in it, and a period ends a sentence. And that's one way to comply with the policy. The first character, last character analysis, maybe it doesn't matter, but it interested me. And this is correlated. So if the password starts with a W, it's more likely to end with a 1 than

anything else. The S, remember S maintained number 4 in both sets? The S had a unique characteristic: it was more likely to end in a letter than a number. Now, if I have insomnia for a couple years, then maybe I'll try to analyze those million passwords and figure out if there's a pattern to S passwords ending with a letter. So, just as Briana and Chad thought they'd done all the analysis I asked for, I came across a cool 2.15 million password dump on GitHub. Now, this did not have attribution. I suspect a million of the 2.15 million are from the same 10 million password dump that I could only get a million of. So it lacks attribution which may affect

the composition. But we found W disappeared from the top 10. I mean, I never expected to see it in the top 10 in the first place. So that was an interesting and, actually, expected change. Number one, anyone here surprised that number one took the number one position? Nope, nobody has. S moved from number four to number two, and that's not surprising because I believe S is the most common consonant in the English language. So it's not surprising that it moved up there. And A moved from 10 to 3, and that's not surprising either because vowels are such an important part of any word except for Klingon words. Now, the S factor mystery continued. You can see where M now is three out of ten

M words, words that start with M, are going to end with a letter and the rest are numbers. And the S factor has decreased a bit but still, four out of ten passwords starting with S end with a letter. And I found that quite interesting. And particularly, you see, A, of course, number one is the last one. But if it starts with an A, it's highly likely to end with an A. And why do you think that is? My hypothesis is because the password was ABBA. And in fact, ABBA did show up in the list. All right, now for the 95 to the 20th dollar question. Can the impossible passwords be filtered out of the possible passwords enough

to really matter? Does it even make a difference? Will this allow people to crack passwords faster? And from that, we had a few conclusions. None of this would actually really matter if you aren't able to filter out the valid, possible passwords and to do so flexibly, to prioritize the guesses, and, excuse me, not use excessive computational resources to do so. Otherwise, it doesn't really matter. You may not be able to accommodate that today, but tomorrow, as processing power increases with Moore's law, the impact may be significant.
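One way to picture the filtering step is a generator that drops candidate guesses a known policy would never have accepted. This is a hypothetical sketch, not something shown in the talk: the policy parameters, the function names, and the decision to treat only `string.punctuation` (not the space) as symbols are all assumptions, and a real policy would need its own rules.

```python
import string

# Predicates for each character class a policy might require.
# Assumption: a space does not count as a symbol under this policy.
CLASS_CHECKS = {
    "lower": str.islower,
    "upper": str.isupper,
    "digit": str.isdigit,
    "symbol": lambda c: c in string.punctuation,
}


def meets_policy(candidate, min_length=8,
                 required=("lower", "upper", "digit", "symbol")):
    """True if the candidate could have been accepted by the policy."""
    if len(candidate) < min_length:
        return False
    return all(
        any(CLASS_CHECKS[name](c) for c in candidate) for name in required
    )


def filter_candidates(candidates, **policy):
    """Lazily yield only the guesses that are possible under the policy."""
    for candidate in candidates:
        if meets_policy(candidate, **policy):
            yield candidate


wordlist = ["password", "Passw0rd!", "Piglet was one cute little dude", "Cat1Dog2!"]
print(list(filter_candidates(wordlist)))
```

Because the filter is a generator, it can sit in front of a dictionary or rule-based guess stream without holding the whole list in memory; the cost per guess is a single pass over the candidate's characters for each required class.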

We know and we understand that to provide more statistically significant information, we need more data, more password dumps that have the fully constrained passwords, maybe representative of corporate password dumps, that type of stuff. We understand that. And we would contend that our analysis does provide some mathematical support for NIST password policy guidelines: decrease the length, or excuse me, increase the length and decrease the complexity requirements, please. So through much of our analysis and discussions, our intern, Chad, actually had an interesting idea. It may not be unique, but what if password constraints were generated, you know, were different on a per-user basis? So, you know, every person who logged in or created an account with Facebook or with Instagram, something like

that, received a different set of constraints. Maybe one person gets 12 characters, another requires eight. You vary those, and that's a much different game: it's really difficult to guess if you have all these constraints that nobody would know, compared with guessing x to the n or knowing x to the n minus x to the n minus one. This would also really help fight these constraint attacks. Ultimately phishing attacks are the great password equalizer. You know, we can all, any

whether it's 30 characters long or password one two three, it wouldn't matter: they could take the path of least resistance and simply ask politely for the password. Politely is relative, but the question really is, shouldn't we be continuing the talk on multi-factor authentication, or maybe it's biometric authentication, although DNA may not be a great option as Ancestry.com has already got those passwords and everyone paid to give it to them. So maybe we're in the wrong business there. Lastly, we would like to acknowledge some folks. Maurice Schmidler, who is our machine learning guru and who wrote our Monte Carlo simulation for us. Guy Cohen, who helped us to manipulate that. And Chad, again, for his awesomeness in helping with the

analysis and everything. this whole presentation because I just wanted to know how much. And it took a while to find out. And I still don't know how much. I know percentages. And so with that, we'll open the floor to questions and answers. And we have our email addresses there. And I'm at RandyAB on Twitter. I do not have Twitter. All right, any questions?

Hello, you alluded in the end to the constraints from, for instance, the company policies. I work very much with Active Directory and cracking Active Directory passwords. I'm very focused on knowing what the policy is so I can keep that in my record, because you may have a constraint of eight characters only, but my password is definitely longer than eight. So even if my password leaks, the constraint will not be represented in the dump. So can you elaborate a little more on the constraints compared to the actual data? I know that the data was kind of small, you alluded to that, Randy, but can you talk a little about that? Yes, actually, I mean, I talked to a cryptographer and said, can you please help me with the

math on this? And also a theoretical physicist who ultimately said, I'll pawn this off to a graduate student. And that was one of the things I wanted to know: what if the constraint is 8 and my password is 9? And ultimately Maurice said that this Monte Carlo simulation is going to get out of hand. Another of the constraints I wanted to know about was what if you can't have three identical characters in a row, three consecutive of the same character, because interestingly that has a lot less effect on a short password than on a long password. So I wish I could answer your question. I had the exact same question. And if there are any mathematicians or cryptographers that want to take a stab at it, please let

me know. The human aspect actually is most people are probably going to use the shortest password that they can. I was really surprised when I worked at NSS, I worked with other security analysts and they were watching me type in my 20 plus character password and go, how long is your password? And they said, I only use the 12 characters. So I think. Thank you.

Do you have any sense for how dependent your analysis is on being English speakers that were doing the passwords? Well, we did kind of discuss the idea that many of these passwords could include and I think you'd actually be best to talk to this because you have passwords that include other languages. Yeah. And I mean, that's another one of the things that I've seen on discussions with people that know as little as I do about it, except I figured analysis out, they're asked, well, does it matter if I use a word in a foreign language? And the person will say, no, I've got foreign language words in my dictionary. Yeah, it makes a huge difference because you've had to expand the dictionary size.

And then the foreign words that you choose, how common is that language? Someone might have had to change their dictionary from 5,000 to 50,000 to encompass those foreign words. And then, of course, with Chinese, Japanese, Korean, Hebrew, Arabic, Russian, you've got these different character sets. And one of the things, I'm still working on putting together an educational presentation about passphrases, is make it fun. Aside from silly stuff like Piglet, learn a foreign word in an obscure language. Learn how to say dog in Swahili. Because now what you're doing is you're increasing the dictionary size, and it's not a language that's going to be one of the first 20,000 dictionary words. So yeah, non-English makes a big difference in those character sets. Does it also impact the frequency analysis that you were using for setting up your probability tables? I don't know. I don't speak Chinese.

I imagine. The password dump that we got is only English, so it'd be a lot of fun to do analysis on foreign language password dumps, but I haven't found any. If you've got any, please share. And understanding how they, you know, their heuristics of their password making, and that's a whole other topic, I believe.

Okay, I don't see any other hands. Let's give a round of applause.