← All talks

PW - Blind Hashing - Jeremy Spilman

BSides Las Vegas30:1320 viewsPublished 2016-12Watch on YouTube ↗
About this talk
PW - Blind Hashing - Jeremy Spilman Passwords BSidesLV 2015 - Tuscany Hotel - August 05, 2015
Show transcript [en]

recording y uh and as I told as I told you he was at the University in OS pass December 2012 and what fascinated me I mean he did he told us a little bit of the concept about this in a like in evening talk in Oso and I was fascinated because it seemed like a crazy idea but at the same time there were really good people in the audience that thinking like how the hell are we supposed to be able to attack this and with that J thanks um you know one of my big draws for me coming to password 12 and now 15 is the people in the room are the people who are probably best asked to evaluate

what I'm talking about and give me feedback and um besides is definitely promoting trying to make this more of a discussion and so uh as we go through the slides feel free to ask questions as we go and pipe up tell me what you think or if if I didn't explain something right I'll go back and I really want to communicate sort of how this works give you a really intuition about why it should work and then and then at the end we can have a little bit of discussion if we think this might actually change the environment at all the threat environment for passwords um how many just to get a feel for the

room how many people have heard of blind hasing before know anything about it so a few people but a lot of new inductees so that's good um I mean one thing which is blit blatantly clear is how hostile this environment is and has become and continues to evolve into um I love this quote from James KY of two kinds of companies in the US those who been hacked by the Chinese this is the original quote but those who have been hacked and those who just don't know they've been hacked um and and really you know we look at um a timeline of the big hacks and it's like uh it's a who's who it's almost like if you haven't been

hacked you haven't you're not there yet um and this only goes to 2014 uh so the environment that we're in is almost like being hacked is the expectation um and so the assumptions around how to protect the password file and what happens when that file leaks and what should users do about it and what do we try to push users into doing to to deal with that reality has really sort of taken over the conversation around passwords um but even the color I think it's just year um it says yeah bubble color is year and and then the the the radius is the size of the attack um the other option was meth method of

but this is uh world's biggest data breaches if you Google that you'll get s a link uh really neat visualization um I think that uh you know despite how rampant password that is I still have this uh I'm still incredibly incredibly uh infatuated with password I think that as a way to authenticate as a way to encrypt as a way to secure you know in the environment that we're in where pretty much all of our data is being set up by Third parties and commercialized and used against us in many different ways passwords are the one thing that you can keep your head you can share it with the breath they really have these unique advantages and if we can make

password secure there's all these really amazing benefits to society I think that can come out of having an ability to have a secure password that can't be cracked particularly for uh authentication and encryption um and so this is why you know I've spent the last few years trying to work on this idea of how can we uh really change the the model of securing a password um the cost to companies at this point um has become tremendous um you know if if you're a ciso your job is on the line when that breach occurs or even the CEO at this point we've seen fire you know like in the Target breach the the the actual

fall when breaches happen they have a really material impact on companies and the people that I talked to over the last couple years um they are very afraid of breaches um and that fear is driving investment and that fear is driving like a willingness to to try new things um and so there's definitely an opportunity out there um if there are new technologies new approaches um to to bring those to Market um I think there's there's definitely a recognition that the industry best practices don't really protect uh users in the way that we want them to um if you look at attack vectors uh we oftentimes separate out into online and offline there's sort of a thing in the

middle here with you know interception because of course the first first thing we do with passwords is we send in clear text from our browser to the server and hopefully through a protected Channel and so we're relying entirely on CLS but there's so much that relies on that TLS that you can sort of take that out of the discussion for a minute because if TLS Falls it's not just your password that you're worried about it it's really the world's on fire so if you take out the interception we have online attack and offline attacks and uh there's a paper um that Microsoft research put out and I think for talked about it earlier uh yesterday uh about there's a huge

dichon here between um the type of attack that can happen online the type of attack that can happen offline and that they call that the online offline Castle um and there's a little bit of an eye chart but you know if we're looking at on the y- axis the risk of a pastr gas and the blue line is an online attack and the orange line is an offline attack um on the on the x-axis it's log 10 guesses guessability of the password right and that's also a big topic and and so the whole driving force is we push users to move their password to the right on that curve um but what are we actually achieving when we do that so in

an online attack um if if you've taken even basic precautions and we've seen very noteworthy news you know where you know like for example Apple had an API that wasn't rate limited and people were able to exploit that to great effect um but if you can do some basic things on the server that are under your control you can limit the online attack the ability for an attacker to try to guess a password through your actual login interface right um to a fairly small number of attempts uh but in the offline sense once that hash is out of your control or if it's password based encryption once that Cipher text is in the wild there's nothing stopping

somebody except for their time and money and ability to uh run that hash function uh to try to crack that file and so there's you know you can you can reasonable people can disagree about how far apart those curves are but you're looking at six 8 10 maybe 12 orders of magnitude in the guessability complexity requirements of a password to survive an offline attack ver online and so what what really comes out of this is anytime the password database is lost regardless of what kind of hatching is used you just assume that all the passwords have been stolen uh and this puts companies in a terrible position because fundamentally the perimeter is permeable you cannot build a perfect

perimeter that you can you know you can beat some of the people some of the time you can't beat all the attackers all the time zero days happen um you know exploits happen and so the question is how can we make it more resilient how can we widen the attack window uh add a better opportunity to to defend when those windows open up so you know the limit of iterative hashing and iterative hashing could be you know pbkdf2 scpt bcrypt or even you know whatever that hashing is doing whether it's you know ultimately we're trying to use we're trying to take the CPU and the io um uh the compute resources at your disposal and use them to the maximum

capability AB ility so that ideally the attacker has to work approximately as hard as you worked to run the same hash you know an ideal hashing function doesn't give the attacker an advantage by switching uh for example over to an Asic or an fbj or a GPU they can't they can't move to a different architecture and and beat you out on a dollars per hash uh or Jewels per hash basis right so assuming we have this ideal hash ultimately what's the real limiting factor well how long is the user going to wait to get that login to happen um we have this like this hard limit and you know Google might measure it in milliseconds where between hitting enter

on the login form and when you need to get that response back and start loading the next page um and so anytime we're trying to protect something in a run you know I call it runtime which is you know while the users logging in um you have this thing where you have this competing forces security increases with latency but latency must be small uh and so it puts an upper limit on how much cost can you put into the equip while the users sitting there waiting um and you have this arm race where ultimately over time mors law will happen and these hashes are easier and easier to crack um and so we're left with the only option left is we let's

push the complexity of the end user because ultimately have it's the multiplication how hard was it to do one hash multiply by the guessability of the password so if we can move the guessability over then maybe we'll give ourselves enough time to at least reset the password before the passwords get cracked um and so now we see this huge push toward complex password polic policies unique passwords on every site all of this comes out of the fact that we can't protect the password to rest um and and and ultimately the company's in this position where they have to just assume All Is Lost if that file is lost so let's just talk about the goals

of a of a a different kind of hat hatching function what would be a different way a complimentary way and the the idea here is something that works on top of what we're already doing we're not trying to replace hashing iterative hashing we're trying to augment it with something which is achieving a different kind uh cost Factor so we want to decouple it ideally from the entropy of the password so right now with runtime itative hashing it's very much entangled with the cost of the attacker depends entirely on uh how complex the password is multiplied by how much work you did at that in that very limited runtime window um so to securely store weak password we need to

get something where we can add cost independent of runtime um ideally we'd love to be able to say that an online attack is impossible even if that database is popped even if there was a injection or whatever that lets people get access to the salts and hashes and and not like just impossible if your password's 22 characters but um I know that no passwords are going to be cracked even though this breach happened um we'd like to be able to prevent somebody from saying well I'm not going to steal your whole data B I just want this 64 bytes right here and then I'm gonna go attack you know Donald Trump and steal his money or whatever right

because I just have to crack his password was prob pretty simple um or whatever right so uh you don't want them to be able to go in and buy bypass the cost that you know you're putting all this cost into every user's hash every time they log in and yet they can go to the weakest link and empty that person's account um one absolute requirement is you can't you know there was a trend a few years ago where people sort of equated password hashing with crypto and they said oh don't roll your own crypto well don't roll your own password authentication leave it to the experts hand it over to Facebook or Google um and we've seen a big push back against

that and I think very rightly so because um when you hand over essentially the keys to your Castle you're handing over some very very valuable um you know user data you're you're sort of subjecting the the user to a privacy breach but you're also handing over sort of you never want to be in a position where you say oh is this user um authenticated and some some third party says yes they are like well how prove it you know you don't want to trust somebody else to say this person should be let in was was that the administrator oh yeah it was the administrator what the in no that that's not okay so you need if you're

going to use a third party it has to be sense an untrusted third party to the degree that they can't make an invalid login look out um obviously this has and you know we've had years of work just to get people to do hashing and salting uh and it's still an ongoing uphill battle uh and so anything new or on top of this is going to you know be an incredibly difficult sell um on on the flip side of that um which really interesting is in the last year we've seen data privacy become a competitive differentiator uh for companies so companies are willing to take the extra step invest the money to get that if they can see if they can

see that it's real if they can see that the technology provides like a real benefit they'll invest the money into it and they'll say okay this is a competitive Advantage for us to protect our user um so the willingness with people to adopt is there um and I think that you know uh from a performance standpoint no one's going to no one's going to be willing to move the budge that window of you know Google wants that response in 20 milliseconds or whatever there's a very limited budget to um to work within from a Time perspective latency wise and I think you know U to cap it all off the end user experience is so important I mean the

the history of sort of password security is all about blaming the user um and I Blame You know I would like to have a technology where you can blame the service provider entirely and completely and say if you lost that password it's never the user's fault um it's because you didn't do the service provider didn't do what they were supposed to do um if you can put the onus on them entirely and give them a tool that lets them protect that uh those passwords regardless of how weak comple they are from offline attack specifically then what you know how does that change the world we live in um you know the side benefit is if you have a

way to protect user password you can spend a lot less money trying to get people to pick these complex passwords your support test costs go down dramatically because you don't have passwords that people can't remember the usability of passwords overall goes up I mean there's this huge uh positive feedback here if weaker passwords can actually be secure against offline attack um and of course you have to respect the Privacy if they're logging into site a site a is the only person who should see their personal information um so let's get to the point um what is the blind hashing data for um literally what we're looking at here is uh kind of a Brute Force way to add cost

into the system and so we're going to do everything else the same as we would up until we get to the middle of this diagram so user submit pred to your server this is just standard web form standard username and password coming over a TLS Channel get to the server uh the server is going to run the same hashing functions they would ordinarily scpt bcpt choose whatever hashing function you want the key is it has to be assulted hash and assault should be uh coply secure random um once you produce that hash and we're going to I'll show a protocol diagram in a minute which really steps through it but once you produce that hash you're going to

send it to the BL what I call the blind hashing data pool and what the data pool will do is it'll use the hash as an index into a massive pool completely random data so literally this this grid you see in the middle imagine terabytes or potentially pedabytes of random generated data the data never changes the password isn't being stored in this block the block is static uh it can grow over time but maybe in a pend only we'll talk about that in a minute but the key is imagine you know you take you take a c secure Rand number generator and you fill terabytes of solate drives with random data um the hash point basically

you use the hatch as a way to point decide which locations in the in the block to read from uh and you collect those in at the bottom you hatch that and you return it back and the server can use that response so it's 6 you can imagine say for example 64 bytes in 64 bytes out that's what's actually happening at 30,000 ft that respond say a second solt another round of hashing and then you save the value so to look at pseudo code um username password you know the site has a user table which has a salt one and a hash two okay and these these values are not you can't get from a

point A to point B like you normally would if it was a salt in hash um when you do the first hashing function you get an H1 you send that over in a function I'm calling get solv to the data pool the data pool um uses it as a way to extract to decide which locations in the file to read from um collects it in the at the bottom hashes that sends it back to you um there's an ID if you can see it um which would uniquely identify either um the site the app or the user depending on the use case um and what that lets you do is that's just a token another 64 byte random that lets

you do things like say well for this ID um I'm going to have a private key that that's the K ID on the right side point for this ID I'm going to have a private key that that basically Keys the data pool for that site so even though I only have one set of solid state drives with terabytes of data in it um when when a site when a certain site is using it to do a blind hatch that data looks unique to them um if we throw away that key that that process can never be completed again because that data is no longer you can no longer get the salt to from the hash one if we throw away that

key so this gives you a means for revocability talk about that in a second if I have time um so one of the key aspects here is what does this actually mean to an attacker so if we have this array of data and we um get a lot of sites to come together and contribute to maintaining a very large array of data um we want to make sure that the attacker has to steal virtually all of it in order to craft even a single password right they can steal a little bit of the data and crack a little bit of password we're right back to where we started which is I I can do targeted

attacks but it turns out if you do these most every iteration that you use the hash to find you use the hash to generate an index to look into the data pool every time you iterate it and you you're going to be looking at a uniformly distributed random location in the data pool and so each one of those lookups is going to if there's data missing that the haer wasn't able to steal the likelihood that you're going to point to a location that they don't have increases and so you can imagine if they have 50% of the data pool and you make one lookup well there's a 5050 chance right that they're going to have

the data if you make two lookups well now you have to the coin twice and t a 25% chance so it's an exponentially decreasing attack rate advantage is the way you know technically describe it and you you you know just by sort of eyeballing it you can say well somewhere in the range of 32 or 64 lookups we get into this location where the attacker analysis they virtually all the data pull before they can crack even one in a million or one in a 100 million passwords um because they want have the data to complete the function and so now we've changed the cost Factor right so now we live in a world where I can

invest in arbitary amount of money right I can take $10 million I can take $100 million and build a data pool that the teer has to own all of virtually all of in order to crack a single password even if the password password and that is a very different environment because now if we can if we can get sites to band together and build this thing there's a huge Network effect right instead of every site spending $1,000 dollar trying to protect the password uh a million sites can spend you know we can we can collectively spend some number of millions of dollars build this service and uh I think make offline attack is you know uh

basically history um there a couple details um to add additional functionality so one of the things is who would ever adopt this service if it meant that if the service went away that they wouldn't be able to Au indicate their users it's the first objection I would hear going into a company I can't risk adopting this uh because I don't know you're going to be around forever and I don't want to reset all my users passwords because the data pool gets corrupted um so you you can you can basically escrow your hash one keep a copy of your H1 encrypted with a public key that's online uh it's important you know we heard about a um we heard about

like RSA pad a couple talks and Grace talk but it's important that it's not a deterministic encryption right you you you don't want it to just essentially be keyst stretching um so uh you do like an RSA ped random padded RSA and you can keep the you can keep the hash one around you keep the public key offline um like on a piece of paper in a vault somewhere and if you ever need to get your H1 back you bring you bring that private key online you you can re you can recover your H1 and so this this enables both the ability to peel back the blind hash which gives you sort of you know lockin protection because who

wants to get locked into the service but also it's it's the ability you once you can re once you can recover H1 you can now upgrade your hash and reapply that um and so you can build a system between between the the the public key that the site stores online and the private key data pool Keys you can build this lock step system where either side gets breached you can recover from the breach recycle the key and it's like it never happened so you can get back you can actually get back you can rewind the clock you can get back to a state where you know without even resetting your user's passwords that the passwords are

safy um some of the interesting questions that come up is for example well okay this data pool um which we built built a 16 terab data pool that's actually running you can use the API and actually run this passing function uh we've got a bunch of sites that are running it live in production today um but obviously you know a certain size is never is not going to be good for forever and you know every 3 to five years you're going to be doubling or maybe quadrupling the size of the data pole so you need a way to grow it over time uh there's a couple different you know two different ways that sort of come off the top of your

head is one is you could just add to the end of it generate new random data stick it on the end and then you just need to know when the hash was calculated how big was the data pool then and then how big is it now and you can easily get back to the hash that you wanted um the other idea is maybe uh regenerating the pool and having like generations of blocks and but that would require people to skip ahead so that it requires more interaction with the site um or else they might like you perceivably be left behind with hashes that that are pointing to data P that doesn't exist anymore um and one last thing I want to

talk about is um you know we've heard a lot about uh using uh passwords for encryption and there was a great talk last year de gave about uh he talked about the the cable gate the wi cable was an awesome story in 5 seconds basically idea was they they they put the cipher text on bits horen uh in like 2010 or something and then 2011 a book came out that had the password for it and there was this big effort to sanitize the cables and not leak you know sensitive information but in the end it was useless because once the password leaked people had Cy they were able to encrypt it and so there's this

notion that once The Cypher text is out there it's gone forever and and that password is now at risk right I mean uh and and the other interesting thing is u i I've covered them up in case people want to guess because it's kind of a fun game so new PG if you're using a password based encryption you're using G PG with the defaults of the command line what do you think the kdf is how many rounds and what hashing function does anybody know Jeff know I know you want people to guess if you want to guess you know how many rounds let's say it's pbkdf2 how many rounds of uh of uh iterations of so so G

PG by default is is two of the 16th iterations of sha one okay so this is like you know stage Rising hand and maybe they changed this since I Googled

[Music] it and then one interation of sh one oh okay so you know this is like easy to crack type of Realm right let's just do a couple more real quick because it's so SSH key genen right SSH key gen this is your this is the keys to your kingdom get you into the server right up until 2013 end of 2013 what was the kdf one round md5 we have a winner one iteration of md5 to protect your SSH kit up until the through the end of 2013 and even now the default they changed it but only for if you're using um um uh elal curve how about yeah I I can't the number is

blanking but you know it's a how about true CP which is now it's veracrypt and they improved it in veracrypt what was true Crypt you know the gold standard of keeping my of an encrypted file system if anyone did cracking you can they would know right how many iterations of a kdf was it using to try to protect your password 2000 iterations of mp60 and there was another there was a user interface choice and one of them would only give you a th000 ver ecrypt FS is something you bunch of users for encrypted partitions 65 this one's not too terrible 2 to the 16 at least a shot 512 so I mean the moral of the story is when

you encrypt with even the best practice tool you have to be extremely careful of okay what's happening with my password to turn it into a key and how crackable is this how much interview do I need the password and and obviously once it once once the cyppher text is out there like we heard from Greg it has everything you need to crack it so you can use blind hashing as a kdf um if you if you run your hash your kdf if you add a blind hashing step into the process of deriving the key you've now turned it into an online process so now decryption of say my hard drive although it will require an internet connection

uh is revocable I can go to the bline aing servers and say destroy the private key that corresponds to that app ID um or I can say when you try to decrypt my password when you try to complete the kdf step to generate my key to decrypt my laptop I mean send me a text message and give me a two-factor authentication so you can start doing all these really interesting things with encryption that's still fully within the control of the end user um and so there's a whole unex mostly unexplored territory of uh like an online KDs a network KDs that give the user more control um so I just want to not advancing Advance there we

go so you know going back to that original diagram imagine what a world would look like without offline attacks that would be boring I mean if we can truly add a cost level that puts an offline attack out of the reach of the of the criminal elements that are using it to ultimately make money right that's the reason people do it they do it to make money off the password if cost more to corrected password than you can make well a lot of people do pain but uh you know you can eliminate the the financial mod it um and potentially limit even the feasibility of it if you can if you can get your head around how big of a data

pull that might take what does the world look like where we only have to worry about online attack guessing it's an interesting world so um that's F ping I'd love to hear your question [Applause] uh where do you store this 16 tab of data and how do you plan to access it and how do you scale it it's a great question so at home is it yeah so you know right now we have data center in San Jose it's collocated equipment and data center in Dallas and we have two copies of the data and of course you can't just have 16 one tab drives you need redundancy within a rack you need redundancy across rack um there's a lot

of considerations as far as you know you need to securely generate the data you need to make sure that you can verify the Integrity of the data you want you want to know at the application layer that the data has you know that some sort of bit error didn't urve because obviously the blind hasing service would have no idea if the random data changed so you need check sus that go all the way up to the application Level um but the really interesting property is the bigger you make the data pool the more solid safe drives you add the more iOS per second the system can sustain and so you get this you know whereas in the

previous World um security sort of increases as performance decreases now we have a situation where security increases and performance increase um in concert so it's it's it's a it's a really interesting uh environment so like a 16 terabyte uh 24 dis array can usually do 100,000 blind hashes per second at you know 5 milliseconds for blind hash with 64 lookups for each blind hash so you know you more toly it out this drive this this technology wouldn't really be possible with with spinning rust but with solid state um it just really it it opened up the possibility to use not IO but actual disc you know actual bits as a cost Factor um very cool quick one one so I I

I feel like there's maybe something I'm not getting here but I'm aside from the the size and the cost of of these uh uh terabyte arrays it it kind of has this feel sort of like you're using them like a one-time pad except they're not using them one time and so to me it just seems like you take take something like an HSM and use it to generate the uh to to use the you know take the the the raw salt process it through an HSM and come up with a secret salt that you use is that um it effectively is an HSM um it acts the same and the question is what's your you know what's your trust level in that

hardware and uh what's your confidence that it can't be stolen so I think that using data add a dimension of um you know like for example our say secure ID was was an and they lost the key and all those devices were effectively compromised and so you know there's something about having a heft of pedabytes of data I think that changed the equation of you know this isn't going to be a tack over the network uh if it takes for example you know 64 bytes in 64 bytes out to do one blind hatch you can look at the network and you can say well I want to support 10,000 blind hatches a second it only

works out to X number bits bytes per second that I would ever expect to see over the line uh if I If I multiply that by the divide that by the size of the data pool you say oh well at Full Line rate it's going to take them 800 days to download this file of the network it just changes the attack window so if there's a vulnerability in HSM somebody can come in steal the key and go right if there's a if there's a vulnerability and there will be uh to the data pool they can maybe start SI siphoning it out but the simple heft of the data you know will give you time to react and if they

can't get all the data they've achieved nothing thanks very much guys [Applause] kach is that you Jeremy Jeremy G want to talk to you I was quoting your I don't know if you saw me but I stole some of your graphic

[ feedback ]