PW - Stronger Password-Based Encryption Using I/O Hardness - Greg Zaverucha

Name: PW - Stronger Password-Based Encryption Using I/O Hardness - Greg Zaverucha
Uploaded: 2016-12-14
Duration: 25 min 11 s
Description: PW - Stronger Password-Based Encryption Using I/O Hardness - Greg Zaverucha Passwords BSidesLV 2015 - Tuscany Hotel - August 05, 2015

BSides Las Vegas25:1139 viewsPublished 2016-12Watch on YouTube ↗

Mentioned in this talk

Tools used

SQL Cipher TrueCrypt

Concepts

AES PBKDF2

About this talk

PW - Stronger Password-Based Encryption Using I/O Hardness - Greg Zaverucha Passwords BSidesLV 2015 - Tuscany Hotel - August 05, 2015

Show transcript [en]

year before that we have one for uh uh that was basically about you know using randomized email addresses for every single account so I mean uh if any kind of site of service got hacked it would be very difficult if not impossible for them to actually identify your account in there because you know you look random and uh nice way to do it okay ready for yeah okay so we have with us Greg SAA from Microsoft research from Microsoft or Microsoft research uh Microsoft research yeah Microsoft resarch yeah don't you can tell resarch research over well I mean I do like research okay yeah good so I do some of both uh you know some research and some

software development too so okay so give it up okay thanks per um so I'll be talking about password based encryption today um with IO hardness um which is sort of pretty loose term could mean you know memory hardness um but we'll we'll see what I mean by that um as I go so why would you do this why why would you encrypt with a password um I'm not encouraging people to do this more than they already are you know um but good Key Management is is expensive um it's hard to do well you need a place to uh store your key and typically you want to back it up and um outside of Enterprises sometimes this isn't

Justified right um usually when you're encrypting with a password you're encrypting data to yourself so you're the sort of the sender and the the recipient um and it's it's overall it's pretty usable right most people can password protect uh a word file but if you ask them to encrypt something with pgp they they might not have such an easy time so yeah I'm not arguing we need to do this more often but if you're going to do it um I'll present an idea that might make it more secure so here are some uh use cases um so I mentioned office documents you Word documents um zip files pfx files can also be password protected password managers you know

encrypt a set of passwords with with another password um applications uh that secure data with like dpapi um or or SQL Cipher are using maybe indirectly but password-based encryption um the tar snap uh backup uh software has a a utility to en Crypt files with a password um using script and um the true Crypt uh dis encryption system uses password-based encryption as well so okay how how does this work um so I'll start with just a picture of you know normal encryption where um there's a plain text at the top we have an IV that's a random value uh which changes for each encryption operation those two together with keys are inputs to the encrypt function which

outputs the cipher text and and the IV so that's without a password um with a password so typically a password and ass salt go through a function called a key derivation function and that's where you get your keys uh and then everything else is is the same so there you have it um you know usually we also want to authenticate our Cipher text using a Mac so there's actually two keys um that are derived and we don't just encrypt we we authenticate as well and there's a tag output at the end and so some concrete examples here uh you might use pbkdf2 for your key derivation um this is an iterated hash function and uh as CBC and hmac are are

sort of examples of encryption and and authentication algorithms so the threat model for for password based encryption is different um from you know the having password hashes in in a in a password file so with a password file you know exposure should be rare let's hope um and it's it's a sort of a security event um whereas with password-based encryption the cipher text is like sent over email or put on a Shar somewhere so the hash um or the opportunity for a Brute Force attack is is there right away sort of by default um you're you're under attack um and changing the password is is ineffective you know once the cipher text is is out there and the attacker

sees it you can't um change your password um in order to uh protect the the plain text so um Jeff gave a nice talk about the the threat model for password based encryption uh last year at passwords so we'll assume that the attacker gets to see the the entire output right the salt the IV the cipher text and the tag everything that you would need for decryption the attacker gets to see this so to do a boot Force attack um so assume we just have the one Cipher text under attack um and and we'll also assume that the aaker knows some of the plain text so typically you know these files start with a standard header so

we'll just assume that that the the attacker either knows the header or can check that um when he does a decryption that the decrypted um value matches the format has the correct format so for each password guess he deres the candidate keys does decryption of the first part so say the first block of the cipher text and Compares that um to the the known plain text block and if they match then he'll continue decryption decrypt the whole file and maybe do some other Integrity checks so the cost for each guess is the work to derive the keys and decrypting one [Music] block okay so what if um the attacker didn't have this shortcut where he could

decrypt one block and compare it to something he knows so if we remove that if we remove that it would you can think of if we had encrypted random data you know the attacker would have to process all of it and then some you know check in another way whether this made sense this decryption made sense so there the cost would increase by um so I'm using S for the number of of of blocks but it would increase by by a factor of s um and if you know if it's a large file this this could be significant so as I said I called this IO cost so that maybe it's on a maybe the file's on a

dis if it's really large or maybe it fits into memory and then it's like a memory cost um but the important thing is that the defender is already doing this work right um when you have a memory hard kdf this is sort of purely overhead it's something that um you know the attacker or sorry to the defender it's purely overhead it's something he wouldn't normally do okay so how can we do this how can we force the attacker to process all of the the cipher Texs um and the the tool is called an All or Nothing transform um and so basically we're going to encode our plain Tex before we encrypt it so the encoding is is

randomized um so here we have an encode in a decode function there's no secrets here there's no secret key but the security property we get is that um if you're given part of an encoded plain text you can't decode it until you have all of it so if you only have the first block um that doesn't help you you know if you're only missing a few bits you might be able to Roo for uh them but you know if you're missing um say more than 128 bits you won't be able to to do do the decode operation so why would somebody invent such a thing um this is an old idea there's a good crypto uh paper from 1997

by revest and his goal was to improve the strength of encryption using export Keys which is sort of similar to password-based encryption you know it's something that you don't want to do but maybe you know it's it's your best option or your only option um and Export keys and passwords both you know they're both weak they both have have low entropy so later it was shown that oap which is a a padding mode for RSA encryption is an All or Nothing transform um and revest transform is a special case of of oap so I just chose to look at oap um in this work so just to review how how it works um our plain text is input um we need

two functions G and H G takes a random seed um it's you know a short seed and expands it into something like a key stream so you can think of G being being a s in counter mode and then you exor this with the plain text to encode it so the encoded plain text is y and then you need to somehow hide R right so that um y can't be decoded without it so you hash Y and exort it with r and then append that to the n and then so you can sort of convince yourself that um that this works so to decode um hash all of Y exort it with y r to recover R and then um recompute G

and recover the pl uh recover the plane text okay so we want to combine these these uh this encryption with this encoding um so if you start looking at this there's a a whole bunch show different ways to do it I I chose three and and looked at them in detail and so I'll just kind of informally present them here and um you know I wrote this up in a technical report with you know all the details um so I'll Point people to that if if they want more detail um so the first scheme uh is I'm calling it Mac than encrypt because that's what you do you you Mac the plain text then apply the

encoding um and then encrypt that the second one is encrypt in Mac here we uh encode then encrypt and then compute the mac and the third scheme um it's basic it's the same as the second scheme but we um make a special choice for E which lets us simplify oap so uh I won I won't present the scheme at all here um but they're they're all in the report so there's you know some detailed pseudo code uh which I've also implemented to check that you know that it's correct um so I'll have the URL for that report uh at the end of my presentation okay so um comparing these schemes uh as GCM is what I had in mind

as a kind of Baseline for authenticated encryption uh like GCM all three of the new schemes use a single pass for for encryption so they can you know work on a stream of data which is nice for decryption you have to do two passes um and and I don't see a way to avoid this because if single pass decryption is possible then after a constant number of operations you can get that first plain text block and check whether it it sort of makes sense um but you know often we do two passes anyways because we want to first check the Mac before doing any decryption operations there's a nice blog post by Moxy Marlin Spike where he calls this

the cryptographic Doom principles so he he shows some pitfalls of how security fails when you decrypt before authenticating uh and then yeah another note is that these things are more complicated to implement um um you have to implement the All or Nothing transform so you have to implement oap which may require you know New Primitives that you weren't already using and to get good performance you have to inter leave encryption with um the encoding so this um to do this on a stream of data you have to uh really interl these things carefully so now look at some of the costs of the attacks and and and the costs to the defender right because we're we're doing more work um this

encoding step isn't free uh and slows down things by you roughly a factor of two so what do we get for this what does this buyas um in terms of uh resistance against Brute Force attacks so it's kind of easy to do some estimates by by just counting the CPU costs you know the number of blocks for operations uh so that's what I did to get a sense of um of the costs so I'm still using S as the number of of blocks and um so the attacker has to make one pass over the cipher text in in schemes two and three and two passes in scheme one before he can check whether a candidate key is

correct um so I I compared the ratio of of Defender and attacker work so um for every you know attacker operation how many operations does the the defender have to do so here's a table um from my right up with some of the numbers so you can see that the the first scheme has the best ratio so two meaning um two Defender operations for one attacker operation um and then it goes up to four in scheme two and then in scheme three which has this optimization I mentioned it goes back down to to three um and they're all constant you know whereas in AES the attacker is doing one operation where the defender does s so

or sorry in in as GCM that's the case so the previous table kind of gives you a feel of how the schemes compare to themselves but um um to look at some more concrete numbers that that'll give us um a sense of how much extra work the attacker will have to do and this varies a lot by the file size right that that parameter s is really important and also the the work factor that you're using in your kdf if that work factor is really large and the file size is really small then it's effectively um The Brute Force cost is dominated by the by the kdf and doesn't really matter what you do on the

encryption step so again I'll make some some rough comparisons here by counting the number of of um block Cipher and hash operations so here's a couple different file sizes so 100K a Meg 10 Megs 100 Megs so the number of uh as decryption operations is there in the second column so that scales as you would expect then I fixed the the number of pbkdf2 iterations to 100,000 so that's about 200,000 uh hash operations um so then the attacker you know without an All or Nothing transform just has to decrypt one block and do the the pbkdf2 iterations and then in the next column you know it's the sum of the first two columns so this is what the attacker has

to do and so you you can see that it when you get to about a 10m file it's about four times more work and so that's probably where it it starts to make sense with with these parameters and of course this is you know it's imprecise so it's just just kind of back of the envelope if the kdf if your work Factor changes so um what does that mean um so if we look at 50k iterations for pbkdf2 that's that's on the bottom of the table um then it's about seven times harder with a 10m file so you get more more benefit

um okay so going back to my examples uh so SQL Cipher which um encrypted database that I learned about last year at pass words it encrypts each page of the database separately and each page is pretty small it's 64 as blocks so that extra decryption work that you're forcing the the attacker to do is probably insignificant relative to the you know the kdf cost for other you know for other file types like office documents and archive files where they they're usually larger this probably makes sense and uh would improve security but you do have to be careful because you know if you're encrypting multiple files with um the same password the attacker can pick the

the smallest one to to do his root Force attack with okay so uh that's all I have um so just to conclude you know password based encryption is still pretty weak don't don't run out and start using it unless you have to um and there are some scenarios where the the third scheme I presented seems to to give an improvement but overall the idea needs more investigation and um especially from the attacker perspective so I did some rough estimates and counted the number of of crypto operations but uh would be nice to hear from an an attacker who's familiar with sort of using gpus as to whether this is you know really going to make their life

hard or you know whether there's an easy way that they can can deal with um this extra work so the tech report is uh that's the URL um there's more details there you can email me or or catch me later if you have questions um that's

it and questions oh it's Jeff wow yeah haven't had a question before um could you go back to the slide with your file size table um I was just a little confused about the ratio column uh what's that the ratio between um that's the total attacker with All or Nothing transform divided by without okay because um I think what might also be helpful or useful is to have the ratio between the attacker and the defender yeah I tried to capture that um over in this table with this ratio so I maybe I should have called them different things because two different ratios but um okay thanks I I I Now understand the T yeah that makes sense

PW - Stronger Password-Based Encryption Using I/O Hardness - Greg Zaverucha

Related talks