← All talks

CG - SHA-1 Backdooring and Exploitation - J.P. Aumasson

BSides Las Vegas29:0269 viewsPublished 2016-12Watch on YouTube ↗
About this talk
CG - SHA-1 Backdooring and Exploitation - J.P. Aumasson Common Ground BSidesLV 2014 - Tuscany Hotel - August 05, 2014
Show transcript [en]

all right welcome to bsides uh today's talk sha one back dooring and exploitation by Jean FIP Alan all right thank you for intruction okay good morning um so um I announced this St a couple of weeks ago I said uh this should be exclusive for besides K got rejected from black and Defcon um the tell me was kind of too technical so I don't know what it means but well here I am and yesterday night I fight from uh from Switzerland uh was really tired but I couldn't sleep at 4:00 a.m. I was still you know Eyes Wide Open um so I publish something on Twitter and I created a website for this project and I publish

all the details the papers and even do slides so I don't know how many of you have already seen the slides and the details of this work no one okay really good yeah okay so I'm ging this talk but I'm not the only contributor actually I'm not the main contributor of this talk um the my friends from grat from um this University in Austria Maria FL and Martin they did most of the um most of the work let's say all the Crypton work from D grat anel bertini which is the guy who did this fancy logo he worked on the uh binary binary K Fu all this binary stuff and I did what I call the

theory so the initial IDs and the marketing things okay so I have three main part in this talk the first one so what what is har function back door it's about back doors you know what a crypto back door is what a cipher back door is okay you can sabotage Cipher but it's not really straightforward to Define what is a har function by door so I briefly talk about this then I explain how we back door chaan and then how we exploited this back door um if you want to leave the room now okay this is what we find two images with the same hash okay I'll give the the details so who is interested in crypto back doors um I

hope all of you uh you might know about um an organiz an organization called NSA National Security Agency uh they've been in the news lately um and has this kind of interesting document they created a cipher called Duc I won give the details but it's something based on lipti curves and some people argue that it's been bued door by Ana it's complicated story but well they've been interesting back doors since at least 1993 with this Clipper chip um some people will say it's not back door it's a bit different key scoll but you know the end of the day it's the same the got your keys um did crypto researchers pay attention to um to

crypto back doors and to Hash function back doors uh so I know that the cryptographic community is doing lots of paper lots of research in very sophisticated things lot of mathematics uh but not much about um I say offensive crypto not much about back doors uh it's not completely true there's two guys mod Adam Yung who did this scrp toy project couple of years ago it was really interesting um really good ideas um they had some blackbox ciphers some back door ciphers but it's not the IDE back door you can imagine because it's a blackbox you don't see the specs and tell you there's a the cipher Tex is leaking key bits but if you see the internals if you

see how the algorithm is working you will see the backr which will be in front of your eyes okay there's been much more work about uh how back doors so in integr how put how to put a back door they call it Trojan so Works in how to insert a back door and how to detect the back door on the harh function side I've been doing this uh paper in 2011 I was really excited yeah let's do Har function back doors had some crazy ideas but I really failed to make something interesting make something exploitable so it was presented a workshop but not really um not really what I imagined so what's a crypto back door it's not uh

first of all it's not an implementation back door cryp what I mean by crypto back door is something in the in the P code in the algorithm in the MTH not in the code all right you might know this example of RSA so I don't know if you see the bug door here um of RC rc4 sorry rc4 rc4 is pretty simple it's just swapping bites and that's interesting back door here I don't know if if you see I don't know maybe some of you know know the trick okay I highlighted some lines in right that you easily find the the thing so here um we get a bite we get another bite and we swap uh two elements of the

array a and these guys they are pretty clever they don't use the simple squap there use the ex ex swap so that you don't have to use the B you you ex x with Y and Y with x and x with Y and you per the two values okay but the problem here is that if you swap a i and AJ using this trick then if you have I equal J if you have the same index you will set your state to zero so instead of having random values in your internal State you will have only zeros and as you know the all zero generator is not a good to the random generator okay um what a bu door is not

a trap door the trap door is over you know that there's something you know that there's a secret that if you know you can invert a function a bug door is covered it's not supposed to be there all right so I say the RSA function has St door which is the private key NSA has back doors and RSA has NSA back doors as well and you have a function called vssh very smooth hash which is a TR hash function it's interesting because it's based on the RSA function on big numbers arithmetic and if you know the secret if you know the private key if you know the factorization of the big number then you can find collisions efficiently and what

demonstrat that if you don't know the secret you cannot find Collision unless you break RSA all right now what about back doors um so this informal definition it's some secret property that will allow you to efficiently break the hash so what does it mean to break a hash function um so it can be about collisions so I think two messages that give the same hash prag is inverting to Hash function or something less powerful maybe finding a distinguisher I know and now how do you modelize how do you formalize back do because in crypto to start studying things we need to have strong definitions we need to have useful definitions otherwise we can't make anything

useful and once you construct this sabotage hash function how do you exploit it so maybe you you will create a hash function for which you know one collision and nobody else will be able to find this Collision but you just know one stupid Collision not you cannot Generate random collisions but you can imagine more powerful back doors where you can generate as many collisions as you want or you can gener as many Prim images you can invert the function on any image so you have different levels of har function back doors and when we try to formalize this um as with in crypto in terms of attackers and Defenders you set that we sort of

inverse roles here so if the bad guy which is usually the bad guy she's now the designer she creates the crypto she defines the ciphers and she wants to defend her back door whereas the good guys what we call Alis and Bob usually they are the users and we want to fool them we want to convince them that there is no back door that the function is safe and in case they suspect the back door we don't want them to recover the back door and to exploit to exploit it okay so I w't go into all the details you can look this up in the paper but just very briefly we Define a m such

function as a pair of two algorithms the first one is generate so we take some randomness some seed some whatever and you generate a has function so some piece of code or sud code and a back door so the back door can be a number can be a string can be whatever and then we have the exploit function which takes this function H the back door and optionally a challenge so the challenge might be an image if you want to invert a has function and if you have the back door then you can exploit it and solve the problem so find collisions or invert the function all right so we Define to notion static or dynamic it's a bit

boring so you can look at this in the paper but we have for example static Collision back doors you what I mentioned before you just know one stupid Collision you have Dynamic Collision back doors it's more interesting you can generate as many collisions as you want or a lot of them static preimage is that uh you cannot really invert the function in natural sense but you can sort of invert you can generate a har value which is for example the old zero string or something that looks a bit suspicious now we have to Define sty so how strong is the back door in terms of um detectability so first of all you want to hide the back door so given the harsh

function the good guys they should not be able to spot the back door to have a suspicion of a back door and in case they know or they thought that there a back door then we don't want them to discover it to have this notion of discoverability if you give them the harush and how it works to exploit it to exploit the back door they should not be able to recover the back door okay so that's for the theory and our results are what I call Dynamic back door and detectable but not undiscoverable so that's a bit boring so let's move to Shawan um so have you heard about Shawan yeah a bit less popular than md5 but uh

no it's used everywhere today it was designed by by this guys again Ina standardized by by nist so it was designed in 1995 or the design was published 1995 and it's used almost everywhere so in RSA for encryption in what is called RSA with Shan signes hmark pbkdf2 for key derivation plus ring so you see H you see sh one in TLS in SSH IC and many many more protocols it's used for integrated check so in something like G in bootloaders to for code verification uh in host based idss and for file Integrity monitoring to make sure that the f haven't been modified uh how it works so I could present you 10 slides we giving all the

details but you don't need to know the details about sh one just very briefly what it does it just a bunch of additions exorts which not addition exors and shifts and rotations that's pretty simple and it use these four constants so you see the the exort here uh The Logical end The Logical negation and 5 a82 7999 6 ed9 eb8 1 8 F1 BB C DC and C a62 C1 D6 so do you see what this means do you know what those constants are oh wa square root of two square root of three square root of five and square root of 10 10 multiply by 2 to the 32 and uh to take the first 30 32 32 bits

they use this constants to um sort of bring some nonzero values because if you don't add any um in random looking numbers then you have some symmetry you have some structure in the cipher and it's sometimes a little bit easier to break and use different constants for different for each of the four rounds because they want the rounds to be different because if the rounds are similar then some attacks might or might not be possible okay so look at this constants um so sh is broken so BR I told it so let's trust him uh but it's not that broken like I said it's uses everywhere so if it's where had broken uh would be some we would be using

something different maybe um what they mean by Broken is that for cryptographers as soon as you find an attack that is a little bit faster than the generic one then you cre it a break what's interesting is that they have Collision attacks that have complexity approximately to the60 to the 64 we don't really know some people try to verify this then they set up a cluster of computers they implemented the attack they optimize it and so on and they inform one they were very optimistic we're going to find a collision they waed they waited they waited and no Collision um so the details uh why didn't work are not very clear but they may have uh underestimated the cost of

the attack but what is pretty sure is that there's a way to find collisions much much faster thousand times faster than than it should be okay so the recommendation is not to use sh one today but of course nobody cares because they don't have a collision okay so I will briefly describe the techniques that we use to uh to find collisions so it's based on the state ofd differential attacks here you have the message expansion you take a a small block of message of 5 12 bits and you expand it to something but much bigger and this much bigger thing you will process it you using the operation I showed before the exhaust and rotation and so and

using the constants so the little blue squares here are the differences we have two different messages and they have only tiny differ is maybe one bit maybe two bits three bits and here it's the internal State on the right so the first step is the the top and the last step is the the bottom and you see how the difference propagate so at the beginning you have very few differences and you have many many more and then it sort of vanishes so what does it mean what when difference is completely vanish it means that you enter the same state in other words that you have a collision what you want the difficult thing is that it's very very

difficult to predict how differences propagate because of what we call nonlinearity if everything is linear it's completely predictable you know the difference the input difference you can determine deterministically the output difference so what we do is with Shan is first find what we call differental differential characteristic it just means a way a pattern of propagation so the way we do it is we imagine that it's predictable we imagine that it's completely linear and then we compute the probability that the nonlinear stuff behaves linearly and once we find this uh pattern of propagation that would give us Collision we try to find a message that will satisfy this propagation and a and give us Collision um that's the details of the

pattern we found here we have many many differences so it's extremely extremely low probability much lower than 2 to the minus 40 uh it's extremely low but the trick we use is that we use some automated techniques some automated search techniques to find a message um that matches the differences so that we don't have to bradforce it so this comes essentially for free we satisfy all this great difference pattern essentially for free and then at this step this step it's a bit more difficult we cannot modify the message because we we found a message that satisfies the constants here so what can we modify not a message we modify the constants um so the first step is what I

just described for the second round we will try all the 2 to the 32 constants K2 until one gives us what we want so a good propagation of differences and we sort of repeat the same thing iteratively for all the all the three steps so you might notice that here we don't modify K1 the first constant it's completely untouched it's the same as the original one and then we will modify the next the next three constants so the total cost of the attack is approximately 2 td48 uh took us uh if I remember correctly C hours on on the cluster we use but 248 is relatively relatively small in terms of of uh computation search okay and we had a collision so

you had to trust me that this works but if you take the original initial value of Shan if you modify the constants so you see it's different from the original one it's not it's no longer a square root of whatever and you take the two messages with this difference pattern you get the same hash so we have a collision not for Shawan but for a slightly modified version of Shawan what's pretty cool too is that if you take the state-ofthe-art Collision attacks on the original shaan they use two blocks so Shan would has the first block and then the second one and you would only have a collision after two blocks here we have a collision after

just one block all right so it's pretty useless you see it looks like garbage bites you don't see how how to exploit this um and it's not a real sh one so you would say Okay equation is useless and it's on not rro one so what is it use useful for um so I don't know if you've already seen this kind of thing but I have seen many cases well many cases couple of cases where um customers of some encryption um or some security companies say yeah we want our personal as crypto we we don't want the same crypto as uh those guys uh because uh we part know it or we just don't understand

things so we want different constants we want different boxes we want different whatever and sometimes people do modify the standards so they want the customers that s so you're no longer compliant because it does not match the official specs but you will have your very own version of shaan or a so I know that's one of the motivation for this work you design your custom Shan you can sabotage it okay and how to turn this garbage Collision to useful Collision so not just two sequences of random bites but two actual files of some given format you might think about images documents uh PDF or whatever so the basic idea that is quite agnostic to the file format we'll use is

to construct messages as follows so here you have M1 and M2 which will be the different the first block it will be different because we need differences to get a collision and we'll create M1 and M2 in such a way that semantically instruct the the program so which might be the command inter the image viewer or the CPU uh in the case of M1 to jump here and start executing the first payload and in the case of M2 we complete discard the first payload and execute the second one so you might see as a simple if I'm M1 then go to pay one if I'm M2 go to pay2 and that's actually what um what we did with the shell

script example I'll show later but it's not so easy because we have constraints the differences cannot be fully old we have to have some differences at I don't know maybe the the third bit or the 23 bit so we can do everything we want and a big problem is that in many five formats you have these magic signatures at the beginning the first four by have to be a given value and what really sucks here is that because of the cision attack we have to have differences in these first four byes so we cannot find uh adamant collisions for five formats that impose a magic signature of four bytes at of set zero but as you will see

later there are some f format where this signature can be later in theile at another offet okay uh okay so maybe the most interesting would be p is portable executable files the Windows executable for example it doesn't work because um there's the entry point that is encoded somewhere here and because of our differential pattern we have to have a difference at this point which mean that this would create a file of more than one gig and this would not be supported by windows so we can't find collisions for PES we can't find collisions for Elf nether macro macro files um not today and maybe if you find a better differential characteristic we may or may not be able but um yeah sorry

this doesn't work at the moment something that works is a sh scripts so this is completely trivial you just put some uh sharp symbol uh so it's a comment in a shell script and then you can put whatever you want not only printable characters but any arbitrary bites so you will have differences in this in this step and then you can have just a normal shell script where you say uh if my first bite is like this then execute this this command and if my first bite is like that and do something evil so we have this example in this first version we start with a with a sharp and we have zero a at at this line

so when we execute it we have this this this animal here and the second one is just a little bit different you see there's a couple differences but we still have the the shop and it puts hello word so what's cool here it can be hello word it can be a CO it can be a deer it can be a dog it can be everything uh it can be execute uh this program it can be launch your internet browser it can be your RM minus RF so that's pretty cool but if you look at the Shell Code it will be straightforward that you have this this if statement if OD minus blah blah so the back door is clear not um

undetectable here okay what about archives so I'm not speaking about zip because it doesn't work for Zips but for R and 7z formats what's pretty cool is that like I said the signature doesn't have to be at the first at the zero offset so we can have the Cure later and start with some garbage that will include our differences so the IDE is is as follows we have two um two files so one file here one file there so in the first one it will be sort of corrupted it will not look like a r so the parer will continue looking for valid archives until it finds r two in the second case it will look like a

valid archive at the beginning so it will execute r one and again what's very cool is that you can put whatever you want in r one and r two so it work exactly the same for for 7z uh Comm BR so Master BR record and com files so maybe not the most used executables today but what's fun is that there's no signatures at all you can the first bite can be a bite of code so starts with x86 code so here the idea is again to use a jump instruction to jump at different offsets and control this jump based on the differences imposed by the attack which where works like this jump address one jump address two and we

have two different payloads that again we can completely control jpeg is a bit different because it's not code executed like in the previous example um and there's a two by signature so you have a sequence of of chunks in the jic format and what we do is that the first chunk will be commented by one of the um one of one of the two files so it will not be recognized by image viewer whil in the second case it will not be commented and it will be processed so it will allow you to make to put two different images in the same file um that technical data I don't think it's very very clear so I will I'll skip it

but you can get something like this and you said they have um exactly the the same the same hash okay and again we can put any images I was told it was not cool to put images of uh RSC company here but yeah I don't care um now polygloss that's even cooler so the what I showed before is that you construct a sabotage shaan and you can find uh colliding shell scripts or you construct another sabotage shaan you can find colliding jpegs but what about if you want to have the same malicious Shan and create both colliding sh scripts mbrs and archives then you can do it too and here the tricky use that the first block

will be valid with respect to each of those three formats so there are interpreter will not plan the it will there will be some useful x86 code and the shell script will um will interpret it correctly as well so I call it virtual multi collisions because out of one instance of shaan you can find M collisions for different file formats and not just one Collision for each type but many many collisions for each of the three types that's pretty useful okay and even more magic so here we just have two files and depending on the structure of the PDF so we also look at the PDF format depending the image viewer so this is Chrome this is Sumatra

this is adob they have different definitions of the PDF format so they will read it a bit differently um so we you took pictures of at random and we have different script different mbrs and we we we checked it into q q it works well that's very cool okay so you will have more details about this binary thing in po get fof which will be distributed at Defcon uh by Travis Goods speed and yeah thanks s uh we're here well a cryptographer and a binary star war in Du bar so by an and and Maria they did a really really good work so you can get it for free it will be your on Twitter so now I'm going to

conclude I don't know what what time it is yeah just three minutes left okay so many people ask me oh Russia wanted broken what can we do should we kill ourselves or you know don't need to worry it does not affect the security of shaan at all uh we reuse the ideas of attacks and Shawan it has no implication of the on the original Shawan so so far so good don't use Shawan anyway uh so there's always partner with people you know Snowden and all this stuff say ah but NSA they use this the B Shan you we knew it well uh I think it's very unlikely that NSA use this trick because first of all the design shazo to two or

three years before shaan and there was a really huge weakness in sh zero which suggest that they were not really that smart at that time uh an actual Collision for shz was F in 19998 okay and the techniques that we use they were discovered only in 2004 2005 whated publicly uh so you can still argue yeah but maybe NS then knew it well you can still speculate but I think it's very very unlikely that they they did this on sh one uh can we do the same for sha 256 um we haven't investigated this the good news is that in sh 256 you have not four constants but 64 constants so which gives us many many more freedom to

modify the constants on the other hand it means many more differences as well which may or may not be interesting what it's not so good is that whereas we relied on the Collision attacks on the full sha one we cannot rely on collision attacks for the full sh two because there is none the best known uh attack is on 35 31 steps out of 64 so the first step would be to create to expand this work to find uh characteristics to find patterns different propagation patterns for the F 256 I don't know if it's feasible but we will probably look into it in the next months thank you for attention I will be happy to take

questions you can go to this web page Myan email us and that's it thank you very much [Applause]

yeah on that one side previously you had a second reason for why you didn't think the numbers were and C yeah square roots or the random but you can take square root of many many numbers or many can take a huge database of numbers that look like

yeah the the