
Welcome. My talk today is about why MIME is broken. Okay, the screen is a little bit off, let's see.

So, about me: I'm a security engineer and security researcher. I have been working for more than 20 years at genua GmbH in Munich. The focus of my work is not on breaking things but on protecting what's already broken. I did a lot of firewall development focusing on the application layer, I did eight years of collaboration with academia in research projects focusing on defending against attacks via mail and web, and I'm currently more involved in product and research strategy.

The company is exactly 30 years old now, with 360-plus employees in various locations
inside Germany. We are a subsidiary of the Bundesdruckerei Group and we build security solutions for IT and OT. Our focus is on sectors with higher security requirements: public sector, critical infrastructure, regulated industry, e-health, etc. And of course, if you want to help, you're welcome.

The motivation of my research: like I said, I did a lot of firewall development at the application layer, so I had to implement application protocols. The best way, or the preferred way, to implement application protocols is to actually look at the standards. Not many actually do this when implementing these protocols. But when looking at the standards
and trying to find out how to implement them, the problem is that these standards are typically very flexible and very complex, and they are unnecessarily flexible and complex. They leave way too much room for creative interpretation. There are lots of edge cases with no clearly defined behavior, there's no defined behavior for protocol errors, there's a lot of SHOULD instead of MUST, and they partly conflict with previous standards. Of course, all of this conflicts with security. The result is that we have different implementations which interpret especially the edge cases in different ways, and this can of course be used by attackers. So if the analysis system, like a
firewall, interprets the content differently than the end-user system, in our case a mail user agent, then we can pass attacks through the firewall.

The focus of my research in this case is MIME. MIME is kind of the standard for rich mail, so what we have today: structured mail, binary attachments, non-ASCII characters. What I explore here is how to use different interpretations of MIME to bypass security systems, that is, the analysis in mail filters, firewalls, intrusion detection systems and antivirus versus the interpretation by mail user agents or web front ends. I'm looking into bypassing malware detection by content, using the EICAR test virus, and bypassing detection or attachment
filtering by file name. So if we can make the firewall believe that we have a different file name, that it's not a .exe, not a .zip, then the firewall will pass the mail through.

There's similar research for HTTP. There's all this PortSwigger stuff about HTTP desync attacks, which target the server side, and there is a lot of research on the HTTP side too that targets clients, so sending malicious or unexpected responses from the server and bypassing firewalls this way. The research here was primarily done from 2015 to 2018 during a research project, but I freshly updated it to make sure that all the attacks still work, and they
do.

To explore this topic I created a lot of different mails with different test cases: a few hundred mails for content analysis and nearly 200 mails for bypassing extension blocking. These mails are exported as files or as a mail directory, and also as a packet capture file to check against intrusion detection systems. Then I checked them against several systems: various mail user agents, various antivirus and mail filter products, some intrusion detection systems, a firewall I won't name here because it's not a free one, and some libraries.

Okay, a short introduction into what MIME actually is. In the beginning, before about 1994 or 1995, there was only ASCII, there were only ASCII
mails. These had a line length limit of 1000 bytes, and there was no kind of structure, no attachments or similar inside. In 1996 we got the MIME RFCs, which defined different aspects of how to serialize structured information and non-ASCII information into the original limits, so binary data and structured data like attachments. And of course, like I said, this is a standard which is flexible, which is complex, underspecified, and has lots of room for creative interpretation. A year later there were some additional standards. One is Content-Disposition: how to specify a file name, how to specify whether the attachment is inline or external. And there is another standard for encoding non-ASCII characters into
structured data like file names. For some reason they didn't include this in the original one, so they needed yet another one, which implemented a totally different encoding for some reason.

Okay, if we look at the source code of a mail, we see the different standards applied here. We have the subject, which is an unstructured header field, and there we see RFC 2047, which is about encoding characters in the header. What we have here with the Q is quoted-printable encoding, which means that the non-ASCII characters are encoded as some kind of hex value. So this =C3 means
that it's a byte with hex value C3, and then we can look up in the UTF-8 encoding what the sequence actually means.

Then we have this kind of MIME preamble here, which is hidden in modern mail clients, so in basically all mail clients we have today; it's only there for the old mail clients. This is everything before the first part, because we have a multipart mail here, which is defined in RFC 2046, and we have a boundary defined here to split the mail into different parts, with the final boundary here.

Then for some parts we have a name. We can have a name given in this way, and we can have a name given in
this way. This is actually RFC 2231, which has this kind of encoding for long file names that we can split over multiple parameters, because, like I said, the limit for line lengths in mail is a thousand characters. It also has this kind of hex encoding here, which looks similar to what we have in quoted-printable, but again it's totally different: here we have an equals sign, and here we have a percent sign. So for some reason they needed to make everything different. And then we have quoted-printable encoding here for the content, and base64 encoding here for the content of the attachment.

Okay, I'll show some selected examples of how
this can be used to create edge cases which are theoretically allowed, or at least not explicitly forbidden by the standard, but where the interpretation is ambiguous. For example, in this case you can define two Content-Transfer-Encoding headers. This is similar to attacks in HTTP where we have two Content-Length headers with different values. Because we have two values here, it's not clear how this part, which is base64, gets interpreted. Depending on the mail client we use, it gets interpreted in a different way, and depending on the analysis software or the library we use, some manage to see this and some don't. So the Suricata IDS doesn't see the right attachment in this case.

A slight variation of this: we have one field only, but we have multiple values inside. Again, some mail clients understand this and just take base64 because it's the first one. No mail client actually takes the last one, but Outlook and Apple Mail don't understand this at all and just assume there's no encoding done.

You can play similar games with multipart mail. We have boundaries here, and we can have two boundary definitions, with "bar" here as the first one and this one as the last one. And again, depending on
the mail client and depending on the analysis software, we have different results. Same thing again with one field and multiple values, and here it switches again: what was on this side before, with Apple Mail on that side, has now switched. It's the same with the various software. We see here amavisd-new, which behaves the same as Perl MIME-tools because that is used inside.

And then a different area, the area of encodings. There's base64 encoding. Base64 basically means that we have three bytes of binary data mapped to four bytes of ASCII data. It takes the first six bits, then the next six bits, and so on until we
have 24 bits, three bytes of binary, and maps these to 32 bits which are plain ASCII. If we have less than three bytes, then we have padding. So in this case we have four bytes here, which means we have two bytes of padding, because we have to fill up the last two bytes to get six bytes. And in this case we have two bytes, which means we have one byte of padding. This padding is done with the equals sign here. The standard, the RFC, is not clear in this case whether what we have should be encoded in one piece, so that we have at most a single
padding at the end, or whether you can actually have multiple pieces. It suggests that it may be done this way, but there is no SHOULD or, even better, a MUST, and that's why we have different implementations. We have some, like Apple Mail, which just accept that it's okay to do so. We have Mutt, which just takes the first part. And we have Outlook, which, I don't know, takes the first part and then there's garbage, and I don't actually know where this garbage comes from. [Applause]

And if we have a look at the analysis software or the libraries, we see that most of these plainly fail to see what major mail
clients actually can do.

But we have more encodings. We have quoted-printable, which is in the standard too, but we have other encodings which are not in the standard but which are actually supported by some mail clients. There is this yEnc encoding, which comes from the area of Usenet news. I don't know if somebody still remembers, it's some 15 or 20 years ago. It is a very efficient encoding, more efficient than base64, and that's why it was used on Usenet to transfer binary data. Because Thunderbird can still function as a news reader, it implements this encoding, but nobody else does, and no analysis
software is able to see this.

Another very interesting encoding is uuencode. This predates MIME, and it predates yEnc too. It was used in former times in plain-text mails to include some binary data. So we had some kind of file name here inside this part, then we had this encoding, and then there was an end. This encoding is very similar to base64, only it's mapped slightly differently, but the idea is the same. And this is actually a widely supported encoding: we have major mail clients which can do it, and we actually have major antivirus products which can do it too. But there are lots of variations, so we
can have different transfer encoding names, sometimes we need the begin line and sometimes you can skip it, etc. They all work slightly differently, so there's lots of room for bypassing analysis software.

And then we have another strange feature of mail, which is comments in mail header fields. The standard actually says that one can place comments, this is this part with the parentheses around it, freely inserted in several places. And Outlook takes this to the next level: this stuff can be inserted basically nearly everywhere and it simply gets ignored. So Outlook sees this "bar" as the boundary in this case; all other implementations don't see
this "bar". ClamAV can see it, there's this firewall which can see it, and there is a Python library, but everybody else cannot see this.

Okay, this was about content filtering. Then we have the specification of file names. Two small examples. Like I said, we have this RFC 2231, which defined a new encoding for file names and which defined how these can be split into multiple parameters. So there is an index for the parameter, and I can split it over multiple parts. The order actually does not matter, which is explicitly specified in the standard. And as one can see, Thunderbird, Apple Mail and Mutt all support
this kind. Outlook has no idea what the standard is at all. And the funny thing: if I use Microsoft Exchange, it even transforms this right encoding into a wrong encoding, because Outlook doesn't support it.

But I also have this other standard, RFC 2047, which was defined for encoding characters in unstructured mail fields. It specifically says that this encoding should not be quoted and should not be used inside of Content-Type or Content-Disposition. But the funny thing is, most support this, and a lot of analysis software supports it too. But there are several variations on the topic, so if I use some strange encodings like UTF-16, it gets
weird, and I again have ways to bypass firewalls or analysis systems.

Okay, so how to apply this knowledge in practice. A small thing: I create a mail and I check this mail with VirusTotal. This mail has a zip file inside, base64-encoded, and the zip file contains the EICAR test virus. As we can see, and this is the ground truth, there are 38 products in VirusTotal which can actually parse this mail and detect the virus inside. So, nice. Let's see what happens if we make a small change: we add another transfer encoding here, and we can see the number goes slightly down. If we switch the order, it's not much different. And if
we specify a Content-Transfer-Encoding which doesn't exist, like x66, then it goes down, but there are still a lot of scanners which simply have a heuristic, see that this is base64 encoding, and try to analyze it.

Okay, a small step: we don't use a single chunk for base64, but actually multiple chunks of two bytes, so we get a one-character padding after each chunk. Now the number goes way down, and even if we take back all the stuff from before, it is still very low. And the last step: we just add multiple boundaries here with different interpretations, and now
we have no antivirus product which sees this anymore. If we look at the mail in Thunderbird, we can still just extract it normally and have a look at it. And if we remove this Content-Transfer-Encoding confusion here, it's still only one antivirus product which sees this.

Okay, a slightly different area is DKIM signatures. I don't know who is familiar with DKIM and DMARC. The basic idea is that we want to protect against spoofing of the sender, and the visible sender inside the mail is in the From header. DMARC basically says that the domain of the From header must be aligned with either Sender Policy Framework, SPF, or
DKIM signatures. A DKIM signature is a cryptographic signature over parts of the header and over the body. We can check this, and it's independent of the transport of the mail, so we can redistribute the mail and the DKIM signature does not change at all, which is very nice. But the DKIM standard relies on MIME not being broken, and the DKIM standard itself is insufficient too. There are no requirements that header fields like Content-Type or Subject also need to be protected. There are some recommendations, but these recommendations are insufficient. There is a way in the standard to protect header fields, to make sure that no critical header fields can be added,
but there's no requirement to do so, which means that typical implementations like OpenDKIM, or basically any products we see out there, are broken: they don't properly protect the headers. Additionally, DKIM allows us to sign only parts of the mail. The idea is that some software could add a footer and the signature should stay the same. They say this is a serious security problem, but nevertheless they allow it.

Okay, so what can we do? This is a mail I got about six years ago from Deutsche Post, which is a classic sender, a typical sender for spoofing. We have the DKIM signature
here, and the DKIM signature says that not the full body is signed, but only up to a specific part, like here at the end. The DKIM signature also says that we have some fields signed here, but these are only included once. That means the existing field is included, but I can freely add additional fields, and these are not included in the signature. So what I can do here is add another Date, add another To header, add another Content-Type, and freely put stuff at the end. With this changed Content-Type I have a different boundary, which means all this part up to this boundary is
seen only as a MIME preamble and is not visible in mail clients today; instead, this part after it is shown. If I deliver this mail, I can see that this part is shown, and I can see that DKIM is still okay. The properties of the mail show that the date is changed. The Message-ID, I don't know if I changed it in the previous example, but I can see that DKIM and DMARC both pass.

So, final words. Are there any solutions to this problem? Yeah, maybe. The problems are hard to fix. There are zillions of MIME implementations and scripts in the wild, and they are often broken. There is no monopoly on
implementations which could be used to enforce a minimal quality; we have such a monopoly with browsers. One can try to block such edge cases, but this can cause unbearable collateral effects. I've seen this in practice: there's simply too much junk in the real world, and when in doubt, operation beats security ("it worked before we installed the firewall"). There's a way to sanitize the mails, but this can break stuff too, or it will break DKIM signatures, PGP and S/MIME signatures. I've seen this too. And ultimately one can simply log the problems and then hope that somebody actually cares about the logs. Thank you. [Applause]
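
As an illustration of the "block or log the edge cases" option, here is a minimal sketch in Python. It is not from the talk and not how any particular product works; the function name `find_ambiguities` and the sample message are made up for illustration. It only uses the standard `email` package and flags three of the constructs shown earlier: duplicate Content-Transfer-Encoding or Content-Type headers, transfer encoding values that are not one of the five standard names (which also catches multiple values or comments inside one field), and base64 bodies with padding in the middle.

```python
import re
from email import message_from_bytes

# The five transfer encodings defined by the MIME standard (RFC 2045).
KNOWN_CTE = {"7bit", "8bit", "binary", "base64", "quoted-printable"}

# An '=' followed by more base64 alphabet means padding inside the body.
INNER_PADDING = re.compile(r"=+[A-Za-z0-9+/]")


def find_ambiguities(raw):
    """Return a list of human-readable findings for a raw mail (bytes)."""
    findings = []
    msg = message_from_bytes(raw)
    for part in msg.walk():
        ctes = part.get_all("Content-Transfer-Encoding", [])
        ctypes = part.get_all("Content-Type", [])
        if len(ctes) > 1:
            findings.append("duplicate Content-Transfer-Encoding header")
        if len(ctypes) > 1:
            findings.append("duplicate Content-Type header")
        for value in ctes:
            if str(value).strip().lower() not in KNOWN_CTE:
                findings.append(
                    "unknown Content-Transfer-Encoding value: %r" % str(value)
                )
        is_base64 = any(str(v).strip().lower() == "base64" for v in ctes)
        if is_base64 and not part.is_multipart():
            body = part.get_payload()
            # Ignore line breaks and other whitespace before checking.
            if isinstance(body, str) and INNER_PADDING.search("".join(body.split())):
                findings.append("base64 padding in the middle of the body")
    return findings


# Hypothetical sample: two conflicting encodings and base64 in 2-byte chunks.
SAMPLE = (
    b"From: sender@example.com\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"YWI=Y2Q=\r\n"
)

for finding in find_ambiguities(SAMPLE):
    print(finding)
```

Whether such findings lead to blocking, sanitizing or just logging is exactly the trade-off described above; the sketch only reports and leaves that decision to the caller.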
So, do we have any questions for Stefan and his very optimistic outlook on MIME?
So Stefan, I have one question, and you kind of addressed it in your last slide. If I am an administrator of an email service, what can I do? Is there anything that I should do today to protect against attacks like this?

I think there's no good answer to this, because we've actually built into our product all the protections against the attacks I have shown, and we had to make a lot of these protections optional, because there are a lot of junk mails out there which are not spam mails, which are real mails, but which are actually invalid.

Okay, there's another question.

Hi. Have you taken a look at
what sometimes happens when you forward an email with attachments? In different kinds of email clients or webmail clients they get unfolded even though it's an attachment. Have you been looking into this kind of scenario? Because I used to fix a bug in a webmail client where there's an attachment and a forward with an attachment, and they were unfolded, and this sometimes kind of helps with mitigating, or also with bypassing some firewall rules.

Yeah, forwarding mails is a strange thing. There are two options to forward mails: inline forwarding and forwarding as attachment. Forwarding as attachment is supposed to just put in the original email, but one can actually modify it, and
then there are different problems on top of this. If the original mail was transferred over a path which is 8-bit clean, and then I forward this mail and the next path is not 8-bit clean, which is the traditional path with SMTP, then I might need to rewrite everything. So it could be used to kind of sanitize, but then it might also break the signatures; this might break PGP and S/MIME signatures.

You mentioned that you generated a lot of test cases. Are you aware of any publicly available weaponized frameworks to exploit these issues, so that I can target specific setups and generate a mail that will
bypass these checks?

No, I'm not aware of any other research in this area at this depth, but if you want to start researching, come to me and I'll give you the test cases and the framework that generates them.

Awesome, thanks.

All right, that should be the last question. We've reached 15 o'clock. Please give Stefan a hand.
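
To give an idea of what one generated test case might look like, here is a small sketch in Python. It is not the speaker's framework, which is not shown in the talk; the helper names `chunked_base64`, `decode_per_quad` and `build_test_mail` and all addresses are assumptions for illustration. It builds a mail whose body is base64 encoded in two-byte chunks, so that padding appears after every chunk, and adds a second, conflicting Content-Transfer-Encoding header.

```python
import base64
from email.message import Message


def chunked_base64(data, chunk_size=2):
    """Encode every chunk separately, so each chunk carries its own padding."""
    return "".join(
        base64.b64encode(data[i:i + chunk_size]).decode("ascii")
        for i in range(0, len(data), chunk_size)
    )


def decode_per_quad(text):
    """Lenient decoder: treat every 4-character group as its own base64 unit."""
    return b"".join(
        base64.b64decode(text[i:i + 4]) for i in range(0, len(text), 4)
    )


def build_test_mail(data):
    """Build a raw test mail (bytes) with chunked base64 and two encodings."""
    msg = Message()
    msg["From"] = "sender@example.com"   # placeholder address
    msg["To"] = "rcpt@example.com"       # placeholder address
    msg["Subject"] = "chunked base64 test"
    msg["MIME-Version"] = "1.0"
    msg["Content-Type"] = 'application/octet-stream; name="test.bin"'
    msg["Content-Transfer-Encoding"] = "base64"
    # Assigning the same header again appends a second, conflicting field.
    msg["Content-Transfer-Encoding"] = "quoted-printable"
    msg.set_payload(chunked_base64(data))
    return msg.as_bytes()


raw_mail = build_test_mail(b"abcd")
print(raw_mail.decode("ascii"))
```

A decoder that stops at the first padding recovers only the first chunk of such a body, while a per-group decoder like `decode_per_quad` recovers everything; that difference is what such a test case probes.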