
So, I'm very pleased to be here to present this work. It is the latest part of research performed over two years, and I would like to thank the BSides Lisbon team for inviting me to present it. It deals with how to exfiltrate data in a way that ensures you can never be detected. I'm sorry: my Linux computer was not working with the video, so there is one demo that I will not be able to do, but if you are interested I can show it on my computer right after the talk.

The agenda: I will first explain the state of the art and the known techniques to exfiltrate data or to protect it, because both issues are connected. Then I will present how to make malware very difficult, not to say impossible, to analyze; this is the initial step needed to deploy the actual exfiltration techniques. Then I will explain how to bypass data leak prevention techniques, and I will conclude.

Nowadays, most attacks perform data exfiltration. LockBit 3.0, for example, is the best-known case: quite systematically, they put pressure on the victim by threatening to publish the data they have collected. It can be just a few credentials, but it can also be a complete database. Depending on the environment, exfiltration is more or less difficult. The most difficult case is the air-gapped environment, where systems are not connected. Some techniques are known, but generally the amount of data you can exfiltrate is very limited and you can only achieve a very low data rate, so it is only efficient for exfiltrating, for example, a few credentials. Whenever you are connected, I mean in a network environment, it is easier, whatever the protocol considered. But you generally have to bypass traffic surveillance, which in most industrial cases means Data Leak Prevention (DLP) tools, for example in a Microsoft Azure environment. When you succeed, it is possible to
exfiltrate a large amount of data, even huge databases, at a very high exfiltration rate. Generally, target environments have low or medium awareness of this, and most DLPs are not really able to prevent data exfiltration. But the aim of this talk is to show that even if we had very efficient DLPs, they could be defeated very easily.

From the defender's perspective, you can perform two actions. The first, and most common, is automated or semi-automated analysis, what we call DLP. But, especially in the post-mortem step, you may also wish to perform a manual analysis of the documents that were exfiltrated. So we have to deal with these two issues, and we will show how to solve them.

From the attacker's perspective, once again, there are two different approaches. Either he launches very common attacks, not specifically targeted, assuming that only automated detection will be in place; or he considers a very targeted attack, typically an APT attack, and in this case he has to consider that manual analysis will be possible and defeat it as well. In this talk I will focus on the connected environment, but everything can be applied to air-gapped environments; if you wish to have more information about air-gapped environments, I can give you references.

As in most attacks, you first have to deploy your malware. The malware must then be able to scout its environment, study it in order to adapt to it, and prevent analysis and detection. And of course, even if the malware is caught and analyzed, it must not reveal the actual nature of the attack: it must fool the analyst. That will be part of the problem to solve. So we have a twofold issue. In fact, the attacker has
to consider the following challenges and issues. First, semantic detection: most of the time it relies on keywords or even behavior profiles. Second, the data can be encrypted, but encryption is very easy to detect; I will explain why. Third, the attacker can use an encryption key to encrypt the data, but then he has to manage a secret key, which is not easy, especially when many documents have to be exfiltrated. Fourth, you may have to face the fact that, for example, data can only leave through an IPsec channel that the attacker does not control: how do you exfiltrate data despite such a protection? Fifth, if you want to exfiltrate a lot of data, techniques like steganography are not possible, because the embedding rate, and hence the exfiltration rate, is very low: less than three percent. Sixth, you also have to consider user behavior, because the amount of data and the frequency of exfiltration can also be detection criteria. On the defender side, of course, he has to try to solve all these issues, but from the defender's perspective.

I will not have time to present the fourth issue, but you can refer to my talk at CanSecWest. We have since updated the work for new IPsec hardware, and the attack still works fully: we can effectively exfiltrate data despite the existence of the IPsec channel, since we only have to monitor the IPsec channel and perform a statistical analysis. So it is possible.

You may ask yourself why we conduct such a study. It is not to build attacks, of course, but the attacker's perspective always enables us to be aware of the risk and to try to find solutions, and there are possible applications that are not related to attacks. For the sake of clarity, I will present each building block separately, but of course you can put them all together to get something very sophisticated. For reasons that you will easily understand, all the code
and proofs of concept are not public, but technical papers explaining them will be, and the slides will be made public.

So what are the known techniques as far as information protection, and of course the related issue of data exfiltration, is concerned? I have taken the NATO terminology: there are two ways of protecting information. The first is what we call communication security, COMSEC: you protect the information but not the fact that you are exchanging sensitive information. Most of the time it is achieved with cryptography, but in some very sensitive industrial cases you also have to add physical security or even protection against electromagnetic emissions. However, if you exchange encrypted data, it is of course very easy to detect. The other way is to add transmission security, TRANSEC: you protect not only the information but also the communication channel. In fact, you hide the fact that you are exchanging sensitive data. If you look at the transmission, it looks innocent: you cannot discriminate between two communications, one really innocent and the other transmitting sensitive data. The best-known technique is steganography. In the techniques I am going to present, we of course intend to combine both approaches, to protect the information but also the fact that we are exfiltrating data.

If we compare cryptography and steganography from the technical point of view, that is, COMSEC and TRANSEC: in cryptography, you just transform your plaintext into a random-looking text, and to go back to the plaintext you must know a secret key, so you have issues related to key management and so on. Of course, accessing the plaintext without the key must be computationally infeasible in a reasonable time. On the right side you have steganography: the data are of course encrypted first, which is the COMSEC
part, but the encrypted data are then embedded in a cover image. After embedding, it is supposed to be impossible to tell the difference between the image before and after insertion, and the embedding process itself also depends on a secret key. Then you send the image, and the receiver reverses the whole process. In the history of terrorism, for example, Al-Qaeda used steganography to exchange sensitive data; the use was limited, but it has been proven.

So when we compare these two techniques, what is their relative security? In fact, in both cases detection is rather easy. On the left, to detect cryptography: here you have the analysis of a Windows binary file which is partly encrypted, and we use what we call entropy. Entropy is the amount of uncertainty, or randomness if you prefer, in the data. For all the non-encrypted code the entropy is relatively low, and whenever you see a peak, it means that you have cryptography. The entropy, denoted H, is around 4 bits per character for normal data, whatever the language; packed or compressed data, for example packed binaries, reach about 6; and encrypted data reach the maximum of 8 bits per character. So it is very easy to detect that you are transferring or exfiltrating encrypted data.

As far as steganography is concerned, this drawing shows two points. First, whenever you go beyond three percent of the usable bits, for example the least significant bits in a JPEG image, there are efficient techniques able to discriminate between a normal image and a steganographic image. Second, if you consider the number n of bits usable to embed data, you can only use on the order of the square root of n of them. It means that exfiltrating a large amount of
data with steganography is quite impossible without being detected. So even for a small set of credentials, the use of steganography is not recommended, because you will be detected.

Is there another solution? Yes. It comes from what we call malicious cryptography, or malicious cryptology and mathematics. It is an emerging field that I have been working on for many years. It can be defined as the intersection of attack techniques with cryptology and mathematics, so that they enrich each other. There are many applications, and not only attacks: you can also use it for legitimate protection. Examples include the development of super-malware: a lot of mathematical theories enable us to build actual malware that is very difficult, not to say impossible, to detect. You can also use malware to perform cryptanalysis, for example to steal secret keys; this is what we call applied cryptanalysis. Or to do reconnaissance: thanks to mathematical properties, we were able to design and build code that detects which kind of processor it is executed on and decides whether or not to activate the malicious part. In fact, you compute mathematical functions, and depending on the result you know whether you are on a given class of Intel processor, or ARM, and so on; we are able to discriminate very precisely between processor families. Ultimately, it is possible to use mathematics to build block ciphers with trapdoors. In 2019 I was at BSides to present algorithms with trapdoors, and if you want you can look at our talk at Black Hat Europe: we designed and published an algorithm which is very similar to AES, the international standard, but contains a trapdoor, and in less than 30 seconds you are able to break it and find the key.
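The entropy test mentioned earlier, which flags encrypted data because its entropy approaches 8 bits per character, can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the tool used for the slide: the function names and the 1024-byte window with a 512-byte step are hypothetical choices. It computes the byte-level Shannon entropy H = -sum of p(b) * log2 p(b) over the observed byte values b, plus a sliding-window profile in which an encrypted region shows up as a jump toward 8.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Byte-level Shannon entropy in bits per byte (range 0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # frequency of each observed byte value 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_profile(data: bytes, window: int = 1024, step: int = 512) -> list[float]:
    """Entropy of successive overlapping windows, like a file entropy plot.

    Window and step sizes are illustrative assumptions, not values from the talk.
    If data is shorter than one window, the whole buffer is measured once.
    """
    last_start = max(len(data) - window, 0)
    return [shannon_entropy(data[i:i + window]) for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    text = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. " * 20
    random_bytes = os.urandom(4096)  # stands in for encrypted content
    print(f"text:   {shannon_entropy(text):.2f} bits/byte")
    print(f"random: {shannon_entropy(random_bytes):.2f} bits/byte")
    # Plain text followed by random bytes: the profile rises from ~4 toward 8.
    profile = entropy_profile(text + random_bytes)
    print(f"first window {profile[0]:.2f}, last window {profile[-1]:.2f}")
```

On short windows, the measured entropy of random data stays somewhat below 8, because a small sample cannot cover all 256 byte values evenly; this is one reason to use windows much larger than 256 bytes.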
So malicious cryptography and mathematics must be considered in cybersecurity, because it puts the power of mathematics and cryptology at the service, and to the benefit, of attackers.

The first part is to use malware. But malware can be detected and analyzed, and the reverse engineer can understand more or less quickly what it actually does. So the main tool is what we call non-trivial deniable cryptography. Deniable cryptography has been known for a long time, but until now only a trivial case, the one-time pad, was known. It consists of taking two random sequences: combined with the same ciphertext, each one produces a different plaintext. Of course, if you have a very long ciphertext this is not practical, because you first have to embed the random sequence in the malware code, so the key would be leaked and compromised during the reverse-engineering step.

Last year we found a practical, non-trivial solution. We still consider a ciphertext C, a unique algorithm E, and two different plaintexts P1 and P2, and we implemented a framework which is able to build the algorithm E on the fly; it takes only a few seconds. This time we do not need very long keys, as long as the ciphertext, but very short keys, from 128 to 256 bits, which is currently the standard key size in secret-key cryptography. The algorithm E must be deterministic, so it may be a stream cipher or a block cipher; we are currently working on a block cipher version. It must be assumed to be public, which means that even if all the cryptanalysts in the world tried to analyze it, it should not be possible to break it. And of course the secret key is far smaller than the ciphertext, which was a very important condition for operational security. So we completely discard the one-time pad, and we have the following equation: from
the same ciphertext C, depending on the key I use, I obtain two different plaintexts, that is, C = E(P1, K1) = E(P2, K2): decrypting C with K1 yields P1, and decrypting it with K2 yields P2. This may seem theoretical, so let us see how to apply it in our context. I have chosen two plaintexts, but it can be generalized to three, four or five different plaintexts.

Here you have the security analysis. We assume that from the ciphertext C we cannot guess P1 or P2, and that even if we have P1 we cannot guess P2, or conversely. This is a security assumption and a condition for security.

Of course, I am going to present its use in the context of malware, but there are a lot of legitimate applications that we have developed with a research and development company: code protection, not only for malware but also for legitimate programs, anti-forensic techniques, and we are currently working on multi-channel communication, where you have only one encrypted channel but, depending on the key, you can access one of two communication channels. I will not be able to do the demo now, but right after the talk I can show you on my other computer.

In our context, we have an encrypted malware, and depending on the key used, it will decrypt either into innocent-looking code, legitimate or lame malware that will fool the analyst, or, if the conditions are met and a different key is used, into the actual malware, which performs the real attack. So from the same code, with two different short keys, you obtain two different programs. The aim, and the operational condition, is to make sure that the analyst, the reverse engineer, will only ever have access to the decoy: he will find a key, he will find a code, and he will be happy, "OK, I have understood the logic of the malware", while in fact there is another logic. Of course, the key cannot be inside the code, otherwise it would be compromised during the reverse-engineering part. So in fact we
suppose that the mode of operation includes a command and control (C2), and the malware communicates with it. We have designed a very secure and complex communication protocol that uses fingerprinting, time indices, type modification, random connections, environmental conditions and so on, and the malware itself never knows when the right conditions are met. The malware is able to detect whether or not it is under analysis; we have developed a lot of techniques for this with a PhD student, and you can find them in the fourth reference at the end of the slides. The non-encrypted part of the malware analyzes its environment, determines whether it is under analysis, and sends this information to the C2; depending on the result, it will of course decrypt into one code or the other. The protocol is designed in such a way that the probability that the reverse engineer only accesses the wrong solution tends to one. This means that we are now able to deploy malware that would be very difficult, not to say impossible, to analyze.

Now, how do we actually exfiltrate data? Of course, for the malware not to be detected, the aim is to hide the exfiltration technique. The first technique we use is very lame, but it is still very efficient against most DLPs: I have tested it against the most famous DLPs that you can buy, and they were all defeated. We use simple metadata. When you have a digital document, you have the data, what you see, and inside the file you have a lot of internal information called metadata: for example, previous versions, dates, ownership and so on. Almost all present-day digital formats have a lot of metadata, and we have observed a large permissiveness with respect to metadata, so it is very easy to exfiltrate data in this way. Regarding the metadata of network protocols, you can see the work of the
Warsaw University of Technology in Poland: they did very nice work exploiting network metadata, but it is the same principle.

I have taken just one example: a PDF file, a scientific publication. I simply opened it with a text editor and inserted text within the different PDF parts, because PDF is a language; it is like a script. Of course, here I have made the text visible for the demonstration, but that is a lame approach: in fact, you will have to hide it with a specific statistical profile, as we will see right after. You close and save the document, you can open it again, and it still works. You can test it against most DLPs: there is no detection at all, even by very famous DLPs from very famous software vendors. The result of this experiment is very surprising, but once again, for me it is a lame, very lame attack.

Of course, you can say: OK, there is document integrity in place, for example no document can be modified without triggering an alert. Then you can build your own document; it is not a problem. In fact, the aim is to produce documents that mimic the normal documents in place, and for this we use entropy and statistical profiles: for example, taking a text and making it look like a PDF, at least from the perspective of a statistical detector. To the human eye you will see that it is not a PDF, but once again, most DLPs are automated processes. Of course, you do not want to mimic the statistical profile in such a way that an analyst could reverse the process, so generally we use a secret-key-dependent transformation. Secret key means that you have key management to deal with, and of course you cannot hide the secret key inside the code; otherwise, once again, the key would be compromised during the reverse-engineering part. You cannot, as ransomware usually does, use a crypto API to
generate random keys