
Everyone, good morning. My name is Shreyans Devendra Doshi. I hope everyone is having a great time here at the conference. Yeah? Awesome.

So my talk is labeled "Perfidious: Make PE Backdooring Great Again", and even though we will be discussing a technique that could be used to backdoor PE files, I would like you to keep an eye out for the bigger picture, which is the framework and the library itself.

So, who am I? My name is Shreyans Devendra Doshi. I am a cybersecurity graduate student at UMD, and I'm a teaching assistant for the reverse engineering course at UMD. I have previously worked as a malware research intern at [company name unclear], where I started
developing this project, and I like malware analysis and reverse engineering. So that's enough about me; I'd like to know something about you. How many of you are interested in reverse engineering, malware analysis, stuff like that? Awesome, great.

So before we get started on what Perfidious is and what it is trying to do, it is important to understand the context in which this entire thing was developed. It was summer 2019, the May to August period. I started working at [company name unclear] as a malware research intern, and my manager for the project basically said: OK, we want to develop a project that could be used to develop malicious files out of benign PE files, and you can maybe
develop parsers for source code and plug them into a machine learning system that could automatically generate malware samples. That was his initial idea.

Now, I had two questions regarding this. First: is this legal for me to work on? Because what I would essentially be doing is developing malicious software that could be used to bypass the endpoint detection systems that are currently in place. I do not want to get arrested by the FBI. The second question was: how would I go about getting the source code for all these malware samples? Because if you have worked in the field, you know that malware source code is something that is really hard to come
by, right? And even if you had parsers in place that could parse all the source code to develop new malware, you would need enough samples to train a machine learning model on.

Now, as fate would have it, the manager left the company for another company, and I was left with the project to do whatever I wanted with it. So I had two choices: I could idle away my time at [company name unclear], or I could develop a project that could be a logical conclusion to what he started. That is where Perfidious comes into the picture.

Now, before you can get into
Perfidious, it is important that you understand what the PE file format looks like, because that is the input to Perfidious. So this is the basic structure of a PE file: you have the DOS header, followed by the DOS stub, followed by a Rich header, which may or may not be present, followed by the PE signature, followed by the NT header. These together compose the entire PE header that we know about. This is followed by the section headers and the actual sections.

Before we get into each of these components of a PE file, it is important to understand what these terms mean. Raw address: the raw address space, as we know, starts at 0. This is basically
the raw file size of the actual PE file that we have; it starts at 0. Then there's the virtual address. The virtual address is basically the address at which each of the components in the raw file is loaded in memory when the file is run. Virtual addresses usually start at that high hex number. Then we have something called RVAs, relative virtual addresses. RVAs are important to understand because they are what the PE file uses internally to map addresses. So whenever we have an RVA present inside the PE file, we need to convert it into a raw address to find that data structure in the file itself. So this is the formula that we
utilize.

Now, the first data structure present inside a PE file is the DOS header. The DOS header contains a ton of different internal components; the main ones are highlighted in yellow. e_magic is the MZ signature that we see at the beginning of any PE file. This basically identifies that this file is a PE file and that it can run with the help of the Windows loader. Checksum: this is basically the checksum of the PE file. One of the first checks you need to make is that the checksum of the PE file is the same as the checksum mentioned in this particular field; that basically tells you that this file has not been tampered
with, right? e_lfanew basically tells us the offset to the extended header, which is where the actual PE file begins.

DOS stub: how many of you have participated in any of the Flare-On challenges by FireEye? They're basically the reverse engineering Olympics, you could say: reverse engineering challenges that FireEye puts out every year. In 2016 they put out a challenge which was basically a DOS file. Once you completed the reverse engineering challenge, when you executed the file, it basically said that this file cannot be run in DOS mode. So that was basically a DOS program which was
present inside this DOS stub. So the DOS stub can contain an entire DOS program hidden inside it, or it can contain a program which runs and says, OK, this program cannot be run in DOS mode, when you try to execute it in DOS mode.

Rich header: now, the Rich header was a previously unknown data structure that was discovered around the 2015-2016 timeframe. This was previously used by Microsoft in order to find various malware groups, because this data structure contains information about the compilers and the linkers that were used to create the PE file. So this could be used as a mechanism to develop signatures for various malware groups that were writing malware samples. So this is another important data
structure that one needs to keep in mind.

PE signature: this is "PE" followed by two null bytes. This is the place where the actual PE file should ideally begin; the data structures present before it are basically there for backward compatibility reasons, and we don't really need them. This is followed by the NT header, which is composed of the file header and the optional header, which links to the data directories.

Now, file header. How many of you know what UWP means? Yes, platform, yeah. So why do we have UWP? Why does Microsoft need UWP? What is the agenda for having UWP in place?
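The header fields and the RVA conversion described so far can be sketched in Python. This is only an illustration under stated assumptions, not Perfidious code: the names `find_pe_signature` and `rva_to_raw` and the synthetic bytes are made up for the example, and the talk's slide formula is not in the transcript, so the conversion shown is the standard one (subtract the section's virtual address, add its pointer to raw data).

```python
import struct

def find_pe_signature(data: bytes) -> int:
    """Follow e_lfanew from the DOS header and return the PE signature offset."""
    if data[:2] != b"MZ":                              # e_magic
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew lives at 0x3C
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    return e_lfanew

def rva_to_raw(rva: int, sections) -> int:
    """Convert an RVA to a raw file offset.

    sections: iterable of (virtual_address, virtual_size, pointer_to_raw_data).
    """
    for virtual_address, virtual_size, pointer_to_raw_data in sections:
        if virtual_address <= rva < virtual_address + virtual_size:
            return rva - virtual_address + pointer_to_raw_data
    raise ValueError("RVA does not fall inside any section")

# Synthetic header (hypothetical values): MZ at 0, e_lfanew pointing to 0x80.
header = bytearray(0x84)
header[0:2] = b"MZ"
struct.pack_into("<I", header, 0x3C, 0x80)
header[0x80:0x84] = b"PE\x00\x00"

pe_offset = find_pe_signature(bytes(header))
# One made-up section: loaded at RVA 0x1000, stored at file offset 0x400.
raw_offset = rva_to_raw(0x1234, [(0x1000, 0x2000, 0x400)])
print(hex(pe_offset), hex(raw_offset))
```

With a real file, the section tuples would come from the section headers rather than being hard-coded.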
Exactly, exactly. So what Microsoft is ideally trying to do is have a single file format for the various applications that can run on every device that they have. They are trying to have some sort of uniformity throughout the various devices that they support on their operating system. In order to do that, the files need to have enough fields and enough information in place to help the programs run on the various Windows loaders that exist on those platforms. So the file header basically contains the information about all those things. It contains the machine code, which basically says: OK, this is the architecture on which this particular file is supposed to run. It
has the number of sections that it contains, and the TimeDateStamp. Does anyone know what that time means? January 1st, 1970: why that specific time? That timestamp is basically called the epoch time; it is considered the beginning of time. So that particular field takes the time at which the program was actually compiled, minus that epoch time, and the delta that you get is basically stored in the TimeDateStamp field. So you can use that field in order to find when this particular program was
compiled, right? All the other fields after that contain information that is important as well. Another important field is Characteristics. Characteristics basically gives you information about what kind of file this actually is: whether it's a DLL or an EXE, what kind of permissions this file has, and stuff like that.

Next we have the optional header. Even though it says that it's an optional header, it's not really optional. It contains information about the base of code. Base of code basically tells you at what point inside the PE file the .text section begins; the .text section is the place where the actual code for that particular program is stored. The image base basically tells you which
virtual address is used in order to load this particular program into memory. So even though it says that it's an optional header, it is not really optional.

Once this optional header is completed, it's followed by the data directories table. The data directories table basically contains information about the size of each data directory and the relative virtual address at which that particular data directory is loaded. These are some of the data directories that are commonly found inside PE files. Not all of these data directories will be contained inside a PE file, but the common ones, which I will explain, are the ones that are usually found inside all these PE files. Export table:
can any of you tell me what an export table means? Why would you have an export table inside a PE file? What could it contain? Anyone? So, as the name suggests, the export table basically contains information about the functions that are exported by this particular PE file. Now, PE files are not necessarily executables; they can also be DLLs. DLLs are basically dynamically linked libraries. DLLs contain information about the functions that they export, so this table can be used as a reference to find all those functions and the addresses inside the DLLs where these functions are stored.

Then we have the import table. Import tables are
usually present inside the PE files that are EXEs. These basically give you information about all the functions that are imported by that EXE from the various DLLs, and the addresses at which these functions can be found. There are multiple sub data structures found inside the import tables as well; we will not go into that.

Resource table: why would you need a resource table inside an EXE, inside any PE file? Now, when we usually think about programs, they need not always be CLI programs; they can be GUI programs as well, like games. You'll have a ton of different resources, like images, song files, icons,
stuff like that, right? You need to have a resource table that can be used to index those resources, so that when the program runs, it can find those resources at those memory addresses. The resource table contains all that information.

Exception table: the exception table contains information about the various exception handlers that are present inside the PE file. Whenever an exception is triggered, the program goes through this exception table, finds out what function it needs to execute for that particular exception, and executes it.

Certificate table: the certificate table contains information about the certificate that is used to sign this PE file. So say you had Microsoft certificates lying
around, right? You could sign your own executables with those Microsoft certificates, and your certificate table would contain the Microsoft certificates. Then when you run those on Windows, you'd never encounter the yellow box that you get when you try to run files that are not signed by Microsoft, the one that says this program has not been signed by Microsoft. That is basically how you can bypass the certificate table check.

Debug table: the debug table is usually stripped from most professionally released PE files. Basically, it contains all the debug information, like all the debug flags and all the breakpoints that were used when you were debugging the PE file.

TLS table: can any of you tell me
what the TLS table means? What could it contain? What is TLS in this context? It's not networks. So what does TLS mean?
Anyone? So TLS basically stands for thread-local storage. Most operating systems that we have today are multi-threaded operating systems, and programs use these multi-threaded capabilities. The TLS table basically stores information about the various variables inside the various threads of your program. This differentiation is important because otherwise we could not have that capability where the same variable is used by multiple threads. All that information is stored inside the TLS table.

The import address table is basically the same as the import table until it is loaded in memory, at which point the entries are replaced by the actual addresses in memory.

Now, this is followed by the section header. The section header contains information about the section
itself, that is, the data present after it. It contains the name, the virtual address, the size of the raw data, the pointer to that raw data, and the characteristics. Now, can anyone tell me why characteristics are important for the various sections inside a PE file? For those of you who have worked with PE files: why do you need characteristics for each of these sections? It's just a stream of bytes, right? Yes, exactly. Each of the sections present inside the PE file has its own permissions. Not all the sections inside PE files have execute permissions; only the .text section can be executable, but the .text section cannot
be writable, right? Because you don't want your .text section to be replaced by another .text section. That is why you need characteristics, which highlight what kind of permissions apply to that particular section.

Now, the current code injection techniques that we have. One of the most common ones is custom section addition. If any of you have gone through [course name unclear], this is one of the techniques that is discussed in that course. You basically add a section at the end of the PE file, you create an entry inside the section header for that new section, and you give it read, write, execute permissions, right, or
whatever permissions you want to give, and then you basically change the entry point of the PE file to point to that new section. What this does is, instead of executing the .text section when you try to run this program, it will execute the section that you added at the end.

Now, the disadvantage of this approach is basically that it is very easy to detect for most endpoint detection systems, because in most PE files the .text section is the only section that has execute permissions. So the endpoint detection system can basically just check for sections that have execute permissions, and it can say: OK, this section is something weird, I basically want to flag this PE file, right?
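The detection idea just described, checking which sections carry execute permission, can be sketched as follows. This is a simplified heuristic written for illustration, not how any particular endpoint product works: the helper names and the synthetic sections are assumptions, while the 40-byte section header layout and the `IMAGE_SCN_MEM_*` flag values come from the PE format. Real binaries can legitimately contain other executable sections, so a hit is only a hint.

```python
import struct

# IMAGE_SECTION_HEADER: Name[8], VirtualSize, VirtualAddress, SizeOfRawData,
# PointerToRawData, PointerToRelocations, PointerToLinenumbers,
# NumberOfRelocations, NumberOfLinenumbers, Characteristics (40 bytes total).
SECTION_FMT = "<8sIIIIIIHHI"

IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

def parse_section(raw: bytes) -> dict:
    """Decode one 40-byte section header into a small dictionary."""
    fields = struct.unpack(SECTION_FMT, raw)
    return {
        "name": fields[0].rstrip(b"\x00").decode("ascii", "replace"),
        "virtual_address": fields[2],
        "pointer_to_raw_data": fields[4],
        "characteristics": fields[9],
    }

def suspicious_sections(sections) -> list:
    """Flag sections that are executable and either writable or not .text."""
    flagged = []
    for section in sections:
        executable = bool(section["characteristics"] & IMAGE_SCN_MEM_EXECUTE)
        writable = bool(section["characteristics"] & IMAGE_SCN_MEM_WRITE)
        if executable and (writable or section["name"] != ".text"):
            flagged.append(section["name"])
    return flagged

def make_section(name: bytes, characteristics: int) -> bytes:
    """Build a synthetic section header (hypothetical sizes and addresses)."""
    return struct.pack(SECTION_FMT, name, 0x1000, 0x1000, 0x200, 0x400,
                       0, 0, 0, 0, characteristics)

headers = [
    make_section(b".text", IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ),
    make_section(b".data", IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE),
    make_section(b".evil", IMAGE_SCN_MEM_READ | IMAGE_SCN_MEM_WRITE
                 | IMAGE_SCN_MEM_EXECUTE),
]
flagged = suspicious_sections(parse_section(h) for h in headers)
print(flagged)
```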
So the gain versus the time required to correctly implement this thing is way too low.

So what is the other approach you can take? PE code caving. Can any of you tell me what a code cave is? What do you understand by that term? So PE files are basically streams of bytes, and not all those bytes are filled with information that is important for that file to run. There will be streams of nulls present at various locations inside the PE file, and those nulls might not be utilized. Those are called code caves, and they can be
used to hold your malicious code. So how PE code caving works is: you find the code caves that exist inside the PE file, and you try to find code caves that exist inside sections with execute permissions, which is basically the .text section. If you find such a code cave, you replace the nulls with the malicious code, and then you change the address of the entry point to point to the new shellcode that you've injected. That is how PE code caving works.

Can any of you tell me what the problem with that approach is? What could be the problem that you could face when you try to
inject malicious code inside the code caves?
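To make the notion of a code cave concrete, here is a small analysis sketch that only locates runs of null bytes in a buffer and reports their size. The function name `find_code_caves`, the `min_size` threshold, and the sample bytes are assumptions made for the example; the transcript does not show how Perfidious or any other tool searches for caves.

```python
def find_code_caves(data: bytes, min_size: int = 16) -> list:
    """Return (offset, length) pairs for runs of null bytes of at least min_size."""
    caves = []
    start = None
    for offset, byte in enumerate(data):
        if byte == 0:
            if start is None:
                start = offset           # a new run of nulls begins here
        else:
            if start is not None and offset - start >= min_size:
                caves.append((start, offset - start))
            start = None
    # A run of nulls may extend to the very end of the buffer.
    if start is not None and len(data) - start >= min_size:
        caves.append((start, len(data) - start))
    return caves

# Synthetic section contents: 32 nulls, a 4-byte gap that is too small, 20 nulls.
section = (b"\x90" * 8 + b"\x00" * 32 + b"\xc3"
           + b"\x00" * 4 + b"\xcc" * 3 + b"\x00" * 20)
caves = find_code_caves(section, min_size=16)

required_size = 24   # hypothetical amount of space being looked for
large_enough = [cave for cave in caves if cave[1] >= required_size]
print(caves, large_enough)
```

Even in this toy buffer, only one of the two caves is large enough for a 24-byte requirement, which is the kind of size constraint that matters for this technique.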
Not really. What are some of the limitations of trying to inject code using this technique? No, not a real function. So when you talk about code caves, there is a limitation on the size of the code caves that you have. First you need to find code caves inside the PE file. Say you were able to find those code caves; then you want to find a code cave that is large enough to store your shellcode. Then you need that code cave to be present inside the executable section as well. So this technique is not really something that you can execute; this is not
something that you can do for each and every PE file that you have. So these are some of the disadvantages of this approach: you need to be able to find code caves inside the PE file, you need to find a code cave that is large enough, and you need to find a code cave that is in the executable section.

So why not just edit the .text section instead? What problems would you face? The .text section has the execute permissions that we need; it has all the necessary things that we require. Why not just do that? Say you were able to edit the checksum; that is one of the disadvantages, but say
you could edit the checksum. What would be the problems that you could face when you change the .text section? Yes? Yeah, but it's just a stream of bytes, right? If you were able to find the .text section inside the PE file, you could basically just pull it out. Um, exactly. That is one of the reasons why this has been really difficult to do until now. The .text section is usually the first section that exists after the section headers, so when you change it, you basically move everything that comes after it. So you need to be able to map everything inside the PE file in
such a way that if you make one change inside the PE file, that change is reflected in everything that follows. That is where Perfidious comes into the picture.

What Perfidious is trying to do is fingerprint the PE file and convert it into a Python-based class. Each and every data structure that is present inside the PE file is mapped into a class structure, and whenever you make a single change inside any of the structures present inside that class, the appropriate changes are made to the sections that depend on it as well. So what you do is use a function to directly import malicious code. Perfidious then extracts the .text
section of the PE file and combines it with the malicious code in such a way that the pieces are linked via jumps in the control flow graph. So this is even more difficult to detect, because now you don't have a single blob of malicious code present inside your PE file. You have connected those individual malicious chunks by jumps, so you don't have a single blob of code that can be detected by signature detection mechanisms. That is what Perfidious is trying to do.

Now, the advantages of this approach. It is relatively difficult to detect statically if it is done really well. The malicious code itself is split into
smaller pieces, so it is difficult to detect. All the other parts of the PE file are left relatively unchanged: you are just making changes to the .text section and performing the individually dependent changes inside the PE file, but the entire PE file is not being changed. So these are some of the advantages of this approach.

So how do you go about preventing this kind of injection? What would be the easiest way to prevent this kind of injection of PE files on your system? How do you detect such an injected PE file?
So one of the easiest approaches would be to only allow whitelisted software on your network: only software that has been checked and verified as coming from a particular author. That is the easiest approach to basically stop malware on your network, but it is not really feasible for most enterprise networks. So the next approach would be to have dynamic analysis on all PE files that come through your network. That is an additional approach that you can use for detection, because in dynamic analysis you are basically observing the control flow of the program itself, so if it does something malicious, it can be flagged. One of the approaches
that you could use, a third approach, to detect this kind of injection is something called a graph hash. This is something that I came across while I was in Singapore at Hack In The Box. The researchers basically suggested developing a hash for the control flow of the program. That way, if the control flow of the program changes, the corresponding hash changes as well. So you can use that hash to detect how different a particular program is from another program. So those are some of the techniques that you can use to detect this kind of injection.

Now, even
though this program is not completely ready, I'll give you a demo of how it works. So this is how it works: you basically have the PE class structure, you pass in one of the PE files that you have, and then when you print that particular class structure, it shows you all the things that it found inside it. It found the DOS header, it found the DOS stub. It did not find the Rich header, because it does not exist for this PE file. So it creates a PE object at that particular address. Now, if I try to print a particular data structure, this is what it has for the individual data structures that are present inside the PE file.
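The demo output is not captured in the transcript, but the idea of keeping a separate checksum for each data structure can be sketched like this. Everything here is an assumption for illustration: the function names, the use of SHA-256 as the per-structure digest, and the hard-coded byte ranges are not taken from Perfidious, whose API is not shown.

```python
import hashlib

def structure_hashes(data: bytes, layout: dict) -> dict:
    """Hash each named byte range separately. layout maps name -> (start, end)."""
    return {
        name: hashlib.sha256(data[start:end]).hexdigest()
        for name, (start, end) in layout.items()
    }

def changed_structures(before: dict, after: dict) -> list:
    """Return the names of structures whose hash differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name))

# Hypothetical layout of a tiny synthetic header; real offsets depend on the file.
layout = {
    "dos_header": (0, 64),
    "dos_stub": (64, 128),
    "pe_signature": (128, 132),
}

original = bytearray(132)
original[0:2] = b"MZ"
message = b"This program cannot be run in DOS mode."
original[64:64 + len(message)] = message
original[128:132] = b"PE\x00\x00"

modified = bytearray(original)
modified[64:68] = b"THAT"    # tamper with the DOS stub only, same length

changed = changed_structures(
    structure_hashes(bytes(original), layout),
    structure_hashes(bytes(modified), layout),
)
print(changed)
```

Because each structure is hashed on its own, the comparison points at the modified region instead of only reporting that the file as a whole changed.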
Now, can any of you tell me why that could be a useful thing to do? Why would you want individual checksums for the various data structures inside the PE file? Yeah. So basically you could use them to detect, at a specific granular level, which part of the PE file has been changed. You can use this to detect injections in various parts of the PE file. So this is one of the things that it can do.

Apart from that, this is sort of the intermediate end goal of this project. What I'm doing here is printing the original DOS stub message of this particular PE file and
changing that DOS stub message using just the equals sign, replacing it with [unclear], and then printing it again. Now, if I run this, this is what happens. This is the original message that is present inside this PE file, like any other PE file: "This program cannot be run in DOS mode." But now this message has been changed to [unclear] 2019. So this is my intermediate goal for January, February 29th, 2020: that I should be able to do this for each and every data structure that is present inside the PE file. Say I wanted to change the address of a particular function inside the import
table: I should be able to do that using just the equals sign. So what this does is give me access to every data structure that is present inside the PE file.

The next thing would be to develop a machine learning algorithm that could automatically build the class-based structure for the malware samples that are passed into this project, and then find out what changes I could make in order to make a particular PE file undetectable by current endpoint detection systems. Once I have fingerprinted the entire PE file, that would be the next logical step. So that's it from me. Thank you. Any questions? Yes?
Ah, so currently this is not open source, because I am still working on it. Around January I will be open sourcing this project, because that is when I will have completed the fingerprinting part of it. After that, you could help me develop the machine learning part of things, which is the next phase of this project, or you can basically just use it; it's just a Python library. You could develop your own detection systems using Python, plug it in at the end of your network, and say: OK, I want to detect these kinds of injections into a particular structure inside the PE file. You can use it to develop your
own detections like that. That is what I have in mind for now.
Any other questions? Awesome, thank you for your time. I have bad door and