
BSides DC would like to thank all of our sponsors, and a special thank you to all of our speakers, volunteers, and organizers.

So I hope everyone is having a great time at BSides DC. Yeah? Awesome, awesome, awesome. Well, thank you all for coming to my talk. My talk is called "Perfidious: Make PE Backdooring Great Again". I'll be proposing a technique that could be used to backdoor PE files that is better than the rest of the techniques, but I want to make sure that you keep an eye out for the bigger picture, which is the library that I'm developing. And this is the URL that you
can use to download the slides. I know it's a TinyURL, but you can use preview.tinyurl if you are worried that I might be doing something malicious, right? It's a good resource to have as a reference for the PE file structure. So yeah, let's get started.

Something about me: my name is [name unclear]. I'm a cybersecurity graduate student at UMD. I'm also a graduate teaching assistant for the reverse engineering course there. I have previously worked as a malware research intern at Cyber D, where I started creating this project. I like reverse engineering and malware analysis, and this is my contact information. So that's enough about me.
Something about yourselves: how many of you are interested in malware analysis, reverse engineering, that sort of thing? Awesome, great.

So before we get into what this project is and what it aims to do, it is important to understand how this project got started, right? During the summer of 2019, that's May to August, I started working at Cyber D as a summer research intern, and the original idea that my mentor proposed to me was to create a framework that could be used to generate new malware samples out of the original malware samples. His idea of how he would go about doing this was to have the source code of the malware samples and
train a machine learning model to learn from that source code to generate new malware source code. Now, I had three questions for him. The first one: is this even legal to do? Because I am an international student, I don't want to get arrested by the FBI or something, right? The second question was how I would go about finding all this malware sample source code. He used to work at the NSA, so I figured he might have access to something that I don't, but apparently he thought theZoo repository was a very good repository where I would find enough samples to train a machine learning model. Turns out that's not true. And after that he basically joined
another company two weeks into the project, and he left me to do whatever I wanted with it, right? So basically I had an idea for a project, nothing to go on, and no idea how I would even complete this internship. One of the main fears I had was that maybe I would get fired along with him, but that didn't happen. So apparently I had to do something: I had to create a project that could be a possible conclusion to the work that he initially began. That is where Perfidious came into the picture.

So before we get into what Perfidious can do and what it aims to
do, it is important to understand what the PE file format looks like, right? This is a basic overview of any PE file that might exist on your machine; it can be a DLL or it can be an executable file. We have the DOS header, we have the DOS stub, we have the Rich header, which was sort of an unknown structure until 2015, we have the PE signature, followed by the actual NT header, which is composed of the file header and the optional header, which points to the data directories table, followed by the section headers and then the actual sections, which contain all the information that is required for the PE file to run, right?

So, a few things to remember. These are some of
the concepts that you need to know before you can work with any PE file, before you can create a parser for a PE file. There's something called a raw address, which is the actual offset at which a particular byte exists inside the PE file. A PE file is, as I said, nothing but a stream of bytes, right? You need to know where a particular byte exists; that is the raw address of that byte. The raw address space usually begins at 0 and runs up to the actual size of the file.

Virtual address: the virtual address is where that particular byte will be
loaded in virtual memory once the PE file is run. The virtual address space usually starts at that hexadecimal address, but that's not necessarily true; it's just the address used by most PE files.

Relative virtual address: this is something that the PE file does. Most addresses inside the PE file, like the metadata for the PE file and everything, are listed as an RVA, or relative virtual address. That is the formula that you can use to calculate the raw address if you have the RVA. This is important because when you are trying to parse the individual structures out of the PE file, it is important to know the raw
address of a particular structure; only then can you extract that structure from the PE file, right? Awesome.

So this is what the DOS header looks like. It contains a ton of different information, but the most important fields are highlighted in yellow. There's e_magic, which is the MZ signature that is present at the beginning of any PE file; you have e_csum, which is the checksum of the PE file; and e_lfanew, which is the pointer to the actual PE header, the beginning of the extended header, right? It points to the beginning of the NT header.

You have the DOS stub. The DOS stub contains a
dollar-terminated string, which is usually "This program cannot be run in DOS mode". Now, a quick question: how many of you have participated in FLARE challenges before? Right, so there was a FLARE challenge in 2016 which contained an executable; once you reverse engineered that executable you got the flag, and the string before the flag was "This program cannot not be run in DOS mode", right? What it was, was an entire DOS program contained inside that PE file. So there can be an actual program that this DOS stub contains, which makes it a very important structure.

Then we have the Rich header. Now, the Rich header was
originally unknown until 2015. There is sort of a conspiracy theory which says that this data structure was used by Microsoft to identify various malware groups, because since the structure was unknown, malware authors never thought of changing the data in it. This structure contains data about the compiler, the linker, and all the programs that were used to compile the executable. So if you forget to change this data structure, you have basically leaked all the information about the tools that you used to create the malware sample, right? So this is quite an important data structure to change if you are creating malicious files.

This is
followed by the PE signature. Now, this marks the beginning of the PE file. According to me, you don't really need the data structures that come before it, but because of backwards compatibility Microsoft hasn't really dropped those data structures. This marks the beginning of the PE file: "PE" followed by two null bytes, right?

Next we have the NT header. The NT header is composed of the file header followed by the optional header, and the optional header points to the data directory table.

Now, the file header. The file header is composed of all these fields, and these fields are really important. How many of you know what UWP
stands for? Yes, sir? [Music] Exactly. So UWP stands for Universal Windows Platform. What Universal Windows Platform means is that Microsoft aims at developing a universal API, a common API for all its applications, that could be used to develop applications for the various devices that Microsoft wants you to run its executables on, right? For you to be able to do something like that, you need to support all those different architectures as well. So the Machine field specifies, in bytes, the machine that this particular executable is meant to run on. That is followed by the number of sections; this specifies the number of sections that will exist inside the PE
executable file. Now, there is a very common technique that you can use to add sections, so this field becomes particularly important to understand: once you add or remove a section from a PE file, it is important to change the corresponding value in this particular field, because most endpoint detection systems will check that the number of sections actually matches this field, and that is how they can easily recognize whether a particular PE file has been tampered with, right?

The time/date stamp: how many of you know what epoch time means? Wow, you are old. Nice. So epoch time basically marks, you could say, the beginning of time for
us, right? The time/date stamp is basically a time delta between that time and the time when this particular executable was compiled. Using that delta you can calculate when this executable was compiled. But since this field is relatively well known, most malware authors remember to change it.

Then we have the pointer to the symbol table, the number of symbols, and the size of the optional header. Now, this field becomes important because these data structures are present as a stream of bytes, right? You need to know where one data structure stops and where the next begins, so the size of the optional header becomes really important in that way. Then we
have the characteristics of the PE file, which specify whether it's a DLL or an executable or something like that, right?

Now, the optional header. Even though it's called an optional header, it's not really optional in nature, because it has a ton of information that is needed. Base of code specifies where the text section of the PE file starts; this is important because you need to know where the code actually starts, right? Image base specifies the actual address at which this executable will be loaded in virtual memory. Then we have a ton of other fields, and that is followed by the data directory table, which contains the size of the data
directory and the RVA at which a particular data directory will be loaded, right?

When we come to data directories, there are a ton of different data directories. These are all the common data directories that might be present; we will go through some of them, but not all.

Export table: the export table contains information about the exports of this particular file. Now, it's not common for most EXE files to export functions, but for DLLs it's very common: since a DLL is a library, it will export a ton of functions that might be loaded by different EXE files, right? So this table contains information about all those exported functions.

Import table:
now, this import table becomes very important in terms of EXE files or malware, because it gives us information about the functions that were imported by this particular malicious file.

Next we have the resource table. When we talk about PE files, not all PE files are actually malicious in nature, right? Some PE files are actually games that we might play. These games contain a ton of resources, like dialog boxes and maybe images of guns, and all those resources need to be accessed when the PE file runs. So the resource table contains the addresses of all those resources located inside the PE file.

We have the exception table. Now, the exception table is usually
something that is not present, because it gives you information about what exception will be executed under what condition. So this table is usually stripped from PE files, because it can be used to trigger exceptions that we might figure out from this table, right?

So how many of you know what the certificate table might have? Yes, exactly: code signing certificates, right? This becomes important because it says who the author of this program is, who actually signed this program when it was created. Now, how many of you have encountered this dialog box which says that this program was not signed by Windows or Microsoft? This is triggered because the EXE or the
DLL that you are trying to load or run was not signed by Microsoft as an author, right? Now, this can be bypassed if you have Microsoft's code signing certificate. Yeah, good luck with that. If you maybe found those certificates, you could sign your executables using them and that dialog box would never show up; it would just run.

Debug table: the debug table contains a lot of debugging information, sort of the breakpoints and debugger information that was used to create this particular file. This is also stripped, so that you cannot know what kind of breakpoints were used.

Then we have the TLS table. Who can tell me what a TLS table
is, like, what would the TLS table have? Yeah, exactly: thread-local storage, right? It's not the TLS for networks, it's thread-local storage. Now, why do you need thread-local storage? Most modern operating systems are multi-threaded in nature, right? So you need to be able to store all this information for all these threads separately. The TLS table stores that information for each and every thread, right?

Next we have the import address table. Without going too much into what the import address table is and what it does, it's basically a substructure of the import table, and it contains the same information as the import lookup table, but once
it's loaded in memory it contains the actual addresses of all the imported functions, right?

Next we will move on to the section header. The section header is sort of the metadata structure for all the sections in the file. It contains the name, the virtual address of a particular section, the size of the raw data as it is present on the actual file system, a pointer to that raw data, and the characteristics. Now, who can tell me why we need characteristics for a section? Why would you need that field? Exactly. So one of the common use cases for characteristics is to find out whether a particular section is readable,
writable, or executable, or any combination of those three.

Next we move on to the current code injection techniques that exist for PE files, right? One of the most common code injection techniques is custom section addition. How many of you have done OSCE before? Okay. So one of the techniques taught in that certification is how you can add a custom section at the end of the PE file and change the entry address of that PE file to point to that section, so that that section is executed instead of the original text section, right? What you do is create malicious shell
code, put that in as a section, append it to the end of the PE file, and then change the entry address inside the PE header to point to that section, right? But it is important to know that you also need to give execute permissions to that section, because by default only the text section has execute permissions in most common PE files.

Now, what are the disadvantages of this kind of approach? It's very easy for most endpoint detection systems to detect, since the text section is usually the only section that should have execute permissions. It is also difficult to do correctly, since you need to change a
lot of different fields inside the PE file. So the ratio of the stealth gained to the time required to successfully implement this kind of injection technique is way too low, since it's easily recognizable.

The next approach that exists is PE code caving. What we can do is find all the code caves that exist inside the PE file. Now, who can tell me what a code cave is? What could a code cave mean? Anyone? So code caves are basically runs of null bytes inside the PE file that are not actually being used by the PE file, right? Those null bytes can be used to store
our malicious code. It's important to understand that this is a really stealthy technique, since there would be no way of knowing whether a particular file has been injected with malicious code, and it's really difficult to implement. Now, why would it be difficult to implement? What problems could you face when you are trying to implement injection using this technique? Exactly. So one of the important disadvantages of this technique is that you first need to find a code cave that is large enough to store your malicious code. What is the next thing that you might face when you are trying to use this injection technique? Exactly.
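As a rough illustration of that first problem, finding a cave that is large enough, here is a minimal Python sketch. It only scans a byte string for runs of null bytes; it does not check whether the bytes are really unused or which section they fall in. The function name `find_code_caves` and the `min_size` parameter are made up for this example and are not part of Perfidious.

```python
import re


def find_code_caves(data: bytes, min_size: int = 64):
    """Return (offset, length) for every run of at least min_size null bytes.

    Hypothetical helper for illustration only: a run of nulls is merely a
    candidate cave, since it may be padding that the program still relies on.
    """
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    # Greedy match, so each maximal run of nulls is reported exactly once.
    pattern = re.compile(rb"\x00{%d,}" % min_size)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(data)]
```

On a real file you would call it as `find_code_caves(open(path, "rb").read(), min_size=len(payload))`, where `path` and `payload` are whatever file and code you are working with.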
So it's important to know that this code cave needs to be in a section that has execute permissions, right? If that section does not have execute permissions, your code will not run, even though you found a code cave that is large enough to store the malicious code. So this technique imposes a lot of restrictions on what you can do with it.

So why not just edit the .text section of a PE file? Why is that not possible? The text section contains the code, right? So why not just edit the .text section? What problems could you face when you're trying to do something
like this? Say you could do that; what is the most basic problem that you could face? No, you could edit the signature as well. The most basic difficulty that you would face when you are trying to do an injection using this technique? Yes? Yeah, so that's one of the problems you could face, but the most basic one is that the text section is usually the first section that follows the section headers. So when you change the .text section, you need to make sure that every section that follows it is also changed to accommodate those changes, right? All the addresses and everything would change as well. So this
has not really been possible until now, because you don't have complete access to the PE executable, right? You do not have complete access to each and every data structure inside the PE file. This is what Perfidious aims to do. Perfidious can be used to fingerprint the PE file and convert it into a class-based data structure in Python. Then you can use a function to directly insert the malicious code inside the text section. It would extract the text section, place the malicious code inside the text section using jumps, and then the text section would run the graphical user interface, but in the background it would also run the
malicious code, right? It will obviously not function the same as the original executable, because the code path will be different, but you will still get a GUI if it is a GUI application. So that is what it aims to do.

Now, the advantages of this approach. It will be relatively difficult to identify the shellcode, since the shellcode is not present in one piece inside the PE file; what you have is chunks of independent shellcode connected via jumps, right? So it would be really difficult to develop a signature for this kind of executable. The malicious code itself is split into smaller pieces, and you join those pieces via jumps. All the other parts of the PE
file are left relatively unchanged.

So how would you detect such an injection technique? The easiest technique is to only allow whitelisted software on the network, right? The second technique is to perform dynamic analysis of each PE file that is on your network; that way you can find out the real functionality of that PE file. The third technique is graph hash analysis. This is a technique that I found out about recently in Singapore, at the Hack in the Box conference. At that conference one of the researchers basically
proposed a technique that could be used to perform graph hash analysis, right? You create a hash value for the control flow graph of a particular PE file. That hash can be compared with the hash of the original file, and from the distance between the two you can find out whether the functionality of the file has been changed. That is how you could try to determine whether a particular file has been injected using this technique.

Now I'll give you a small peek into what Perfidious does.
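To make the graph hash idea a bit more concrete, here is a small Python sketch. It is not the method presented at Hack in the Box, which the talk does not describe in detail; it swaps in a simple Weisfeiler-Lehman-style relabelling and a Jaccard distance. It assumes the control flow graph has already been recovered by some disassembler and is given as a dict mapping each basic block to its successors. All names here (`cfg_fingerprint`, `fingerprint_distance`) are hypothetical.

```python
import hashlib
from collections import Counter


def cfg_fingerprint(cfg, rounds=2):
    """Fingerprint a control flow graph given as {block: [successor blocks]}.

    Labels depend only on graph shape (out-degree plus successor labels),
    never on block names, so renaming blocks leaves the fingerprint unchanged.
    """
    nodes = set(cfg) | {s for succs in cfg.values() for s in succs}
    labels = {n: str(len(cfg.get(n, ()))) for n in nodes}
    for _ in range(rounds):
        updated = {}
        for n in nodes:
            succ_labels = ",".join(sorted(labels[s] for s in cfg.get(n, ())))
            raw = f"{labels[n]}|{succ_labels}".encode()
            updated[n] = hashlib.sha256(raw).hexdigest()[:16]
        labels = updated
    # The fingerprint is the multiset of final block labels.
    return Counter(labels.values())


def fingerprint_distance(a, b):
    """Jaccard distance between two fingerprints: 0.0 means same shape."""
    union = sum((a | b).values())
    if union == 0:
        return 0.0
    return 1.0 - sum((a & b).values()) / union
```

For example, a graph where the entry block has been redirected through an extra stub block (the kind of detour that jump-chained shellcode introduces) gets a non-zero distance from the original, while the same graph with renamed blocks gets a distance of 0.0.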
So this is how you'll be able to use Perfidious. You can import the PE class from the Perfidious library, and then you can import a PE file into that structure. It will parse the file and create a class structure from it. And then I just go to... yeah. So what it does is tell you which structures exist inside the PE file and which do not, and when you ask for the DOS stub's message, it prints the message that is present inside the DOS stub structure, right? Now, this is what I
aim to do with this library: you can say, okay, I want to change that structure and that particular field to say something else. I want it to say "BSides DC 2019", and then I'll show you that it actually can do that. So the original structure, which said "This program cannot be run in DOS mode", now says "BSides DC 2019", right? This is what I aim to do with this library: I want to make each and every substructure inside the PE file accessible, so that it can be changed and altered however I want. So this is something that I aim to do.
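The demo above uses Perfidious itself, whose API is not shown in this transcript. As a stand-in, here is a minimal sketch of the same edit done with nothing but Python's standard library: it locates the default DOS stub message between the DOS header and the offset stored in e_lfanew, and overwrites it in place with a space-padded replacement so that no other offset in the file moves. The function name and the `DEFAULT_MSG` constant are assumptions made for this example.

```python
import struct

# The usual stub text; real stubs follow it with "\r\r\n$".
DEFAULT_MSG = b"This program cannot be run in DOS mode."


def patch_dos_stub_message(data: bytes, new_msg: bytes) -> bytes:
    """Return a copy of a PE image with the DOS stub message replaced.

    Hypothetical helper, not the Perfidious API. The new message is padded
    with spaces to the old length, so the file size stays the same.
    """
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    if len(new_msg) > len(DEFAULT_MSG):
        raise ValueError("new message is longer than the original")
    # e_lfanew lives at offset 0x3C and points at the PE signature.
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    start = data.find(DEFAULT_MSG, 0x40, e_lfanew)
    if start == -1:
        raise ValueError("default DOS stub message not found")
    patched = bytearray(data)
    patched[start:start + len(DEFAULT_MSG)] = new_msg.ljust(len(DEFAULT_MSG), b" ")
    return bytes(patched)
```

Calling `patch_dos_stub_message(image, b"BSides DC 2019")` on the bytes of a typical executable would give the result shown in the demo, assuming that executable carries the default stub text.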
The future of this project: I am currently in the process of parsing each and every data structure, and I aim to complete that by the end of December. From January I'll start working out all the links between the data structures, so that when you edit a single data structure, the corresponding fields inside the actual PE file are also edited to match. Once that is done, the final goal of this project is to develop a program or framework that can be used to make a PE file look like some malicious PE file, or a PE file from a malicious APT group, right? So the
techniques used by the APT group can be used to make a malicious file look like it came from that APT group. The implication of such a tool is that you can make a file from one APT group look like it came from another APT group, right? That is what the next six months from January will look like: I will start developing a machine learning algorithm that can do that.

So thank you for your time. That's my presentation. Any questions? I have these Backdoors & Breaches card games for anyone with some good questions; I'll be handing them out. Yes?

Yes, so this entire program will be open source in January, once I complete the parsing stuff, because that is something
that I want to do for myself, to learn how each and every data structure in the PE file interacts with the others. But from January I will be taking on open source volunteers who want to contribute to this project. Yes?

Yes, this seems similar to Shellter? Shellter... have I used that project? Not really, but I'll look into that, thank you. Yes?
So I'm not really sure about that, but from what I understand you can change the field to represent it being signed by someone else... yeah, so it will break the signature. What you'll have to do when you change the executable is also manually recalculate the checksum and each and every field that represents a checksum. There are multiple sections inside the PE file, right? You need to make sure that your checksums correspond to the changes that you have made in the PE file. That is something I am trying to make a function for, one that can automatically do that for
you, so once you make changes to any of these subheaders or any of the fields inside those structures, the checksum would automatically be recalculated. But yeah.

Yeah, so if you check the hash against the original program, it will be different, right? That is, like I said, if you only allow whitelisted software on your network. So that is one of the important setbacks for this process: if you only allow signed binaries that you actually verify, then you can basically block any code injection technique, right? You would have software from the original author running on your machine,
but that's not true for each and every machine in the world, right? Not everyone does that. So yeah. Yes?
So currently I am only working on x86 and x64 binaries, since that is what I have started with so far, but once this project gets some steam I might add support for ARM as well. Awesome. Oh, yes?

Yes, so I am still developing the tool, but I have been working on this project since the start of May. Sorry? Yeah, currently it's just me, since I was the only intern working on this project, and I asked them to allow me to keep working on it since I stopped working for Cyber D in August, and they said, yeah, go ahead, no one wants to work here. No, it's not open source yet; it will be open source
sometime in January, once I complete the parsing phase of the project. I need help understanding the linkage between the data structures, so that is when I will take on volunteers, and for the machine learning part of the project. Yeah. Yes? [Music]

Yes, so that was one of the initial implications; that is what my manager asked me: why are you creating a tool that could be used destructively? But my understanding is that once I create this framework, it could be used to develop even better signatures, right? I'll just show it to you, one second. So if I... [Music]
So what I am doing is developing a checksum for each and every individual data structure inside the PE file. This can be used to develop better signature-checking mechanisms: you can check each and every data structure inside the PE file instead of the entire PE file, right? You have signatures for every little thing inside the PE file, and this can be used to find out exactly which data structure was tampered with. So that is my aim: to develop a tool that could be used to develop better signatures and better detection mechanisms, along with learning. [Music]
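The per-structure checksum idea can be sketched in plain Python as follows. This is an illustration under assumptions, not Perfidious code: it uses SHA-256 as the "checksum", covers only the headers (DOS header, DOS stub region, file header, optional header, section headers), and the function names are invented for the example. The field offsets are the standard ones from the PE format: e_lfanew at 0x3C, a 20-byte file header, and 40-byte section headers.

```python
import hashlib
import struct


def structure_hashes(data: bytes) -> dict:
    """Hash each PE header structure separately (hypothetical helper)."""
    e_lfanew = struct.unpack_from("<I", data, 0x3C)[0]
    if data[:2] != b"MZ" or data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ValueError("not a PE file")
    file_header = e_lfanew + 4
    num_sections = struct.unpack_from("<H", data, file_header + 2)[0]
    opt_size = struct.unpack_from("<H", data, file_header + 16)[0]
    opt_header = file_header + 20
    first_section = opt_header + opt_size
    spans = {
        "dos_header": (0, 64),
        # Everything between the DOS header and the PE signature,
        # which includes the Rich header when one is present.
        "dos_stub": (64, e_lfanew),
        "file_header": (file_header, opt_header),
        "optional_header": (opt_header, first_section),
    }
    for i in range(num_sections):
        start = first_section + 40 * i
        name = data[start:start + 8].rstrip(b"\x00").decode("ascii", "replace")
        spans[f"section_header[{i}]:{name}"] = (start, start + 40)
    return {k: hashlib.sha256(data[a:b]).hexdigest() for k, (a, b) in spans.items()}


def changed_structures(original: bytes, suspect: bytes) -> list:
    """Names of the structures whose hashes differ between two files."""
    ho, hs = structure_hashes(original), structure_hashes(suspect)
    return sorted(k for k in ho.keys() | hs.keys() if ho.get(k) != hs.get(k))
```

Comparing a known-good copy of a file against a suspect copy with `changed_structures` then names the structure that was touched, for example `['dos_stub']` if only the stub message was edited, instead of just reporting that the whole-file hash differs.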
Yeah, that can be used as well, but say you are splitting the malicious code into only two or maybe three parts, right? Then you will not have as many jumps. Yeah. Yes? Sorry? Yeah, you have to ask Microsoft about that. Like, why do we even have the DOS stub or the DOS header anymore? No one is running DOS files, right? At least from what I know. So Microsoft is trying very hard to make sure that all its programs are backwards compatible, and that has essentially led to a structure which is vulnerable in nature, right? Like, how many years has it been since PE files have existed? We have not been able
to secure it yet. That says something about it. Yes? [Music] Yeah, so I haven't looked into that; I'm only working on PE files currently. Any more questions? Awesome. Thank you for your time. And a shameless plug: I am looking for full-time, entry-level cybersecurity positions starting May 2020, so if any of you have any openings in mind, feel free to share them with me.