
Manalyze: A Static Analyzer for PE Executables

BSides Belfast · 2017 · 31:26 · 151 views · Published 2017-10
Category: Technical
Style: Talk
About this talk
BSides Belfast 2017

Thank you very much for being here and coming to see my talk about a program that I call Manalyze. It's a static analyzer for PE executables. First, I'd like to say a few words about who I am. I work for a Paris-based company called ACMS International, and I'm really not going to go on too long about what we do: basically just run-of-the-mill security stuff, services, etc. We are a two-man team; I work with @x0rz, whom you probably already follow on Twitter. Since we're a very small company, we do get a lot of leeway to work on open source projects, and we release a lot of code on GitHub. So, other projects

that I've worked on are APKTrack, which is an Android app to track updates for your Android software, for people who don't have the Play Store or do not want to use any Google services, like I do. There is also something that's maybe going to be more relevant to your interests, which is what I call FreedomFighting. It's a repository of pentesting scripts; it contains things like a stealthy log cleaner and a reverse encrypted Python shell that also supports TTY, so stuff geared towards red teaming. Apart from that, I operate a Tor exit node. It's located in France, and I learned yesterday that a French exit node operator has just been, not arrested, but

he's been questioned by the police, because his node had been used to access, I don't know, some kind of child pornography case. So this might be my last talk ever before I go to jail as well. Hopefully not. And finally, I'm not a CISSP, which is apparently some kind of running joke on Twitter, so I thought I'd just mention it. So, a few words about the project's origin. Manalyze stands for "malware" and "analyze". I learned at a later date that in English it also has a whole different meaning, which I didn't know about, but the project was always named this way, so I didn't want to change it. I started working on this in

February 2014, when I was working at a Security Operations Center. There were mainly two reasons for my starting this project. The first was my personal annoyance with antivirus software's opaque decisions. What I mean by that is that I used to have hard drives full of malware samples that I had downloaded here and there on the World Wide Web, and every time I would plug the disk in, I would be very fearful that the antivirus software would try to delete everything on it. It was very annoying. At the same time, every time I would download a program from the internet, sometimes it would get deleted, and I wouldn't have any explanation as to why the program was

flagged as being suspicious, and it was kind of frustrating. So imagine you're downloading a keygen for some four-dollar application, a keygen that you've tracked down for maybe an hour of your life that you're never getting back, and as soon as you've gotten your hands on it, it just gets deleted with no form of explanation. I just wanted to know what the reasons behind my antivirus software's decisions were. There was also another concern that I had in my job, which was that a lot of the tasks I had to do were growing increasingly repetitive. Every time I would receive a malware sample, I would first drag the sample into PEiD to see if there was a known

packer, I would then upload the file to VirusTotal, etc. So I was wondering: why can't there be a single tool that does all this? And as I couldn't find any, I just decided to write one. Just to give you a quick overview of the project: it's free, open source software, released under the terms of the GPLv3. It's a tool that was written in C++, and it's available on both Windows and Linux, and I think it probably works on macOS as well, even though I've never tested it. What it's used for is to perform an initial assessment of PE files. The objective is to be able to quickly figure out

whether a sample is worth investigating more closely, or whether it's a known quantity that you can discard almost immediately. The way it works is that it generates a report containing weak signals that hint at the file's behavior. I would like to point out immediately that it's not a drop-in AV replacement. It's definitely not a program that you're going to be able to give to your end users and expect them to make decisions based on what the program outputs. It's developed for InfoSec professionals, and it's not a replacement for human intelligence either. What it does is gather as much information as it

can on a particular file and present that information to you, and it expects the user, the expert user, to be able to understand this information and make a decision based on whatever was collected. So it shows you a lot of information, and you figure out whether you want to open the file in IDA or drag it into the Cuckoo sandbox, etc. This morning we had a very interesting talk from IDI about dynamic analysis. I don't want to go into too much detail about the differences between static and dynamic analysis; I'm sure most of you already have some experience in the field. So the only thing I'm going to say is that static

analysis has its shortcomings, but it also yields very quick wins from time to time. I think it would be a shame not to take 20 seconds to look at the outside characteristics of a program and see if we can already infer its behavior just from there. The tool only performs static analysis and does not do any sort of dynamic execution, not even symbolic execution. It's just looking at the way the file is structured and trying to guess what it does from there. Now, the architecture of the program. The idea is simple: you get a lot of input files, maybe thousands or hundreds of thousands of different files that you know nothing about, and

you feed them into a parser. The PE parser is going to dissect the file and offer all the information about the file to a whole set of plugins, which I'll go into in much more detail later. Each of these plugins is going to generate some individual output that populates a report in the end. I also embedded the Yara engine, which I think most of you know as well, and which can be used by the plugins to perform additional tasks. Finally, after the report has been generated, it goes through an output formatter, which is going to format it into either plain, human-readable text, or JSON for other tools if you

need to use them. The way this talk is structured is that I'm going to go over each of the different parts of this architecture and talk about the reasons and technological choices behind the development of this tool. The first and core part of this program is the PE parser. Before starting to work on this project, I looked at the different parsers that were already developed and that I could reuse. I looked particularly into pefile in Python, and there were also a couple of C++ libraries, but all of those were ultimately rejected. pefile was, I thought, a bit too slow, and I wanted to be able to scale a lot and handle

thousands of files in a matter of minutes, so this is why I didn't want to go the Python way. And the C++ libraries I found looked like they were very unsafely written, and I didn't want to risk using them. So I decided to write my own, which might be a bad idea; you let me know. The problem is that PE files are extremely complex. There is an official specification for PE files, which you can download from Microsoft's website, but this specification is sometimes quite cryptic; it's not very clear. Moreover, the Windows loader that actually parses PE files for the OS is quite lax, so it's going to

accept a lot of files that do not adhere strictly to the specification, and a parser that performs strict parsing is going to reject a lot of malware that would actually be able to execute on a normal system. There's another difficulty: since we are going to be handling malware, which is by definition very untrusted input, we know there is a significant chance that the input is going to actively try to attack the parser. There have been a couple of bugs in IDA Pro and other analysis tools in the past that were attacked by malware in order to

prevent analysts from reversing the programs and looking into them. There is a very nice presentation from Black Hat USA 2011 where people from ReversingLabs discussed all the different pitfalls they had identified while looking at the PE specification, and the ways they could maybe crash analysis tools. So yeah, the idea is that I wrote my own PE parser, and I did it in C++ as well. You're probably thinking that this is a very stupid idea because, well, C++ is a memory-unsafe language. All I can say is that C++ is fast, and I did my best to program very defensively. I also tried fuzzing the program for maybe something

like two months with AFL, a well-known fuzzing tool from Google, and it didn't find any crashes. I also used Ange Albertini's handcrafted PE files; he wrote minimalistic PE samples that do work on Windows but are not exactly valid. All I can say is I couldn't find a lot of crashes. Well, I couldn't find the crashes that are in the program. I opened a bug bounty, so if anyone can find a PE executable that triggers a crash in my parser, feel free to send it to me, and there's a bug bounty on it. It's not a lot, it just comes out of my personal pocket money, so please be understanding. Oh nice, there's a PowerPoint crash. I'll

have to investigate that later, probably. Or the VM is shutting off. That's nice. OK, please bear with me for a second.

Yeah, and of course that's Windows Update as well; it's just been triggered. OK, I'll do my best to continue with the other slides while everything is rebooting. There has been a Chinese researcher who's been fuzzing the application as well. He's found two crashes in the past, so it's definitely not foolproof. But yeah, please try to find bugs in the application; maybe you'll get a reward. Well, hopefully it won't be too easy. I'm so sorry about this.

Yeah, so we're back.
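The defensive parsing discussed before the interruption can be illustrated with a small sketch. This is not Manalyze's C++ parser; it is a Python illustration (the names `ParseError` and `read_pe_header_offset` are invented for the example) of the one rule that matters most with hostile input: never use an offset read from the file before checking it against the file's real size.

```python
import struct


class ParseError(Exception):
    """Raised when the input cannot be parsed as a PE file."""


def read_pe_header_offset(data: bytes) -> int:
    """Return the offset of the PE signature, validating every step."""
    # The DOS header is 64 bytes long and starts with the "MZ" magic.
    if len(data) < 0x40:
        raise ParseError("file too small for a DOS header")
    if data[:2] != b"MZ":
        raise ParseError("missing MZ magic")

    # e_lfanew (offset 0x3C) says where the PE header starts.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)

    # The value comes from the file, so it is attacker-controlled:
    # check it against the actual size before touching that offset.
    if e_lfanew > len(data) - 4:
        raise ParseError("e_lfanew points outside the file")
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        raise ParseError("missing PE signature")
    return e_lfanew
```

A file that fails these checks is simply rejected, which also matches the earlier remark that non-PE files are discarded as soon as the analysis starts.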

Yeah, so finally, I talked about speed. I ran the tool over a whole VirusShare release and was able to go through it in about 10 minutes. Parsing about sixty-eight gigs of executables in a few minutes, I think, is quite nice. The caveat is that a lot of the files in the VirusShare archives are actually not PE executables, so they just get rejected as soon as the analysis starts. Another thing is that all the analysis plugins were turned off, so obviously this speeds up the analysis a bit. A few words about Yara, which is a pattern-searching tool written by Victor Alvarez from VirusTotal. You all

know about Yara, so all I'm going to say is that I use a slightly modified version in Manalyze. The reason I do this is that they had their own PE parser to do stuff like looking for patterns at the entry point or in a specific section, and it felt really bad having two different PE parsers in the same project. Even having one is already quite shaky to me. So I removed their parser entirely and put mine in its place. I wrote C++ wrappers as well and removed all the non-library code. Now I'm going to talk a bit about all the different plugins that are implemented inside this tool. So the

first one is the ClamAV plugin. What I do is simply apply ClamAV signatures. ClamAV doesn't need to be installed on the machine: all you have to do is download the signature files and run a Python script that converts them into Yara rules, and then those rules are applied automatically by the program. The signatures are not distributed with Manalyze, because obviously they change every day, but there's a Python script that allows you to download them and use them immediately. ClamAV also uses other kinds of databases, HDB and MDB files, which are just big lists of hashes of known executables and sections. I don't use those, but maybe I will at some point. I

also have a resource analysis plugin, which looks at all resources. Resources are a mechanism used by PE files to embed arbitrary content, so looking at them usually yields useful information. It's a mechanism that malware is liable to use to drop other executables, or files that it might need, like configuration files, etc. So just by looking at the resources of a PE file (are they possibly encrypted or compressed? is the PE file composed of more than 75% resources, in which case we might infer that this is maybe a dropper?), we can get some interesting information about what the PE might be doing on the machine. So of course, if you use the

program called Resource Hacker, that works only on Windows as far as I know; you can use Manalyze to extract resources on any OS. I also have a PEiD plugin, which just applies PEiD signatures. I've decided to split them into two different plugins: one that's dedicated to compiler detection and the other one to packer detection. The packer detection plugin also looks for well-known section names, so if there is a section called .vmprotect, we know that the packer is possibly VMProtect, etc. So it's useful information. PEiD hasn't been maintained for, I think, a few years, so I'm not sure how relevant its signatures still are, and maybe I'll just

remove them at some point; so far I haven't had any complaints. I have a strings plugin, which does the very straightforward thing of looking for well-known strings that indicate possibly malicious behavior, like references to system tools such as regedit, Task Manager, etc., debugger process names, references to risky registry keys, to WMI. There are also a lot of virtual machine detection techniques that rely on looking for specific strings in the OS, so if you look for those strings in the malware as well, and the strings are not obfuscated, you're going to know immediately that it's trying to do something fishy. Basically, every time a new APT report comes out, I browse through it, look for

groundbreaking new evasion techniques, and add a few lines to this plugin to possibly detect newer implementations. I also have a cryptography detection plugin that is directly inspired by the IDA Pro plugin FindCrypt, I think. What it does is look for the particular cryptographic constants used in particular algorithms. I also look for the OIDs that are used to reference algorithms in third-party APIs. The way I did this was simply to download some random crypto library from the Internet and look for all arrays of integers like this one. I'm not sure you can read it properly from where you stand, but the idea is that I built some

big Yara rules that look for all those constants, and now I'm able to detect MD4, MD5, SHA-1, etc., AES, inside a binary. Knowing that a binary is using cryptography is always interesting, because it can mean either that there is some obfuscation going on or even that the program is ransomware. We'll see some examples. I also have a packer detection plugin, which I mentioned a bit earlier. There is a whitelist of section names; I look for possibly encrypted or writable and executable sections; I check that the number of imports is reasonable, so PE files that have fewer than 10 imports are usually a little bit suspicious and require further inspection. Apart from that, there

are also funny inconsistencies that are caused by some packers. A funny one that I discovered a while ago was a packer that would compress every resource in the program but wouldn't update the size of the resources inside the PE headers. So if you were to sum the sizes of all the different resources, you would end up with a size bigger than the size of the PE itself, and you would immediately know that the file has been packed. Something that's not groundbreaking, but always nice to have, is the imports analysis plugin. I'm going to look for combinations of imports that hint at the program's behavior. So for

instance, everyone knows about VirtualAlloc, WriteProcessMemory and CreateRemoteThread, which are used to inject code into processes. I look for networking functions, process, service and registry manipulation APIs, and also imports that can be used for code injection, in particular process hollowing, PowerLoader and atom bombing, which are well-known techniques to perform code injection but all rely on very specific sets of APIs. There is a small example given at the end of this slide. Again, I'm not sure you can read it, but basically this is a program that imports CreateProcess, ShellExecute, and also CreateFile and GetTempPath. It doesn't require a lot of thinking to figure out that this is a

program that probably creates a file in the temp folder and tries to execute it, so most likely dropper behavior. One of the things I'd like to work on in the future is being able to resolve imports that are loaded dynamically with GetProcAddress and LoadLibrary, using the Capstone engine. I don't know if that's possible, but I'd like to try. One of the most recent additions is a Bitcoin plugin, which just looks for Bitcoin addresses in a binary. What it does is go through all the possible addresses and then try to validate them. Bitcoin addresses have a very specific structure: the first four bytes, I think, from memory, of the SHA-256 of the rest of

the address, so it's quite easy to figure out whether an address is actually a valid Bitcoin address or not. This is the output of the tool for the WannaCry ransomware. You can see immediately that even though it pretends to come from Microsoft Corporation, it contains three different Bitcoin addresses, and if you were to google those addresses, you would find reports linking them to the WannaCry attacks. So this is a way to gain time and perform the analysis very quickly. Finally, I have this Authenticode plugin that checks whether the digital signature of the PE is valid. If I ever

find a binary that's claiming to come from Adobe or Oracle or Google, and it's not signed, then I can immediately raise an alert and say: OK, wait a second, this is supposed to come from a well-known publisher, but there is no digital signature; what's up with that? So far, the plugin is mostly only available on Windows. I have a Unix version that works, but it relies on OpenSSL, which is, you know, horrible to work with on Linux. So I can display the certificate issuer, but I cannot check the digital signature at the moment. There are other problems: how do we check the

digital signature of a program on Linux when you don't have Windows at hand? There is another issue, which is that Microsoft executables are often known by hash in the security catalog of the OS and are not even signed themselves. Is there a way I can export that and import it into the tool? I have no idea. Finally, there's a VirusTotal plugin that just submits the hash to VirusTotal and returns the AV results. All I can say is that only the file hash is submitted, if some of you are very privacy-conscious, and that you do need to register a VirusTotal account if you want to use this plugin, because they won't let you use their API without a key, by the way.
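Going back to the Bitcoin plugin described a moment ago, the address check can be sketched as follows. Legacy Bitcoin addresses use Base58Check: the decoded address is 25 bytes, and its last four bytes are the first four bytes of a double SHA-256 over the preceding 21. The code below is an illustration in Python, not the plugin's actual code; the function names and the candidate-matching regular expression are assumptions made for the example.

```python
import hashlib
import re

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Candidate strings: start with 1 or 3, 26 to 35 Base58 characters in total,
# and not embedded in a longer run of Base58 characters.
_B58 = rb"1-9A-HJ-NP-Za-km-z"
CANDIDATE = re.compile(
    rb"(?<![" + _B58 + rb"])[13][" + _B58 + rb"]{25,34}(?![" + _B58 + rb"])"
)


def base58_decode(text: str) -> bytes:
    """Decode a Base58 string; raise ValueError on an invalid character."""
    number = 0
    for char in text:
        index = BASE58_ALPHABET.find(char)
        if index < 0:
            raise ValueError("invalid Base58 character")
        number = number * 58 + index
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    # Each leading '1' stands for one leading zero byte.
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def is_valid_bitcoin_address(candidate: str) -> bool:
    """Check the Base58Check checksum of a legacy Bitcoin address."""
    try:
        raw = base58_decode(candidate)
    except ValueError:
        return False
    if len(raw) != 25:
        return False
    payload, checksum = raw[:-4], raw[-4:]
    digest = hashlib.sha256(hashlib.sha256(payload).digest()).digest()
    return digest[:4] == checksum


def find_bitcoin_addresses(data: bytes) -> list:
    """Return the checksum-valid addresses found in a binary blob."""
    matches = (m.group().decode("ascii") for m in CANDIDATE.finditer(data))
    return [address for address in matches if is_valid_bitcoin_address(address)]
```

Because of the checksum, a random string that merely looks like an address is very unlikely to pass, which is what makes this signal cheap and reliable.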

Each plugin that I presented is absolutely optional, so you don't have to use any of them if you don't like it; especially if you don't want to send queries to VirusTotal, you really don't have to. Finally, I created an online portal for people who want to try out the tool. You can just submit executables online, either the PE file directly or a link to it, and you will get the report immediately. The samples are not shared with anyone, and I'm the only one with access to the server, but basically you don't have to trust me if you don't want to: you can just download the source code from GitHub and

run it locally. I'm just going to give you a quick look at what this looks like. Here you've got a summary of the plugin output; this is still the WannaCry ransomware. There are strings related to AES, the three Bitcoin addresses, a few imports that are quite suspicious, etc. There's also the VirusTotal score, which is 58 out of 62, which is a lot. There's a tab for discussions, if you want to share your findings, and then the whole PE structure, but I'm not going to go over that because it's more specialized. One thing that I'm quite proud of with this tool is that it's quite easy to use

and reuse. If there's one thing I hate more than anything, it's when I try to use an open source tool and see that it's a mess to compile, I can't get it to work, dependencies are missing. So I really did my best to make Manalyze very easy to install: just apt-get install the dependencies, git clone the code from GitHub, cmake, make, and that's it, it works. If you want to extend the tool, you don't need to understand the core. You can just use the API, which is, I think, quite well documented, and interact with the PE without having to understand the structure beneath it. You can also use the PE

parser, take it out of the project and use it in your own. One of the examples I made available online uses this PE parser to write your own process hollowing code. You can find all the resources on my blog. A few words about future work. The thing I would like to do most is getting that Authenticode plugin to work on Linux. I would like to work on icon recognition, so if, for instance, an executable file has a PDF document icon, I think this is something we can detect automatically and raise alerts for very quickly. I mentioned wanting to use the Capstone engine to resolve imports.
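The import analysis that this future work would extend boils down to set matching over statically declared import names. Here is a minimal sketch in Python. The rule table is an assumption built from the combinations mentioned in the talk, not the plugin's real rule set, and stripping a trailing "A" or "W" is a simplification for ANSI/Unicode variants that would mangle the rare API whose name genuinely ends in one of those letters.

```python
# Hypothetical rule table: a label, and the imports that must all be present.
SUSPICIOUS_COMBINATIONS = {
    "code injection (remote thread)": {
        "VirtualAllocEx", "WriteProcessMemory", "CreateRemoteThread",
    },
    "possible dropper": {"GetTempPath", "CreateFile", "CreateProcess"},
    "download and execute": {"InternetOpen", "InternetOpenUrl", "CreateProcess"},
}


def normalize(name: str) -> str:
    """Strip the ANSI/Unicode suffix: CreateFileA and CreateFileW -> CreateFile."""
    return name[:-1] if name[-1:] in ("A", "W") else name


def match_import_rules(imports) -> list:
    """Return the labels of every rule whose imports are all present."""
    names = {normalize(name) for name in imports}
    return sorted(
        label
        for label, required in SUSPICIOUS_COMBINATIONS.items()
        if required <= names  # subset test: every required import was found
    )
```

For the slide example (CreateProcess, ShellExecute, CreateFile and GetTempPath), only the dropper rule fires. Imports resolved at runtime through GetProcAddress never appear in the import table, which is why they are invisible to this kind of check.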

As for the web portal, I would love to be able to work a bit on some bigger data capabilities, like having a search engine where people can search for malware based on section names or any characteristics they want. And finally, possibly Python bindings in the future; I've started working on this, but it's far from ready yet. Now I'm just going to do a short demonstration of the tool to illustrate how it can help people during their analyses. This is a sample coming from the APT1 CSI campaign from 2014. It's quite old, but I really like this example because it just works perfectly. That's not always the case, to be honest, but when it

works, it's really, really nice. So let me just launch the analysis of this program.

And I'm just going to launch all the plugins.

I hope... Yeah, there we go. I have to go up a bit. Here we see the compilation date. When we received the file in the SOC I worked at, it was somewhere around January 17th, so we knew it was a very fresh strain. There were references to the Chinese language inside the PE file. At the time there weren't any ClamAV signatures, but there are now. We see here that it's creating processes and creating files in the temporary folder. Yeah, this is possibly a dropper, because when you look at the resources, one of them is a PDF document, which is weird: why would a PE file embed

a PDF file? Who knows. There is also another PE executable embedded inside the file, and all in all the resources amount to 93% of the executable, so mostly this is a file whose only role is to drop other files. Yeah, there are constants related to DES as well, so there was some strange obfuscation being performed, and now there are lots of different signatures from antivirus software. What we can do from there is extract all the resources to look at them more closely. I create a folder called out.

I'm going to go to that folder and look at what's inside it.

So what's in there is a PDF document and the icon of the PE file, which is a PDF icon. This PDF was simply a decoy document that would be opened when the user clicked on the file, to make them believe that they had opened an actual PDF document. And here there was a second executable, which was dropped on the system and then executed. So I'm going to perform the analysis on this second program.
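The two resource-size signals that came up, resources making up most of the file (93% in this demo, against the 75% threshold mentioned earlier) and declared resource sizes adding up to more than the file itself, are simple arithmetic. A hedged Python sketch, where the function name and message strings are invented for the example:

```python
def resource_signals(file_size: int, resource_sizes, ratio_threshold: float = 0.75) -> list:
    """Return weak signals derived from the declared sizes of a PE's resources."""
    if file_size <= 0:
        raise ValueError("file_size must be positive")

    signals = []
    total = sum(resource_sizes)
    if total > file_size:
        # The headers declare more resource data than the file can hold:
        # some packers compress resources without updating their sizes.
        signals.append("declared resource sizes exceed the file size (possibly packed)")
    elif total / file_size > ratio_threshold:
        # A file that is mostly payload is a typical dropper layout.
        signals.append("resources make up most of the file (possible dropper)")
    return signals
```

Both results are hints for an analyst rather than verdicts: an installer is mostly resources too.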

There we go.

Here we see that the references to the Chinese language are not present anymore. There are still constants related to DES, and now the interesting imports of this program are CreateProcess, InternetOpen, InternetOpenUrl, etc. So we can easily infer that the first part was the dropper and the second part is actually the implant, which is going to perform a download and execute: it's going to be looking for commands on a remote server and just executing them. And again, there are also a lot of signatures. So that's it for the example. One last thing I would like to mention is that if you want to look at the PE structure in more detail, you can just

dump any part of the structure you want. So there you go, you have all the imports, the different sections, etc. And if there is any part of this that you want to reuse in another tool, you can just use the JSON output format.
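As an illustration of reusing a JSON report in another tool, here is a small Python sketch. The report layout below (the `plugins`, `severity` and `summary` keys) is made up for this example and should not be read as Manalyze's actual output schema; adapt the key names to whatever the real report contains.

```python
import json

# Hypothetical report layout, invented for this example.
SAMPLE_REPORT = """
{
  "dropper.exe": {
    "plugins": {
      "imports":   {"severity": 2, "summary": "Creates a file in the temp folder and executes it"},
      "resources": {"severity": 3, "summary": "Resources amount to 93% of the file"},
      "compilers": {"severity": 0, "summary": "No opinion"}
    }
  }
}
"""


def flagged_plugins(report_text: str, minimum_severity: int = 2) -> list:
    """Return (file, plugin, summary) for every result at or above a severity."""
    report = json.loads(report_text)
    findings = []
    for file_name, sections in report.items():
        for plugin_name, result in sections.get("plugins", {}).items():
            if result.get("severity", 0) >= minimum_severity:
                findings.append((file_name, plugin_name, result.get("summary", "")))
    return sorted(findings)
```

A triage script could run this over thousands of reports and only surface the files with at least one high-severity finding.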

There you go, and you can feed it to another tool like so. I think that's it, and I'd be happy to answer any questions. Otherwise, the project is available on GitHub; feel free to download it, tweak it, and send me bug reports. Thank you very much. [Applause]

I haven't counted them. I think it's something like two or three gigs of files. That's not much, but I only keep executables; I reject everything else. I think most people are not very interested in using the portal for anything other than just testing the tool. People will upload one or two files to see how it works and whether it's interesting, and then if they are satisfied with it, I assume they download the code and run it locally, because then they don't have to send me anything, which is perfectly OK as far as I'm concerned. Anyone else? Yep. Yeah, it's quite easy, in the sense that I've written a lot of documentation. There are lots, well, maybe not a lot of

sample plugins, but the code of the other plugins is, I think, mostly clear. There is one example, and then you can use the other plugins as a reference, so hopefully it should be quite easy for you. If you ever encounter any problems, then it means I have done my job wrong and the documentation is not good enough, so be sure to let me know. Basically, you have to subclass a class and then implement a single method, and then you can interact with the PE through the API. That's it, you just have to implement a single method. Well, maybe two or three methods, because you have to provide the plugin name. Apart from this,

you just drop your plugin inside the default program folder: just drop the .so or the DLL file inside the same folder as Manalyze and it's going to be detected automatically. OK, well, thank you very much for your attention.
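The plugin model described in that last answer (subclass one class, provide a name, implement one analysis method, and get picked up automatically) can be sketched as a pattern. The real plugins are C++ shared libraries; the Python below only mirrors the idea, and every name in it (`Plugin`, `analyze`, `registry`, `run_all`, the dictionary standing in for a parsed PE) is hypothetical. The "fewer than 10 imports" rule is the packer heuristic mentioned in the talk.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Base class: subclasses register themselves under their name."""

    name = ""
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        Plugin.registry[cls.name] = cls  # stands in for automatic detection

    @abstractmethod
    def analyze(self, pe: dict) -> list:
        """Inspect the parsed PE and return a list of findings."""


class FewImports(Plugin):
    name = "few_imports"

    def analyze(self, pe: dict) -> list:
        count = len(pe.get("imports", []))
        return [f"only {count} import(s)"] if count < 10 else []


def run_all(pe: dict) -> dict:
    """Run every registered plugin and collect its findings by name."""
    return {name: cls().analyze(pe) for name, cls in Plugin.registry.items()}
```

Defining a new subclass is enough for `run_all` to include it, which is the property the drop-in folder provides for compiled plugins.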