
Next up, starting right about now, we've got one of our longer 50-minute talks, and I'm very happy to introduce you all to Pim Trouerbach, who is going to be giving a talk called "SmokeLoader: The Pandora's Box of Tricks, Payloads and Anti-Analysis." He's a reverse engineer, so everyone give a big welcome to Pim.

Can you all hear me okay? Cool. All right, so a quick introduction to myself. My name is Pim. I am a reverse engineer at Proofpoint. My main focus is on e-crime botnets, so things like Emotet, Qbot, IcedID and so on, but every once in a while I'll get the random APT request where I'll be asked to reverse engineer some nation-state trojans and whatnot. I'm a member of the Cryptolaemus team. For those of you that don't know what that is, it's a team of roughly 25 researchers all around the world where, for the last three or four years, we have been fighting the Emotet botnet, and I've done a mix of reverse engineering for them and software development to do automated malware processing.

My background is in computer science. I got a degree from Lewis and Clark in computer science, and my first job out of college was a software development role. I had a deep interest in malware at the time, but I didn't have any formal training in reverse engineering, so I wanted to combine those two things. I started to get pretty good at reverse engineering, and I also really enjoyed writing code, so for me the nice blend of these two was network protocols, and specifically malware network protocols. That's what I specialize in for malware analysis. And there's my Twitter and GitHub.

Okay, so for the agenda today: we'll go into the history of SmokeLoader, then we'll get into the first stage, then we'll analyze the final stage, and then I'll talk about how I actually achieved a fully static config extraction for this malware family. We'll be going into the communication protocol that it
uses to communicate with the command and control, and then we will talk about the payloads that I actually received from this botnet.

So what exactly is SmokeLoader? SmokeLoader is a piece of malware that is classified as a loader, and this basically means that its entire job is to deliver additional malware. You can kind of think of it like the UPS system, where people can just send malware through it. It first appeared in 2014. It targets solely Windows. It's around a 30-kilobyte payload, which is pretty small; generally you see them around 100 to 150 to 200 kilobytes. And this malware is actually written in C and assembly. Now, pure assembly is not something you really see in malware too much, but in this case, which I'll get into later, they actually have to do a good chunk of the development in assembly.

While SmokeLoader is a loader, it has additional plugins that extend its capabilities, for data exfiltration and additional actions on objective and whatnot. From a reverse engineering perspective, people really like to reverse engineer SmokeLoader because it's highly obfuscated; there are things that SmokeLoader does that people don't see in other malware families. And since it first came out in 2014, it has actually had continued development throughout its life cycle, so they generally have around an update every year or two where they add additional features. The entire package, if you wanted to purchase the panel, a bot, and all the plugins and everything, would run you about sixteen hundred dollars.

What's interesting here is there is a check where it makes sure that the machine that it has infected is not a Russian machine, and this is something that you cannot remove. This is hard-coded within the bot. Everything else you can modify, but this is a check that is not allowed to be bypassed. And finally, this is a multi-stage malware, so it has that first stage, and then if everything goes well with the first stage then it
will get onto the final stage of the malware.

Wow, that does not look good. All right, this talk might be tough with some of these diagrams, but this is basically the listing that they have on the forum where SmokeLoader is being sold. You can go and see what they're advertising and what modules they have, and it's really nice from a reverse engineering perspective when you can just go and see what features they have. It makes my job a little bit easier.

Okay, so the current set of plugins that SmokeLoader actually has: there's a form grabber, and as advertised it's really just stealing credentials that are sent in HTTP POST requests and whatnot. I'm not sure of the efficacy of any of these; I haven't reverse engineered these plugins. Then it has a password sniffer, which is just going to sniff the network traffic for FTP credentials and various other credentials, the low-hanging fruit. Then it has a remote PC plugin, which basically acts as TeamViewer, so it's not something where they have their own session; they actually join the session of the user. Then they have a fake DNS plugin, so if you wanted to redirect all traffic from google.com to your own IP, you can do that. This doesn't work with SSL, so it's purely HTTP traffic. Then they have a file search module, so you can basically give it a regex and it will find all the files that match on the host and then send them back. Then it has a procmon module, which was kind of interesting. Procmon is generally a tool that people will use for incident response and whatnot, but in this case they can basically define events and actions to happen when specific processes are created. And then they have a DDoS module, a standard keylogger, and then the email grabber, which is just going to steal the Outlook address book and whatnot.

So how is SmokeLoader actually used? Let's say I was in the market and I want to buy a botnet. I want to start my own, but I don't
have the dev skills to write my own, so I see this SmokeLoader and I decide that it's the one that I want to purchase. So basically I'm going to get the C2 panel, a bot, and all the modules, and I set up my panel and I start infecting machines. Let's say somehow I was able to infect 300, 400, 500 machines. Then I can go to all my friends and be like, "Hey, do you guys have malware that you want to deliver to all these machines that I have infected?" and they'll be like, "Heck yeah, why not?" And I can basically say, "Well, I'll install your malware on 100 machines for a hundred dollars." So as I mentioned before, it's really this delivery network, and you can have a single bot that is tasked to deliver 50-plus malware samples with just its initial check-in to the command and control. This process is generally referred to as selling loads.

The other big malware in this family of loaders is going to be PrivateLoader. It basically does the same thing, where they sell a service for installing your malware x amount of times. But this model has flaws. It's something that's used by lower- or mid-tier criminals, because a lot of these hosts that are infected are infected with 30 or 40 different remote access trojans and info stealers, so you really have cases where they're all exfiltrating the same set of credentials over and over that people have seen for the last 10 years. So the data you're actually getting is not going to be very valuable, but if you just need raw compute power for DDoS or similar, then I guess this could be a viable botnet to use.

Okay, so now we'll get into the operational model. Let's say I'm the admin and I have a couple of partners. From the command and control server, I'm able to send the following to the SmokeLoader bot: I can send the plugins, if I purchased them; I can send actual raw executables; I can send executables that are encrypted with RC4; and I can actually send URLs that point to cleartext executables.
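As a rough illustration of the RC4 option mentioned above, here is a minimal Python sketch of the publicly documented RC4 algorithm. The function name `rc4` and the example key are assumptions made for illustration; nothing here reflects how the bot actually derives or stores its key. Because RC4 is a symmetric stream cipher, the same routine both encrypts and decrypts.

```python
def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key scheduling (KSA) followed by keystream generation (PRGA)."""
    if not key:
        raise ValueError("key must not be empty")

    # KSA: permute the 256-entry state table using the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]

    # PRGA: generate one keystream byte per input byte and XOR it in.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


# Hypothetical usage: the key and data are placeholders, not values from a real sample.
example_key = b"Key"
ciphertext = rc4(example_key, b"Plaintext")
assert rc4(example_key, ciphertext) == b"Plaintext"
```

In a collection client, a routine like this would be applied to a received task body only after the key has been recovered from the extracted configuration.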
It's a way of instructing the bot to download payloads from remote servers, basically.

In the actual listing of the SmokeLoader forum post, they say you have to crypt the payload. They specifically say this is not FUD; this is a sample that you need to pack, basically. For those that don't know, packing is basically this: if you think of your malware sample as an onion, you just add another layer to it of encryption or compression or something, and this is basically to defeat basic antivirus and things like VirusTotal checks and so on. So let's say we have a packed SmokeLoader sample, and that's the sample that actually lives on disk. That is going to get unpacked to our initial stage of SmokeLoader, and then if all the checks pass in that stage, we are finally going to get, within memory, the unpacked final SmokeLoader stage.

Okay, so with this understanding of this botnet, and having talked with a bunch of friends and them saying, "SmokeLoader delivers tons of payloads, it would be really cool to see what they're actually delivering," I was like, well, I have access to a bunch of malware feeds, but what if I want to turn this delivery network of infections into a delivery network of intelligence data? I wanted to turn SmokeLoader on its head and really turn it into a passive intelligence-gathering tool.

So this is the process that I came up with. There's a stage one for SmokeLoader, the initial stage, that is going to decompress the final stage of SmokeLoader. From that we need to extract all the configuration details: the command and control servers, the encryption keys, the versions and whatnot. And then we need to understand the network protocol, because we need to be able to write a client for this botnet that can communicate with the command and control without actually causing infections. We basically just want to save off all the payloads that
were sent.

So now we'll get into the analysis of SmokeLoader stage one. Stage one is really where all the interesting obfuscation of the malware lives. The functionality of this is really just to check if the host is a viable victim for the botnet. It checks the locale of the machine to make sure it doesn't have a Cyrillic keyboard or something like that, it checks for sandbox artifacts and virtualization processes, and then if all these checks pass, it's going to decrypt and decompress the actual bot. Some of the main obfuscation techniques that SmokeLoader uses are going to be opaque predicates, then this technique called runtime function decryption, and a slew of other anti-disassembly tricks.

That's not great either. Okay, so getting into opaque predicates, I just want to give a quick introduction to what those are. At the top we have our incorrect disassembly. For those that can't see, there's basically an instruction that is a jump-not-zero (JNZ), and it points to location X in memory, and then there is a jump-zero (JZ) instruction which points to the same location X. As humans, we can see that a jump-not-zero and a jump-zero together cover all of your cases. It's like if you were to write a conditional that said "if true, else if false": it's a condition that is always going to happen. So we can easily see that this jump is always going to be taken, but disassemblers can't realize this; they don't have the ability to process this Boolean logic. So the disassembler takes the byte immediately after that jump-zero and starts disassembling from there, but that's actually the incorrect interpretation. At the bottom we have the correct disassembly, where I told the disassembler, "Don't disassemble from this location, disassemble from this one." That's where we can see, at the bottom here, it's doing a pop ECX and then doing a jump. That is the correct flow. And this is not something that is
going to have any effect on the malware. It's not going to slow it down or cause any issues with its execution. This is purely to make the lives of reverse engineers more difficult.

Yikes. Okay, so now we have the runtime function decryption. Basically, for all the functions that are of interest to SmokeLoader, it encrypts their function bodies. Normally when you have your source code, let's say you're writing a Python application, you have your function call and then you can just read the source code and see what's happening. SmokeLoader doesn't allow you to do that. What it does is, when it's calling a function that it has deemed important (and basically eighty percent of its functions are encrypted), it gets a reference to its current instruction pointer and the size that it needs to decrypt or encrypt. Then, when the function is called, it will decrypt the rest of the body, and at the end we have another call that will encrypt the rest of the body back up. From a static analysis perspective this makes the malware really tough to look at, because you're not going to be looking at valid assembly instructions; you're going to be looking at encrypted code. So the only way to really statically reverse engineer SmokeLoader is to add additional scripting on top of it.

So how does SmokeLoader actually implement that? At the bottom here, in the decryption results, we have a function that I threw into IDA Pro's decompiler after I did a bunch of work to decrypt the function body. You can see at the top they have the decrypt-code-body call, and the rest of that code is what I was able to decrypt, and then at the end they have another call to decrypt code body, which in this case actually encrypts the rest of the body back up. So even if you were to take a memory dump of SmokeLoader, you would never have a snapshot in time where all the functions were decrypted. So you really have to go
the Python approach and manually parse these function bodies. In the case of SmokeLoader, it does this by XORing the body with a single-byte XOR key; in this case they use 0xEF.

The next thing they do that I found interesting is they have a way to obscure string references. They have a call instruction here that does a call to location 0x402246, and that basically does a jump to the address after these strings. For those that don't know, what a call instruction in assembly really does is push the following address onto the stack and then do a jump. That's all the call instruction does. But immediately after this call instruction they have a pop into the ESI register, so it's basically giving ESI the address of that "sbiedll" string, which is the Sandboxie DLL. Sandboxie is a common tool that people use for analyzing Windows processes and malware. So this is just one way that they reference strings indirectly, and this also breaks disassembly and the IDA Pro decompiler. Just another thing to make static reverse engineering more difficult.

So now getting into the actual execution flow of what stage one does. The first check is whether there's a debugger attached, and it does this by reading the process environment block, which I'll talk about later. Then it loads two DLLs, kernel32 and user32, and if it detects that it has a Cyrillic keyboard, the malware will exit. If that check passes, it's going to load advapi32 and shell32. Then it does something interesting: it takes ntdll, which is your main Windows DLL, your lowest-level DLL, and it copies it to the temp directory and loads it from there. This is a technique that malware uses to evade EDR systems, because EDR systems commonly place their hooks into the functions of the ntdll that gets loaded normally. So in this case it's an attempt to bypass that. And then they have
some basic checks. It checks its own file name, so if it's sample.bin or virus.bin or something, the malware is not going to execute. It checks if there are DLLs loaded within its own process that relate to sandboxes, and it checks if there are VM processes running, so it checks for VirtualBox, Parallels, VMware and so on. Finally, if all those checks pass, it has an if statement, basically: if the host is a 64-bit system, it will decrypt and decompress the 64-bit payload; otherwise it will do the 32-bit payload.

So if we think about this SmokeLoader initial stage in memory, it's basically broken up into three chunks. The top chunk is our SmokeLoader stage one, the second chunk is our 32-bit final stage, and the final chunk is our 64-bit final stage.

So how does SmokeLoader actually extract this final stage from within itself? It decrypts it with a 4-byte XOR key, which I have in the Python implementation there, and then it uses an algorithm called LZSA to decompress. For those that know things about compression algorithms, they're incredibly difficult to implement. I was able to find an implementation of this decompression algorithm, but it's in raw assembly, which makes it kind of difficult, because you can't really call that from any Python bindings, and there's no C implementation, nothing. So it makes analysis even more difficult there.

But let's say we were able to decrypt and decompress our payload, and now we have our final stage of SmokeLoader. This is basically what the final stage looks like in a hex editor. If you notice, there is no PE header, so this is not a valid Windows executable. So how does this load? Well, basically it functions as shellcode. It's position-independent code that needs to be able to resolve its own addresses, because normally when you have a Windows
executable, you can rely on the Windows loader to properly load your executable into memory and make sure that you can make all the function calls that you need to make. With shellcode you don't have that feature, basically, so this final stage needs to be able to resolve those things all by itself. The main features that this final stage really has are C2 communications and receiving and injecting these payloads. That's all it really does at the end of the day.

Since it's shellcode, or effectively shellcode, it needs to initialize its main client, so there are a couple of things it has to do that normally you would rely on the Windows loader for. It needs to find the correct DLL handles, which gives it access to the libraries that it needs, and then from those libraries it needs to figure out the functions that it wants to call, so it needs to be able to find all those function addresses. It also needs IPs and domains to communicate with, because it needs to check in to the C2 to get its list of tasks. And it needs the ability to gather host information. In this case that information is sent to the command and control server, and then the admin of the panel is able to filter their bots by the various information that has been sent.

Okay, so how is the SmokeLoader sample actually able to do this? Well, it reads something called the PEB, or the process environment block. It's a structure that is present within all Windows processes, and normally it's really for additional metadata and debugging purposes, but malware authors love it for one particular field, this Ldr field here. That's basically a pointer to another struct, and that struct is this smaller one on the right, and it basically has a list of all the DLLs that are loaded within the Windows process. So as you're
unpacking your malware sampl