"Introduction to Malware Development in C#" by Jayden Caelli, BSides Canberra 2023

BSides Canberra · 2023 · 22:47 · 638 views · Published 2023-10
Category: Technical
Style: Talk
About this talk
Learn how to build basic malware in C# and how to bypass modern AV and EDR products.

Jayden Caelli: I am a penetration tester at SilentGrid, where my main focus is malware development.
Transcript (English)

Our next speaker is a Canberra local: we have Jayden Caelli. He's going to take us through an introduction to malware development in C#. Got some fans over there. All right, so let's all give a round of applause to
Jayden.

Hi, my name is Jayden. I'm a penetration tester at SilentGrid, and today I'm going to talk about how to get started developing Windows-based malware, in particular using C#. I'll show you how to build a basic piece of malware, then show you some techniques that can be used to bypass antivirus and EDR solutions.

This talk will focus mainly on developing a type of malware called a loader on Windows. In short, a loader is a piece of malware that runs other malware in memory: for example, it could take a payload generated by a C2 framework, say Cobalt Strike or Metasploit, and run it in memory. Loaders can run various types of
malware, including pure assembly (aka shellcode), a Windows executable (i.e. a Windows PE), or a .NET assembly. This talk specifically will focus on running shellcode in memory, also known as a shellcode runner. The reason this type of malware is useful is that it's much easier to build, from scratch, a loader that can run your favorite C2 framework and get that to bypass AV than it is to build a C2 framework yourself.

First, it'll help to have some background information on what are called the Windows APIs. The Win32 API is an API built by Microsoft to interact with the OS. If you were to build a Windows-based application, this is the API you'd be expected to
use. This API is officially documented by Microsoft and is included in various DLLs, including kernel32, KernelBase and user32. At a lower level, these APIs call what are called the native APIs. The native API is again an API created by Microsoft to interact with the OS, but importantly it's what transitions from user mode, the privilege level of applications, to kernel mode, the high-privilege mode where actions are actually performed. The native APIs are not officially documented by Microsoft, as they're not intended to be used directly, and they mostly reside in ntdll.

This is a diagram showing the relationship between the APIs. Say, for example, I have an application that wants to access another process. I could do
this by calling the OpenProcess Win32 API in kernel32, which would call OpenProcess in KernelBase, which in turn would call the native API NtOpenProcess in ntdll. This API would then perform the syscall instruction, switching from user mode to kernel mode. The kernel does its stuff and then returns to the native API, which returns to KernelBase, and so on.

So what's C#? C# is a language that utilizes .NET. .NET itself is a development platform, with the .NET Framework being a Windows-specific implementation of this platform. It runs via what is called the Common Language Runtime, and because of that we say it's managed code. What this means, practically speaking, is that we can't just directly
call unmanaged code, i.e. the Win32 APIs. C# itself is a high-level language, which makes it very easy to code in compared to, say, C or C++. There are some interesting things you can do specifically with C# in terms of malware development, e.g. the reflection APIs, and in general it's much easier to debug than, say, C. If you were to start trying to develop C# malware, I would highly recommend Microsoft Visual Studio Community Edition, an IDE designed to build and run various Windows-based languages, including C#.

Okay, so how do we actually build a shellcode runner? In general, the first step is to obtain and decrypt a payload;
that is, the application needs to somehow obtain our payload. It could be stored within the application itself, or it could be downloaded from the internet. Most of the time it'll be encrypted, so we need to decrypt it. Secondly, we need to allocate memory. In general, there are permissions that stop you from executing code in just any random memory region, so we need to allocate some memory where we can control the permissions. Thirdly, we need to copy our payload into the new memory region we allocated. Depending on exactly how we set up our allocation of memory, we may need to switch permissions on it to allow us to actually execute the code stored there; I'll go over why this
is useful later. And finally, we need to execute, that is, somehow get the CPU to start running the code stored in the memory region we allocated.

So let's say, for example, we store a payload in a byte array. We call the Win32 API VirtualAlloc to request memory at least the size of our payload with read-write-execute (RWX) permissions. We use the built-in method Marshal.Copy to copy the payload to this new memory region, and then finally we use CreateThread to execute the code stored in this memory region. First, we can generate shellcode using msfvenom; here it just spawns calc when run. To do the next step, we need to
use a Win32 API. However, I just said that you can't directly call unmanaged code from managed code, so how do we do this? Part of the .NET Framework is something called Platform Invoke (P/Invoke), which is designed to allow developers to import external functions from unmanaged DLLs, i.e. the Win32 APIs. Doing so is pretty simple: you just use the DllImport attribute and what's called the function prototype, being the function's parameters and return type. Here we do this to run VirtualAlloc. First we import it; see here, we're using DllImport to say "import a function called VirtualAlloc from kernel32". Once it's imported, we then call VirtualAlloc with our
shellcode's size, and read-write-execute as the memory permissions to set. Once we've done that, we use the built-in method Marshal.Copy to copy the shellcode to the allocated unmanaged memory. Finally, we import the Win32 API CreateThread, and then we create the thread on our new memory region. When we do this, our new thread will get killed, because our main thread does not know to wait for the other thread to finish. To fix this, we can use the Win32 API WaitForSingleObject, point it at our newly created thread, and say wait for an unlimited amount of time, being 0xFFFFFFFF.

This is only one example of running shellcode in memory; many more
variations on this idea exist, with different ways of allocating memory, storing the payload and executing it. That can include executing code in processes other than this one, called remote process injection; doing this, you would either inject into an already running process or spawn a new one. In general, you should prioritize self-injection unless you have a good reason not to, the reason being that remote-process injection is actually quite OPSEC-heavy. However, you may, for example, be using a LOLBin and want to inject into another process to improve your OPSEC.

If you were to build the previous example, it would get detected by pretty much every AV that exists. The reason for
this is that our msfvenom payload is embedded in our binary, and a quick binary search will find it. So our first port of call in terms of evading detection is the static point of view. How can we avoid this? All we need to do is encrypt the shellcode. Note that this doesn't have to be strong encryption like AES; really anything that changes the payload will do, e.g. XOR, a Caesar cipher, or one I came up with involving the Fibonacci sequence. In my example, to each byte of the shellcode I simply add the Fibonacci number at its index, and then, in reverse, I just subtract that number from each byte.

When we decrypt our payload, we have
an unencrypted copy of our shellcode in memory. This can be picked up by memory sweeps and/or event correlation, so what we want to do is minimize the number of copies of our unencrypted shellcode appearing either in memory or being passed to functions. We always have to have at least one copy of our unencrypted shellcode in the allocated memory region, but we can minimize the number of copies appearing elsewhere. One way we can do this is to write the shellcode in random chunks; that is, we break the shellcode into random chunks, individually decrypt each chunk, and write that to the memory region. So here in this example, I created
a random chunk map where each index maps to a size: for example, index 0 is of size 4 bytes and index 4 is of size 5 bytes. Then, when I go to copy the shellcode into the new memory region, I first randomize the list of chunks, pick each chunk one at a time, individually decrypt that chunk, and then write it to the memory region.

Another way our shellcode runner could be caught is through the detection of a high-entropy section: the section where we stored our encrypted payload now has high entropy. As you can see in this image, this has raised the entropy of the .text section to near the maximum, being 8, and the tool Detect It Easy shows
us that there's something packed there. On the right is a standard Windows binary, where we can see the entropy of the .text section is roughly 6. So how can we avoid or reduce the entropy of our shellcode? One method is just to add a bunch of repeated bytes or words around our shellcode to reduce its entropy; however, some AV engines are able to detect this and still find the hidden encrypted payload. A method I like to use is to encode each byte as a random English word, for example byte 0 meaning "pineapple". I then put a stop word between each byte so I know how to
decode it at runtime. So, for example, my payload would become "pineapple", then the stop word, then "blueberry", and so on. On the left you can see my payload string, with the stop word between each byte, and below that we can see a word-to-byte map. When I decode it, all I need to do is convert these words back into bytes and use those in my shellcode runner. As you can see on the right, this has reduced my entropy to the point that it's no longer detected as packed.

Another issue with our example is the use of RWX memory. In general, RWX memory is rare and can be an indicator of malicious activity. We
should instead first allocate read-write memory, then change the permissions to RX. The way to do this is by using the VirtualProtect Win32 API. As you can see in the example below, we use VirtualAlloc to first allocate read-write memory, copy the shellcode over to this memory region, then use VirtualProtect to switch it to RX.

Another way our shellcode runner could be caught is through its imports of unmanaged APIs, i.e. the Win32 APIs. If you look at our example, we have CreateThread, VirtualAlloc, WaitForSingleObject and VirtualProtect. This is a classic import table for a shellcode runner. In a .NET assembly, these imports
are stored in what's called the ImplMap table, like the one below. So how can we avoid this? We can avoid it by dynamically finding the location of these APIs using two different Win32 APIs: GetModuleHandle and GetProcAddress. In short, GetModuleHandle obtains a handle to a loaded module, e.g. kernel32, and GetProcAddress obtains the location of a function given its name and a handle to the loaded module. With this, we can use what are called delegates. In short, a delegate is something that allows us to convert from unmanaged to managed code at runtime. What this lets us do is create a function whose location we specify later, in order to
run it. Here we use this trick for VirtualAlloc. In the first image, we first import GetModuleHandle and GetProcAddress, and then we create a delegate with the same function prototype as VirtualAlloc. In the picture below, we use GetModuleHandle and GetProcAddress to find the memory address of VirtualAlloc. We then use the function Marshal.GetDelegateForFunctionPointer to create a delegate of our VirtualAlloc delegate type at the location of VirtualAlloc. After we've done this, we can just treat it as if it were the VirtualAlloc function itself. This removes entries from the ImplMap table, but does add two: GetModuleHandle
and GetProcAddress.

AV and EDR products like to track calls to Win32 APIs and native APIs. One method of doing this is hooking loaded modules, so-called userland hooking. Roughly, this works by the AV or EDR engine injecting a DLL into each process it wants to monitor. This DLL overwrites the first instruction of any API it wants to monitor, replacing it with a jump instruction into its own DLL. Practically, what this means is that when we use a Win32 API or native API, the AV or EDR engine will first check whether our code is malicious and can kill us before the function even runs. Here is an example of NtOpenProcess in ntdll with no hooks: we can see the first instruction is mov r10, rcx. In the image below, we can see the first instruction is instead a jump to a random memory address, being a memory address in the AV's or EDR's DLL.

So how can we defeat these? These hooks exist in copies of DLLs that we have loaded into our own process, which means we can control their contents. What we can do is overwrite these hooks with a clean version of the Win32 API or native API. Technically, these hooks exist in what's called the .text section of the DLL, as this is where the executable code is
stored. So how do we actually do this? First, we need to find the location of any hooked DLL in our process via a call to GetModuleInformation; this location is also termed the base address. Then we open a clean version of the DLL from disk via the file mapping APIs; doing it this way just makes the pointer arithmetic for copying the .text section easier. After we've done that, we parse the hooked DLL and find where the hooks are, i.e. the .text section. We then change the memory permissions to RWX via a call to VirtualProtect, and then we copy the .text section from the clean DLL over the hooked version of .text in the hooked
DLL. We should do this for any potentially hooked DLL that we use. Most commonly this is ntdll, but it can also include Win32 API DLLs like KernelBase and kernel32. Here in this first image, we're obtaining what's called the base address of the loaded ntdll module via a call to GetModuleInformation. Then here we are mapping a hopefully clean version of ntdll from disk into our process using the file mapping APIs. Thirdly, we need to find the location of the .text section. The way we do this is to read the file headers of the DLL, find the number of entries in the section table, and iterate over each entry in the section table to find the one with the
name .text. Below is a very basic overview of what the PE format looks like. Once we've found the entry with the name .text, we can find the location of the .text section by adding the base address and the virtual address of the .text section; this will be the same relative address in both the hooked and the clean DLL. Once we have this, we change the memory permissions of the hooked DLL to read-write-execute, use memcpy to copy the clean code over the hooked code, and then finally change the permissions back to RX on the loaded ntdll module.

ETW. ETW stands for Event Tracing for Windows. It's a
component of Windows that allows applications to send events to a central store called an ETW provider. This allows applications to register for and view the events written to these stores, or providers, including EDR products, which heavily use ETW for monitoring. There is a specific provider used by the .NET runtime, which includes information such as assembly names and methods. Here I've run the tool SilkETW to capture ETW events while I run the tool Seatbelt in memory. As we can see, it captured 361 events, including the name of the assembly, Seatbelt. So how can we bypass this? Well, the very first thing we need to know is that
.NET puts entries into the .NET provider by calling the native API EtwEventWrite. What we can do is set up what's called a hardware breakpoint to immediately return when the .NET runtime attempts to execute this function, stopping the event from ever being written because the function never runs. In short, a hardware breakpoint is a breakpoint that triggers when certain conditions are met during the execution of a process or thread, e.g. when the process goes to execute a particular function. In this way, we can make it so that if the process goes to execute that address, execution of the program is stopped and an exception is raised. The location where we set up
this information is the debug registers, Dr0 to Dr3 and Dr7. In short, what this means is that when the .NET runtime goes to run EtwEventWrite, we can modify what's called the context of the particular thread and make it return to the caller's address. So how do we do this? First, we need to define what's called an exception handler. When an exception is passed in, we need to check whether it was caused by the process trying to execute the function EtwEventWrite. After we have that, we need to register this exception handler within our process via a call to the Win32 API AddVectoredExceptionHandler. With that, we then need to obtain the actual
memory address of EtwEventWrite, which we can do using GetModuleHandle and GetProcAddress. With this information, we then set up the debug registers: the address of EtwEventWrite goes in Dr0, and in Dr7 we set up execution-style breakpoints and enable Dr0 as a location to check. So here's the handler. First, it obtains the current context of the thread and the exception record. We check the exception record to see if it was caused by the process trying to execute EtwEventWrite; if so, we grab the return address from the stack, make this the instruction pointer, and then tell the process to continue
running. Here is the setup for the bypass: first we register this exception handler via a call to AddVectoredExceptionHandler, then we obtain the address of EtwEventWrite, and then we enable breakpoints in Dr0 to Dr3 and Dr7. Here is an example of using the debug registers. In our case, we're using Dr0 to hold the location of EtwEventWrite; we set bits 16 to 31 in Dr7 to 0 to enable execution-style breakpoints, and we specifically set bit 0 to 1 to enable Dr0 as a location to check.

The last method of bypassing AV I'll go through is what's called control flow graph manipulation, also sometimes called subroutine
obfuscation. By this I mean that the control flow graph of a program is basically which function calls which function with what parameters. I found in practice that making minor changes to this, either by splitting up a function or by changing whether a function takes parameters or uses local variables, makes a big difference. So here are two examples: on the left are versions that were getting detected, and on the right are versions that weren't. In the first example I'm using XOR decryption, and to bypass this detection I simply made a new function that XORs a single byte and used that in my XOR decryption, effectively splitting it into two methods. The one below that is an example of me using AES encryption, where I pass
in the key and IV as parameters to the function. To bypass detection in this instance, all I did was make the parameters local variables instead.

Some final thoughts. Loaders are the most bang-for-buck malware. AV evasion is based mainly on doing custom stuff, not necessarily the most leet methods; using well-established techniques in a custom way is best. Always prioritize manual obfuscation, because a lot of automated packers or crypters will just easily get you caught. Incrementally add bypasses as you need them; don't just randomly bypass things without reason. For example, I built a loader that bypassed a certain EDR product, and then I decided to try to add an AMSI bypass to it. Adding this AMSI bypass made my loader get detected, so in that case it actually made more sense not to bypass AMSI. Each AV and EDR product is different, so each will require different loaders; 0% detection is dead, and so having a library of different variations is very helpful. Thanks for listening. [Applause]
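The talk compares the entropy figure Detect It Easy reports for the .text section: near 8 for the binary carrying the encrypted payload, roughly 6 for an ordinary Windows binary. As a rough illustration of what that number measures, here is a minimal C# sketch that computes Shannon entropy in bits per byte over a buffer. The class and method names (`EntropyDemo`, `ShannonEntropy`) are hypothetical, and this is not Detect It Easy's own implementation, whose exact method may differ.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class EntropyDemo
{
    // Shannon entropy of a byte buffer in bits per byte, from 0.0 to 8.0.
    // 8.0 means all 256 byte values are equally frequent, which is typical
    // of encrypted or compressed data; ordinary machine code scores lower.
    public static double ShannonEntropy(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));
        if (data.Length == 0) return 0.0;

        // Count how often each of the 256 possible byte values occurs.
        var counts = new long[256];
        foreach (byte b in data)
        {
            counts[b]++;
        }

        // H = -sum(p * log2(p)) over the byte values that actually occur.
        double entropy = 0.0;
        foreach (long c in counts)
        {
            if (c == 0) continue;
            double p = (double)c / data.Length;
            entropy -= p * Math.Log(p, 2);
        }
        return entropy;
    }
}
```

For example, a buffer containing each of the 256 byte values exactly once scores 8.0, a buffer of four identical bytes scores 0.0, and the two-value buffer `{ 0, 1, 0, 1 }` scores 1.0.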