
Find Me If You Can! How to Locate a DLL's Unexported Functions

BSides TLV · 2022 · 36:58 · 500 views · Published 2022-07
About this talk
Oryan De Paz demonstrates three techniques for locating and calling unexported DLL functions at runtime without triggering antivirus or EDR monitoring. The talk covers byte-sequence pattern matching, caller-instruction analysis, and call-chain traversal methods, with practical automation using IDA Python scripts across multiple Windows versions.
Transcript

[Music] [Applause] So first, I want to thank the BSides team for organizing such a great conference and for giving me the opportunity to speak here today, and thank you for coming. [Applause] So, nice to meet you. I'm Oryan. I really love Windows internals, reverse engineering, and generally learning new things, so we're going to have some fun today. I recently joined the [unclear] team as a security researcher. [Applause] That's my team; they mostly do vulnerability research, malware research, and many other kinds of research. Before that I was a low-level researcher and developer at Symantec, protecting from Active Directory attacks. So usually I'm detecting malware, but

today I will take the attacker's point of view. Now, let's say we want to take over the world, and in order to do so we want to load and execute our code in a certain process. Processes often need to load additional code on demand, and in order to do so they can call the WinAPI function LoadLibrary. This function not only loads the library into memory but also does some extra work behind the scenes so that the code in the library can run successfully. For example, it might set the right memory permissions according to the

different sections inside of it, it can resolve relative addresses that the library code uses, and it does a lot of other preparation work before it finally executes the library entry point. Now let's say that, as the attackers, we somehow gained the ability to run a tiny payload in this process, but we want to run some more complicated code, so let's load it from a DLL file. We could call the LoadLibrary function, but we would have two main problems with it. First, it is monitored by antivirus and EDR products, so it might get blocked or raise an alert. Second, this function registers the library in internal

structures of the operating system, such as the loaded modules list that every process has, so it might be a bit too loud or noisy if we want to hide ourselves. What we can do is implement our own function that is pretty similar to the LoadLibrary implementation but skips the noisy parts of it. So we said that we can't use the LoadLibrary function, and we need to somehow achieve its functionality by calling other bits of code, unexported functions further down its call stack. Actually, this is exactly what reflective loaders do, and today we'll see how they do it. But first, what is reflective loading, or a

reflective loader? So, reflective loading is a library injection technique. It implements a tiny, minimal PE loader itself, and eventually it allocates and executes the payload from within the process memory. Reflective loading is a widely used technique, and it exists in many open source tools, such as Blackbone. [Laughter] Okay, so it exists in many open source tools; if you want to take a further look at it, Blackbone is a good example of one, but there are really many others. This is a function from the Blackbone source code. I simplified it a bit so we can easily see the functions that it

looks for, and every get-pattern function basically holds how it searches for the unexported functions in memory. Today I will focus on the first function there, LdrpHandleTlsData, but notice that this choice is completely arbitrary: you can use the search methods that we'll see today to locate any other unexported function you wish to find. But first, what are exported functions? Well, function addresses inside a DLL often change between different Windows versions. [Music] In order to overcome this, every module holds a table that maps between

the function names and their addresses in memory. This table is called the export table, and these functions are called exported functions. Unexported functions are all the functions that don't appear in this table, so we have to look for them ourselves. We can use exported functions either statically, by importing the library in our code and just calling the function, leaving the address resolution to the compiler, or dynamically, by getting the function address in memory using the GetProcAddress WinAPI function. With unexported functions we can't, since they don't appear in this table. So how can we locate them ourselves? Now, it is very important to understand this part.

Reflective loaders need to resolve these functions during runtime. During runtime we don't have any symbols, we don't have access to the function names, and we actually have no way to guarantee that we hit the right function, or even the start address of a function. So we need to do a lot of research in advance; we need to do some homework in order to tell the loader how to find the function and be really sure that this is the right address. Today we'll explore three different ways to do it. Then we'll see how we can automate this process to make it easier for us to compare

the search data between the different Windows versions, and finally we'll be able to choose which one is the best way for our target function. Why do we need three different ways to locate this function? Can't we just calculate its offset from the module start address? You might ask yourself that, but the answer is that it would be very unstable, because we don't know in which part of the DLL the function will be, and it is very likely that we would have to hold a different offset for every Windows version. We want something more generic than that. But what if we could somehow calculate the offset from a position that is closer to our

function address? Then this offset would be less likely to change. So our first method for today is to locate a unique byte sequence within the boundaries of this function and then calculate the offset to the start address of the function. This is part of our target function's code. It is the disassembled view taken from the IDA disassembler, and it consists of two parts: the instructions, and their actual byte representation in memory. Now let's take a look at these instructions and their byte sequence, which I assure you is unique. And why is it important for it to be unique? Well,

we want to avoid false positives. If this byte sequence appears in multiple locations in the DLL, we might hit the wrong function or the wrong address, so it is very important for it to be unique. Now let's say we locate this byte sequence's address in memory. It's pretty close to the start address of the function, so all we have to do is subtract from it the offset to the start of the function. So our formula in this case is to locate the unique byte sequence address, which in this case would be this address,

then we subtract its offset, which in our case is 46 for this Windows version, and finally we get this address, which happens to be the function address. I started my way as an automation developer, so it was only natural for me to try and automate this whole process. So let's see how we can automate it. This is the IDA main window, and it has a built-in Python interpreter in every IDA Pro version. It's an actual Python interpreter, just as we know it, and we can run commands inside of it, which is pretty helpful, and we'll

see today why. We also have two other options for running Python scripts in IDA. We have the Script command, which is an interface that allows us to run multi-line Python commands; we can also import code from Python files or export what we wrote there to script files, and we get the output inside our Python interpreter. Another option, if you already have a prepared script, is to just run it using Script file. Now, how can we automate our search process? We said that we have to locate the unique byte sequence address, and then we want to validate that we hit the right address.
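A minimal sketch of the first method as described so far: find a byte sequence that occurs exactly once, subtract its offset to reach the function start, and validate the result against a known address. It works on a plain `bytes` buffer rather than through IDA's API, and the pattern bytes, offsets, and helper names are made up for illustration; they are not the values shown on the talk's slides.

```python
def find_unique(image: bytes, pattern: bytes) -> int:
    """Return the offset of pattern, insisting that it occurs exactly once."""
    first = image.find(pattern)
    if first == -1:
        raise LookupError("byte sequence not found")
    # A second hit (even an overlapping one) would mean a possible false positive.
    if image.find(pattern, first + 1) != -1:
        raise LookupError("byte sequence is not unique")
    return first


def locate_function(image: bytes, pattern: bytes, offset_to_start: int) -> int:
    """Method 1: byte sequence address minus the offset to the function start."""
    return find_unique(image, pattern) - offset_to_start


# Toy "module": padding, then a function whose body contains the pattern.
PATTERN = bytes.fromhex("48 8b c4 4c 89 48 20")  # hypothetical bytes
FUNC_START = 0x100                               # hypothetical function start
OFFSET = 0x46                                    # hypothetical offset into the function
image = bytes(FUNC_START) + b"\x90" * OFFSET + PATTERN + b"\xc3"

found = locate_function(image, PATTERN, OFFSET)
# Offline we know the real start (in IDA it would come from the symbols),
# so the search data can be validated before it is shipped.
print("found" if found == FUNC_START else "not found", hex(found))
```

Insisting on exactly one match mirrors the talk's point about uniqueness: a sequence that appears twice is rejected instead of silently resolving to the first hit.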

Because we are doing this in advance, only as preparation, we have IDA and its symbols, so we can get the function address using its name. So, we said that we calculate our function address by taking the byte sequence address and subtracting the offset from it, and eventually we want to make sure that we hit the right function. So we'll compare it to the output of this function, which is an IDA API function that allows us to locate a function using its name. Then all we have to do is print the right message, whether we found it or not. Once we run it, we see that we successfully found

our function using the search data, so that's cool. Now we want to be able to run it on multiple files, because we don't want to manually run it for every Windows version, right? There are too many. So how can we do it? Actually, we can use IDA and Python again. How so? IDA allows us to run a script from the command line; we don't have to open the DLL file with the GUI, and we can easily automate it. There are two arguments which are relevant for our case: -A is for autonomous mode and -S is for running a script, which in our case would be an IDAPython script.

This is how the command should look. Now we can use this command-line feature to write another Python script that iterates over all the DLL files from every Windows version and runs our script using the command line. We first have to adjust our script so it can run from the command line: it will take its arguments from the command line, and instead of printing to the built-in interpreter, it will save its results into a file. We also have two IDA API functions: one allows us to wait until the database and all the symbols are loaded successfully, and the other one just exits gracefully

from IDA.
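The batch run described here could be driven by something like the following. This is only a sketch: the executable name `idat64`, the script name, the output file name, and the DLL paths are assumptions, and only the `-A` and `-S` switches come from the talk. The helper just builds the command lines; handing each one to `subprocess.run` is what would actually launch IDA.

```python
def build_ida_command(ida_exe: str, script: str, script_args: list, dll: str) -> list:
    """Build one headless IDA invocation for a single DLL."""
    # -S takes the script path and its arguments as one string;
    # arguments containing spaces would need extra quoting.
    script_spec = " ".join([script, *script_args])
    return [ida_exe, "-A", f"-S{script_spec}", dll]


# Hypothetical per-version copies of the DLL under analysis.
dlls = ["win10_1809/ntdll.dll", "win10_21h2/ntdll.dll"]

commands = [
    build_ida_command("idat64", "find_function.py", ["results.csv"], dll)
    for dll in dlls
]
for cmd in commands:
    print(cmd)  # pass to subprocess.run(cmd, check=True) to execute it
```

Keeping command construction separate from execution makes the driver easy to check without an IDA installation.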

Next, we want a separate file, another Python script, that will hold all of our search data and all of our DLL files. The search data consists of the byte sequence and its offset, just like we saw. We open an output file with the relevant headers, and then we just iterate over every file and execute the command. After we run this script, these are the results that we get. We see that we successfully found this function using this search data on four different Windows versions. But that's not good enough: we still have five

Windows versions on which we couldn't find our function using this search data, so we have to somehow fix the search data. We said that it consists of a byte sequence and an offset, right? So one of them is probably wrong; the question is which one. We'll go back to our script, and this time we can add a function that will allow us to debug it. If we don't find the byte sequence address, we print the right message. If we did find the byte sequence address, then the offset might be wrong, so let's try to recalculate it and see what the right offset is.
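The debugging step can be sketched as follows: when the search data fails, report whether the byte sequence is missing or whether only the offset is off, and in the latter case suggest the offset that would have worked. The function name, messages, pattern bytes, and addresses are illustrative assumptions, and `known_start` stands in for the address that IDA's symbols would provide offline.

```python
def diagnose(image: bytes, pattern: bytes, offset: int, known_start: int) -> str:
    """Explain why (pattern, offset) did or did not resolve to known_start."""
    seq = image.find(pattern)
    if seq == -1:
        # The sequence itself is wrong for this build: pick a new one.
        return "byte sequence not found"
    if seq - offset == known_start:
        return "found"
    # The sequence exists, so only the offset needs to be recalculated.
    return f"try with offset {seq - known_start}"


PATTERN = bytes.fromhex("48 8b c4 4c 89 48 20")  # hypothetical bytes
FUNC_START = 0x100                               # hypothetical function start

# In this toy build the sequence sits 44 bytes into the function,
# while the stored search data still says 46.
image = bytes(FUNC_START) + b"\x90" * 44 + PATTERN + b"\xc3"

print(diagnose(image, PATTERN, 46, FUNC_START))              # try with offset 44
print(diagnose(image, b"\xde\xad\xbe\xef", 46, FUNC_START))  # byte sequence not found
```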

So I ran this on the latest Windows version that it didn't work on. It found the byte sequence address, since it didn't print any error message, but it said: try with offset 44. So okay, let's change our code for all the other Windows versions on which we couldn't locate our function and see what happens. These were our first results, and this time we got these results. As you can see, we found our target function on two more Windows versions, so that's progress, but we still have those three Windows

versions that it didn't work on. So again, let's run the script and see what went wrong. This time we see that it couldn't find the byte sequence, so we have to look for another unique byte sequence and then run our script again. And this time we found our function on all of our Windows versions. So we can sum this method up: we have three different combinations in order to locate our target function on all these Windows versions. The pro of this method is that the closer we get to our function,

the less likely the offset is to change. But what if we have no unique byte sequence inside this function's boundaries? This leads me to the next method that we have to locate this function. We can still locate a unique byte sequence, but this time we can try to locate the unique byte sequence of the function call. This is the function that calls our target function, and this is the call instruction that calls our target function. This instruction consists of two parts: the call opcode, represented by E8, followed by an offset to the start address of our function, which is relative to the end of this

call instruction. But okay, why not search for the byte sequence of the call instruction itself? Actually, I can assure you this byte sequence is probably not unique. It is probably widely used, and remember, we need to find something that we can be sure is a unique byte sequence. But we can take a look at the instructions prior to this call; this is the preparation of the parameters that are passed to our function. So if their byte sequence is unique, we can look for this address, add the byte sequence length to it, and eventually take the relative offset from there and find our way to our

target function. So our formula in this case is: we still have to locate the byte sequence address, we still have to add an offset, the byte sequence length in this case, and now we also have a new offset that we just found that we have to add to our formula. So let's say that the byte sequence address is this address. Then we add the byte sequence length, which is 11 in this case, but we also have to add the offset length, because we said that the offset is relative to the end of the call instruction. Then we add this offset that we just found,

and eventually we get this address, which is our target function address. And if you remember, that's exactly the address that we previously saw, so that's good: we have two different ways to locate the same function. Now, how can we automate this? It should work pretty similarly to the first method, right? Because we only added one tiny detail. So we still look for the byte sequence address, then we have to find the call address, extract the offset from it, and compare the result to our function address taken from the symbols, because we're in IDA, we're offline, so that's fine. Then all we have to do is add a tiny flag that will call the right

search function, so that's easy. Now we go to our second script, which has to hold another list with all the search data for the second method. This time it will look like this, pretty similar to the first one, and we just change the output file to a different name so we can differentiate between the outputs. Now I run the script, and these are the results that we got. We see that the byte sequence is pretty similar across versions, but it still has some tiny changes, and this time we could divide our search data into two groups. So that's good; that's more consistent than the first method that we

saw. That's progress. Okay, so what are the pros of this method? Well, it doesn't rely on fairly random bytes; it relies on the function signature, which is less likely to change. Of course, nothing is certain. But again, what if the byte sequence of this parameter preparation is not unique? I kind of got tired of relying on byte sequence addresses and uniqueness, so I tried to look for another method, and this leads me to our third and last method, which is to locate an indirect function call, starting from an exported function and going down the call chain. Then, once we get our function offset from the call

instruction, just like we saw in our second method, we will be able to reach our function address. So again, this is the function that calls our target function, and if until now we focused on the byte representation, now I want us to focus on the actual instructions. But before that, let's zoom out for a second. We have our target function and the function that calls it. But what if the function that calls it is not exported? Then there has to be another function that calls it, and another function that calls that one, and so on and so on, until we reach our exported function, which in this case would be LdrLoadDll. And of course, let's not forget that the

function that calls our target function might also call other functions. And back to our code: now I want us to take a look not only at part of the function's code but at the entire function, and I want us to focus on the call instructions inside of it. Did you spot our target function? Well, it's this second call. So if we step into the second call of the function that calls our target function, we can extract the offset, just like we did in the previous method, and find our target function. But remember this picture? Well, we found a way to locate our target

function, but what about the function that calls it? And what about the function that calls the function that calls it, and so on and so on? The answer is exactly the same, by the way. So let's take a look at the call flow. This is our exported function, the LdrLoadDll function, just like we saw in the picture. If we step into the second call instruction inside of it, we get to this internal function. Then we step into the third call, then we step into another call and another call, and this one should probably already look familiar, because this is

the function that calls our target function. So if we go down this call chain, we have a way to locate our function without using any byte sequence addresses. Once we've found the function address, we can simply call it from our code. So now let's take a look at how we can automate it. This one is a bit different from the first two methods, so we can write another script, or add another function, or whatever works for you. Now we have to get all the call instructions inside the function, and try to find the target function among them and get its

index. Then, we said that we have to do this for the entire chain, so we have to hold a list of every caller and callee couple in the whole chain, so we can calculate the whole process. Then we just iterate over it and execute the same function for each couple. If we print the result to the Python interpreter, this is the result that we get. But now let's see how this method compares across other Windows versions. We go back to our second script, the one that runs on multiple DLLs, and we take care of the output file

and, no, sorry, this is the first script, and we need to adjust it to write its output to a CSV file. Then we iterate over only the DLLs; we don't have extra search data in this case. And this is the output that we get after running it on all the Windows versions. We see that the first couple is pretty stable: it's almost always the second function call. The second couple and the third couple are also very stable; they seem not to change at all. The fourth one, again, is stable, and our caller function that calls our target function is also very stable, and it's also

always the second function that's being called. But the problem starts here, in this line. We can see that this function is very inconsistent: both the call index changes and the number of call instructions inside this function often changes. It probably means that this function changes often, and we don't really want to count on it. So if we take a closer look and try to sum it up: if we were looking for any one of these first four internal functions, we could successfully use this method to locate them, but starting from this function, which starts to make trouble, we can't really rely on it.
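The call-chain walk can be sketched on a toy buffer. The relative-call arithmetic is the one from the second method: the target is the address right after the 5-byte `E8 rel32` instruction plus the signed 32-bit displacement. Everything else is an assumption made for illustration: the toy decoder understands only NOP, CALL rel32, and RET, so a real implementation would need a proper disassembler (the talk uses IDA for this), and the layout and call indexes are invented.

```python
import struct

CALL_REL32, NOP, RET = 0xE8, 0x90, 0xC3


def call_target(image: bytes, call_off: int) -> int:
    """Resolve an E8 rel32 call: end of the instruction plus the signed displacement."""
    if image[call_off] != CALL_REL32:
        raise ValueError("no call rel32 at this offset")
    (rel32,) = struct.unpack_from("<i", image, call_off + 1)
    return call_off + 5 + rel32


def toy_call_sites(image: bytes, func_off: int) -> list:
    """Toy linear decoder: lists call offsets until the first RET."""
    sites, off = [], func_off
    while image[off] != RET:
        if image[off] == CALL_REL32:
            sites.append(off)
            off += 5
        elif image[off] == NOP:
            off += 1
        else:
            raise ValueError(f"unsupported opcode {image[off]:#x}")
    return sites


def follow_chain(image: bytes, start_off: int, call_indexes: list) -> int:
    """From an exported function, step into the n-th call at each level (zero-based)."""
    off = start_off
    for index in call_indexes:
        off = call_target(image, toy_call_sites(image, off)[index])
    return off


def encode_call(at: int, target: int) -> bytes:
    return bytes([CALL_REL32]) + struct.pack("<i", target - (at + 5))


# Toy layout: exported function at 0x00, internal caller at 0x10, target at 0x30.
image = bytearray(b"\xcc" * 0x40)
image[0x00:0x05] = encode_call(0x00, 0x20)  # first call: unrelated helper
image[0x05:0x0A] = encode_call(0x05, 0x10)  # second call: next link in the chain
image[0x0A] = RET
image[0x10] = NOP
image[0x11:0x16] = encode_call(0x11, 0x30)  # first call: the target function
image[0x16] = RET
image[0x20] = RET
image[0x30] = RET

target = follow_chain(bytes(image), 0x00, [1, 0])
print(hex(target))  # 0x30
```

The list `[1, 0]` plays the role of the per-level call indexes recorded in the talk's CSV output; a level whose index or call count shifts between Windows versions breaks every lookup below it.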

So let's sum this method up. Well, it doesn't rely on any byte sequence address, no uniqueness, no random bytes. But we have to remember that the further we go down the call chain, the lower the chances that we will be able to find our function or that the result will be stable. [Music] So it is better to use this method to locate functions that are closer to the exported function. Also, it relies on other functions that aren't really related to us, which is also not so good. And now let's talk about the stability across all these versions. We had the byte sequence address, calculating from it the offset to the start of the

function. We saw that it needed three combinations, but we had better options. The direct function call, locating the byte sequence address of the parameter preparation, went really great: we had only two search data tuples. And eventually we also saw that the last method, going down the function call chain, could also be successful, but we can't really rely on it. So we have three different ways to locate this function, and for our target function, according to the results that we just got, the best way to locate it is the direct

function call. But it is very important to say that there is no right or wrong method here, because as you can see, it can really change between different functions, and it very much depends on which target function you are looking for. So my conclusion is: use the automation for every target function you're looking for in order to find the way that best suits your needs. Now we have not one but three different ways to locate unexported functions. Again, use the Python scripts to find what works best for you. Oh, and that's a great question, Waldo. Well, of course you can try it at home. If you wish to try and learn more about it,

you can use any one of these resources, which are a very good place to start, and I will also post these resource links to my Twitter account. We'll have a Twitter thread; if you have any additional resources, feel free to share them. And of course the GitHub repository with all the scripts that we saw today will be uploaded later on today; it will also be posted on Twitter. That's my Twitter. We don't really have time for questions, but for any questions, feel free to contact me, either on Twitter, or now, or whenever you feel like. And that's it, thank you.

[Music] [Applause]