
The next speakers are also very active in the local community, and they have delivered talks at DC a couple of times, one time each of them. Today they are going to be talking about hooks in 64-bit. It's an extremely interesting talk, but before you can start, fine.
[Applause] Hi everyone, we are Assaf and Yarden, and we're here to talk about a project we did in the past few months, nicknamed Deep Hooks. The purpose of this project is to monitor native execution inside of WoW64 processes in order to gain some visibility into certain bypasses inside of such processes. We'll start with a quick introduction. I'm Yarden. I'm a dancer, a gymnast and an acrobat. I started competing in gymnastics as a kid and danced for many years. In recent years I've been a circus acrobat; I teach and perform aerial silks and lyra. In my spare time I'm a security researcher at SentinelOne, where I have done this project, and with me here is
Assaf, who is also a security researcher at SentinelOne and will introduce himself. So, hi everyone, I am Assaf and I'm a security researcher at SentinelOne. Before we start with the project itself, let's have a bit of background. AVs and security products in general want to monitor everything going on in the system and what's happening inside its processes. One of the most common ways to do that is by using user mode hooks placed on some interesting API functions, such as those dealing with virtual memory operations or inter-process communication. A major disadvantage of these hooks is that they can be bypassed pretty easily using lots of different techniques. We won't talk about most of
these techniques; we'll focus on a certain family of bypasses that target the WoW64 mechanism. So, a couple of words about this mechanism. When Microsoft started developing the 64-bit operating system they faced a bit of a problem: existing 32-bit applications can't natively run on top of 64-bit operating systems and would just break. Microsoft didn't want to break all these applications, so they had to find some solution to allow them to run. The solution they came up with is called WoW64, which is short for Windows on Windows 64. It's a compatibility layer that allows 32-bit applications to run on top of 64-bit operating systems as if they were running inside of a 32-bit system. This layer is
basically just made of a few 64-bit DLLs loaded into 32-bit applications, and it has lots of different implications across the system. Some of these implications, such as file system and registry redirection, aren't relevant for this project and are pretty well documented, so we won't talk about them. The implication we'll talk about has to do with system call handling. Whenever processes want to make a request from the operating system, they do that through a mechanism called a system call. Most of these system calls are dispatched through a dedicated system DLL called ntdll, which is loaded into every Windows process. WoW64 processes have two versions of this ntdll, a 32-bit one and a 64-bit one, and let's
see what it looks like. Here on the left we have the 32-bit version of ntdll, which receives requests from the application. These API functions in the DLL don't really do much except for forwarding the request to the WoW64 layer. This layer first has to make the transition to 64-bit mode, and it does that through the jump instruction we have in the middle. This is a unique jump instruction, a far jump to code segment 0x33, which makes the processor, on the hardware level, switch from 32-bit to 64-bit execution mode. Afterwards the WoW64 layer has to make a few more adjustments to the request, such as pointer size expansion from 32 bits to 64 bits, calling
convention adjustments, as the convention is different on 64-bit, and a couple more things. When the request is complete, the WoW64 layer forwards it to an API function in the 64-bit ntdll, which can actually dispatch the request to the OS kernel through a dedicated CPU instruction. Now let's talk about the bypasses. As I mentioned in the beginning, there is a whole family of bypasses that target the WoW64 mechanism. They all basically work by calling 64-bit API functions without going through the 32-bit API functions or the WoW64 layer. The most well-known and most commonly used bypass of this family is called Heaven's Gate. It was first published about 10 years ago and was
seen quite commonly in the wild in lots of different types of malware. The Heaven's Gate technique works by making the transition to 64-bit mode on its own, making all of the required adjustments to the request, and then directly calling 64-bit API functions found in the 64-bit ntdll. Now why is that a hook bypass technique? Because WoW64 processes are effectively 32-bit processes, user mode hooks are normally placed on 32-bit API functions, with the assumption that the 64-bit functions will not be called directly. This is normally true, except for cases where techniques such as Heaven's Gate are used, and they create a sort of a blind spot for AVs, which will
miss most calls made using such techniques. So we want to address this blind spot, and do that by hooking the 64-bit API functions themselves. This sounds easy, but in reality it's not that simple. We first need to find a way to run our 64-bit code inside of WoW64 processes, which are effectively 32-bit processes. We'll do that by injecting our 64-bit DLL into these processes. Then, we can't just inject any 64-bit DLL, because 64-bit code faces lots of limitations inside of WoW64 processes. And eventually we actually want to hook our API functions, so for that we'll need a hooking engine that is actually capable of hooking 64-bit code in these processes.
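To make the mode switch described above a bit more concrete, here is a minimal sketch of what such a far jump looks like as raw bytes. It assumes the standard x86 encoding of a direct far jump in 32-bit code (opcode 0xEA, then a 32-bit offset, then a 16-bit selector); the helper name `encode_far_jump` and the target address are made up for the illustration and are not taken from the slides.

```python
import struct

# Selector of the 64-bit code segment mentioned in the talk.
CS_64BIT = 0x33


def encode_far_jump(selector: int, offset: int) -> bytes:
    """Encode a 32-bit-mode `jmp far selector:offset` (opcode 0xEA, ptr16:32).

    Byte layout: opcode, 4-byte little-endian offset, 2-byte little-endian selector.
    """
    if not 0 <= offset <= 0xFFFFFFFF:
        raise ValueError("offset must fit in 32 bits")
    if not 0 <= selector <= 0xFFFF:
        raise ValueError("selector must fit in 16 bits")
    return b"\xEA" + struct.pack("<IH", offset, selector)


# Hypothetical address of the first 64-bit instruction to run after the switch.
EXAMPLE_TARGET = 0x00401000

stub = encode_far_jump(CS_64BIT, EXAMPLE_TARGET)
print(stub.hex(" "))
```

The sketch only builds the bytes; it does not execute them. The point is that the selector, not the offset, is what tells the processor to start decoding the following instructions as 64-bit code.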
Now we'll start with the injection. OK, so just like Yarden said, the first task that one must accomplish is finding an injection method which will allow us to run our own 64-bit code inside the context of the target process. Generally speaking, there are lots of different injection methods at our disposal, but unfortunately most of them are not going to be very beneficial to us. The reason is that most injection techniques will only allow us to inject DLLs which have the same bitness as the target process. That means either injecting 64-bit code into 64-bit processes or 32-bit code into 32-bit processes, and what we needed in the course of this research is
something a little bit more unique: the ability to inject a 64-bit DLL into WoW64 processes, which, just like Yarden said, are effectively 32-bit processes. Not every injection method is capable of doing this. In the course of this talk we are going to describe three relatively known methods which are capable of injecting 64-bit DLLs into WoW64 processes. We will start with the relatively easy stuff and gradually work our way through to more complicated techniques, and we will conclude this part by talking about a technique called injection via APC. We will see that while this technique works great on some of the older Windows platforms, it will probably break when
you try to run it on more modern systems. After we get a firm understanding of why this failure happens, we will be able to present two new variations, developed by us during the course of this research, which basically allow APC injection to function correctly even on the latest release of Windows 10, for example. So let's start with the easy stuff. The first technique we are going to talk about revolves around hijacking of a DLL named wow64log. This technique was discovered several years ago by a security researcher named Walied Assar, who I think is opposed to the BDS movement, but other than that is a pretty cool guy. So, yeah. OK, so what is
this DLL anyway? Whenever you launch a new WoW64 process in the system, an integral part of its initialization phase will try to look up and load a 64-bit DLL named wow64log directly from the system32 directory. What happens is that this DLL is not shipped as part of the standard Windows installation, which makes us and many other security researchers believe that this DLL is actually only used internally by Microsoft in order to aid debugging or diagnostics of WoW64 applications. In fact, we can leverage the fact that this DLL is not shipped in order to gain code injection, and the way we do this is by simply
writing our DLL, naming it wow64log, making it export a specific set of functions, and then just dropping it into the system32 directory. Once we have done so, our DLL will be, like magic, loaded into every WoW64 process in the system, and thus we've gained our code injection. So overall, really easy and straightforward DLL hijacking. OK, the second technique we are going to talk about takes advantage of Heaven's Gate. If you recall, not long ago Yarden talked about Heaven's Gate and basically described it as a hook evasion technique which is used mostly by malicious actors in order to bypass hooks inserted by security products. But reality is sometimes a little bit more
complex than that, and there is usually a very fine line which separates malware from security products. I think this is one of the cases in which the very same technique can be beneficial for the attacker as well as the defender. With that in mind, let's see how an AV, for example, might benefit from Heaven's Gate. What we should know is that every WoW64 process has in it two distinct image loaders: one implemented in the 32-bit ntdll, which can only load additional 32-bit images, and another one implemented in the 64-bit ntdll, which can only load additional 64-bit images. Now, for the sake of argument, let's assume that we already have 32-bit
code running inside the target process. Again, if we're talking about a security solution such as an AV, this is a valid assumption which usually holds. This 32-bit code normally can only access the 32-bit image loader, which isn't really getting us anywhere because we can't use it to load our desired 64-bit payload. But what this 32-bit code can do is use a technique such as Heaven's Gate in order to transition itself into 64-bit mode, and once it has done so, it basically opens up the possibility of using the 64-bit loader, for example by calling the 64-bit version of a function called LdrLoadDll, and we can use this function to load our desired payload. So to summarize, we
are talking about a two-phase injection method here: in the first phase we inject a 32-bit DLL, and in the second we use this DLL to transition into 64-bit mode and then load our additional 64-bit DLL. OK, the third and last injection method we'll describe is something called injection via APC. Before we dive into all of the nitty-gritty details of how this injection is actually implemented, a couple of words about APCs in general are in place. APC is an acronym which stands for asynchronous procedure call, and it's basically a mechanism built into the Windows kernel which allows us to take a particular routine and queue it to a thread of our choice, and what this
mechanism guarantees is that at some point in the future the target thread will stop whatever it is currently doing and execute our routine instead. In Windows, APCs come in several flavors, and in the course of this talk we're only going to describe one specific kind, called user mode APCs. User mode APCs have some distinct characteristics. First and foremost, the code that is going to run, the target of the APC, will run with user mode permissions, as opposed to kernel mode permissions. The second one is that the APC itself will only fire, or trigger, once the target thread enters something called an alertable wait state. That happens, for example, when the thread goes to sleep, waits for a
synchronization object, something like that. The third key point to notice is that all user mode APCs are handled by one single function called KiUserApcDispatcher, which is exported from ntdll. This function will basically make several preparations and then just forward the call to the real target of the APC, which the programmer intended to call. And APCs can be queued both from user mode and from kernel mode; we are going to use the kernel mode variation. So now let's talk about injection. Injection via APC is probably the most popular injection method, used by AV solutions as well as by some intelligence agencies, as you can see, and they basically do this by
queuing an APC which ultimately calls either LdrLoadDll, LoadLibrary or one of its variants. Most of them use this technique to inject a DLL which has the same bitness as the target process, but what we should keep in mind is that in a WoW64 process, APCs can run either 32-bit code or 64-bit code, and in some cases, for example when we queue our APC from a kernel mode driver, the default will be to run in 64-bit mode. That makes them a great candidate for loading additional 64-bit DLLs. OK, what we can see here is some kernel mode code which implements this technique: injection of DLLs from a kernel
mode driver. So, several key takeaways from this slide. First of all, the target of the APC, the function which executes in the context of the APC, must conform to a standard prototype which was predefined by Microsoft. As you can see, all of these functions must return void and receive three pointers as their input arguments. Now this raises the question: what happens if the function that I wish to call has a different prototype? To overcome this problem we use a trick: we use the first parameter, the NormalContext parameter, to pass in a user-defined structure, and inside the structure we can encapsulate both the address of the
function that I really wish to call and the values of all the parameters that I wish to forward to it. In our case, because we are dealing with injection, we are going to pass the address of the 64-bit version of LdrLoadDll as well as the name of the DLL that we wish to inject. If you take a closer look at this function in its entirety, you can see that it basically just translates one function prototype into another, and this is the reason we chose to term it an adapter thunk. This is not official terminology of any kind, but this is the term that we are going to use for the rest of the
talk, so keep that in mind. Now, because the type of APC we're using is a user mode APC, the adapter thunk itself must reside in memory which is accessible to user mode. In order to make sure it does, we first have to allocate a user mode buffer which is both executable and writable, and after we've done so we can simply copy into this buffer the actual code bytes which comprise the adapter thunk. After we do this, we can simply initialize a user mode APC which points to the buffer we have allocated and queue it to the APC queue of the target thread. We can repeat this entire procedure for every newly
created WoW64 thread in the system, and because we know that every thread empties its APC queue as part of standard thread initialization, this basically guarantees that our DLL will ultimately be loaded into every WoW64 process in the system. So what we did is take the source code that I've just shown you, compile it into a driver and load it on a Windows 7 system, and indeed, as you can see, the injection worked as planned: we have a 32-bit notepad.exe process which loaded our 64-bit DLL. OK, so after some time, one of us said it would be a good idea to test the same solution on some more recent Windows
platforms, so we decided to test it on Windows 10, and then we got some very different results. What we found out is that most processes into which we tried to inject our DLL simply crashed. After a little bit of probing on Twitter we bumped into this tweet you can see here from Alex Ionescu, and from this tweet we learned two facts. The first one is that he encountered probably the very same problem quite some time before us, and the second one is that the crash probably relates to CFG. So before we can start thinking about possible solutions to the problem, we first have to understand what CFG is, what are its
implications on the system and how it is related to WoW64.
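Before moving on to CFG, the adapter thunk used by the injector described earlier can be modeled in a few lines. This is only a user-space Python stand-in for the idea, not the kernel code from the slides: the real thunk is native code copied into the target process. The names `ThunkContext`, `fake_ldr_load_dll` and the DLL path are hypothetical, and the four-argument stand-in merely mimics the shape of a loader routine whose last argument is an output.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple


@dataclass
class ThunkContext:
    """User-defined structure passed through the first APC parameter."""
    target: Callable[..., Any]  # the function we really want to call
    args: Tuple[Any, ...]       # the parameters to forward to it


def adapter_thunk(normal_context: ThunkContext,
                  system_argument1: Any,
                  system_argument2: Any) -> None:
    """Fixed APC-style prototype: three arguments in, nothing returned.

    It only unpacks the context and forwards the call, translating one
    prototype into another.
    """
    normal_context.target(*normal_context.args)


loaded: List[str] = []


def fake_ldr_load_dll(search_path, characteristics, dll_name, base_address_out):
    """Hypothetical four-argument loader stand-in; records what it was asked to load."""
    loaded.append(dll_name)
    base_address_out.append(0x180000000)  # made-up base address
    return 0


base_out: List[int] = []
ctx = ThunkContext(fake_ldr_load_dll, (None, None, r"C:\payload64.dll", base_out))
result = adapter_thunk(ctx, None, None)
```

The design point is that the queued routine never changes shape: whatever needs to be called, and with however many arguments, travels inside the structure behind the first parameter.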
So, CFG is an acronym for Control Flow Guard. It's a relatively new exploit mitigation technique, first introduced by Microsoft in Windows 8.1 and later enhanced in Windows 10. It is meant to combat memory corruption vulnerabilities by preventing indirect calls to non-legitimate call targets. It's a compiler-enabled mitigation: it works by inserting, before every indirect call, an additional call into a validation routine. This validation routine receives the address that is about to be called in the indirect call and checks whether it's a legitimate call target or not. If it is, the function returns and execution continues normally. If the function decides that the address is not a valid call target, it will crash the process.
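The check-before-call behaviour just described can be sketched with a toy model. This is an assumption-laden illustration, not how Windows implements it: addresses are plain integers, the "address space" is a dictionary, the set of valid targets stands in for whatever bookkeeping the system keeps, and an exception stands in for the process crash. All names here are invented for the example.

```python
class InvalidCallTarget(RuntimeError):
    """Stands in for the process crash triggered by a failed check."""


def greet() -> str:
    return "hello"


# Toy address space: address -> code that starts at that address.
CODE = {0x1000: greet}

# Addresses considered legitimate call targets (function starts only).
VALID_CALL_TARGETS = {0x1000}


def validate_call_target(address: int) -> None:
    """Return normally for a legitimate target, otherwise 'crash'."""
    if address not in VALID_CALL_TARGETS:
        raise InvalidCallTarget(hex(address))


def guarded_indirect_call(address: int):
    """An indirect call with the extra validation call inserted before it."""
    validate_call_target(address)
    return CODE[address]()
```

Calling `guarded_indirect_call(0x1000)` goes through, while an address in the middle of a function, such as `0x1008`, is rejected before any code at that address gets a chance to run.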
What are these valid call targets? For images, these are defined as the start addresses of functions. Every image compiled with CFG exports a list of its functions and whether each of them is a valid call target or not. For private memory allocations we don't have a list of where the functions are, so the system just marks the whole buffer as a valid call target, just to be safe. Whenever new executable memory is introduced into a process, the system marks which parts of it are valid call targets. To do that we have a new memory area inside the process called the CFG bitmap. This is a relatively large memory area where each bit marks eight bytes in the
process address space and shows whether these eight bytes are a valid call target or not. The validation routine checks this CFG bitmap for each address it receives to see whether it's a valid call target. Now let's take a look at this validation routine. This is LdrpValidateUserCallTarget, a function found inside of ntdll. As I said, it receives the address that is about to be called indirectly and makes the right calculations required to find the bit in the CFG bitmap that marks this address. If the bit is set, meaning this address is a valid call target, we take the left branch and the function returns. If the bit is not set,
it means that the address is not a valid call target, so it takes the right branch, which will eventually crash the process and result in this call stack that we took from our crash dump on Windows 10. Now, in WoW64 processes CFG gets a tiny bit more complicated. WoW64 processes host both 32-bit code and 64-bit code, so we also have two CFG bitmaps: a native CFG bitmap that marks 64-bit code in the process, and a WoW64 bitmap that marks 32-bit code. As you might remember, these processes also have two versions of ntdll, a 32-bit one and a 64-bit one, so this works nicely with the two CFG bitmaps. The validation routine in the 64-bit
ntdll checks addresses against the native bitmap, and the validation routine in the 32-bit ntdll checks addresses against the WoW64 bitmap. Any new executable memory introduced into these processes only gets marked in one of these two bitmaps. So how does the system decide which bitmap to mark new addresses in? For that we have the kernel function MiSelectCfgBitmap, which gets called whenever new executable memory is introduced. It chooses the correct CFG bitmap to mark this memory in based on a few simple checks. First, the process has to actually be a WoW64 process. If it's a native process we should only have a single, native CFG bitmap, so we'll
always choose that one. Then, if the process is a WoW64 one, we check whether the address is above or below four gigabytes. If the address is above four gigabytes it will get marked in the native CFG bitmap, but that shouldn't happen very often, because in WoW64 processes the whole memory above four gigabytes is actually reserved and we can't allocate it or access it. So this check should almost always be true. Then the final check is whether this memory is a private allocation or part of an image. If this segment parameter is null, the memory is part of a private memory allocation, and in that case it will always get marked in the WoW64 CFG bitmap. If this
memory is part of an image, the function checks whether it's a 32-bit or 64-bit image and chooses the bitmap accordingly. Now how does that all relate to our problem? Our problem was that we tried to inject our DLL using an APC injector. We actually managed to queue our APC successfully to WoW64 processes, but we took a look at our crash dump and saw that we got to KiUserApcDispatcher, the function that should dispatch our APC and call our APC target. But before it managed to call our adapter thunk, this function checked whether it's a valid or invalid call target, decided that it's an invalid call target and crashed our processes. But why
did that happen? As you might recall, we saw in MiSelectCfgBitmap that all private memory allocations below four gigabytes inside of WoW64 processes are marked in the WoW64 CFG bitmap. Now, the function that should handle our APC is the 64-bit KiUserApcDispatcher found in the 64-bit ntdll, so the validation routine tested the address against the native CFG bitmap. Because our adapter thunk is marked in the WoW64 bitmap and not the native CFG bitmap, this caused our processes to crash. It seems like a dead end, but we didn't want to give up on our APC injector just yet, we kind of liked it, so we tried looking for a solution to our problem. In this case I'll do
what I usually do in such cases and ask Assaf to solve our problem. Thank you. OK, so in order to fix our APC injector we must somehow make the adapter thunk be marked in the native CFG bitmap. So we took another look at MiSelectCfgBitmap and started to see what we can do about it. What we realized is that all we need to do is make one of the sub-conditions inside the if statement evaluate to false. At first we decided to tackle the first sub-condition, because it seemed like the easiest one to manipulate. The first sub-condition actually checks whether or not the current process is a native 64-bit process. It basically
does this by probing the Wow64Process member of the relevant EPROCESS structure. If this pointer is set to null, the kernel thinks the process is a native 64-bit process; otherwise it is treated as a WoW64 process. With that in mind, we came up with a solution in which, right before we allocate memory for our adapter thunk, we simply go to this pointer and zero it out. This will trick MiSelectCfgBitmap into thinking that we are allocating private executable memory in a native 64-bit process, and thus our adapter thunk will be marked in the native CFG bitmap. Of course, after the memory for the
adapter thunk has been allocated, we should restore the original value of this pointer, otherwise bad things might happen. So to summarize, this technique basically works by temporarily "nativizing" the WoW64 process, and while it does do the trick, it suffers from some serious downsides, mainly because the EPROCESS structure that has to be modified is largely undocumented and changes often between Windows releases. So we tried to look for an alternative solution, preferably one which doesn't require making modifications to undocumented kernel structures. At this point we decided to mentally zoom out and really think about our problem in the first place. Our problem is caused by the fact that our APC injector uses an
adapter thunk, and we know that since the adapter thunk is a private memory allocation, it will be marked in the WoW64 CFG bitmap. Now, if you recall, the only reason we have an adapter thunk in the first place is to act as a middleman which just forwards calls to the 64-bit version of LdrLoadDll, and we know that the 64-bit version of LdrLoadDll is marked in the native CFG bitmap. This made us wonder: why do we need an adapter thunk at all? Or to put it in other words, why can't we just initialize our APC to call LdrLoadDll directly? On the surface of it, it looks like we
have some sort of problem. On the right-hand side you can see the common function prototype which is shared by all user mode APC routines. We have seen this one before, and like I said, all of these functions receive three pointers as their input arguments. On the left-hand side we can see the prototype of LdrLoadDll, which happens to expect four arguments, so at the very least we have some sort of a function prototype inconsistency issue. Does this mean we have reached a dead end? Not yet. What might actually work in our favor is the x64 calling convention. As some of you might
know already, in x64 almost every function receives its first four arguments via registers. The first argument is passed in the RCX register, the second one in RDX, the third in R8 and the fourth one in R9. That means that if we, for example, initialize an APC which directly calls LdrLoadDll, we will have direct control over the values of the first three arguments. Unfortunately, we won't have any direct control over the value of the fourth parameter, and what will happen is that by the time LdrLoadDll is called, whatever value happens to be loaded into R9 at this very moment will be interpreted
as the fourth, so-called missing, parameter. So now let's take a closer look at LdrLoadDll, and specifically at its fourth parameter. As you can see, the fourth parameter is annotated as an output parameter, and basically it's a pointer to which the function will write the base address of the DLL that it has just loaded. This actually makes things a little bit more complicated, because it means that not just any value loaded into R9 will suit our needs. More specifically, it means that R9 has to be a pointer to a writable memory location, and moreover, this memory location has to be some sort of scratch space, because we
cannot overwrite important information in the process without risking a crash. So the million dollar question is: what value will be held by R9 by the time LdrLoadDll gets called? In order to answer this question we had to perform some reverse engineering on KiUserApcDispatcher, and indeed, after some reversing, what we found out is that right before KiUserApcDispatcher forwards the call to the target of the APC, which again is LdrLoadDll in our case, it will load R9 with a pointer to something called a CONTEXT structure. So what is this CONTEXT structure? It's basically a memory block which holds the CPU state that is
about to be restored, or resumed, once the APC dispatching process has finished. Now, like we've said, LdrLoadDll is going to interpret R9 as its fourth parameter, and this basically means it is going to overwrite the first eight bytes of the structure with the base address of the DLL it has loaded. Generally this doesn't sound like a very good idea, because we are basically overwriting saved CPU state that is about to be resumed. But luckily for us, in x64 the first few members of every CONTEXT structure don't actually hold CPU state. As you can see, all these P1Home, P2Home and so on are not really
CPU related, and this basically means we can overwrite them safely, because we know they won't be used later by NtContinue to actually restore the context. So to summarize, what we found out is that we can get rid of the adapter thunk entirely and just initialize our APC to point to LdrLoadDll. So we've modified our APC injector accordingly, compiled it into a driver and tested it, this time on a Windows 10 machine, and as you can see here, this time it worked: we managed to inject our 64-bit DLL into a WoW64 process which is CFG-aware.
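The reason the eight-byte write is harmless can be checked with a small model of the beginning of the structure. The sketch below assumes the x64 CONTEXT layout as declared in winnt.h, where six 8-byte home fields (P1Home through P6Home) come first and are followed by ContextFlags and MxCsr; only that prefix is modeled, and the helper `write_base_address` and all the values are made up for the illustration.

```python
import ctypes


class ContextPrefix(ctypes.Structure):
    """First members of the x64 CONTEXT structure; everything after MxCsr is omitted."""
    _fields_ = [
        ("P1Home", ctypes.c_uint64),
        ("P2Home", ctypes.c_uint64),
        ("P3Home", ctypes.c_uint64),
        ("P4Home", ctypes.c_uint64),
        ("P5Home", ctypes.c_uint64),
        ("P6Home", ctypes.c_uint64),
        ("ContextFlags", ctypes.c_uint32),
        ("MxCsr", ctypes.c_uint32),
    ]


def write_base_address(ctx: ContextPrefix, base: int) -> None:
    """Mimic a callee storing an 8-byte value through a pointer to the structure start."""
    ctypes.c_uint64.from_address(ctypes.addressof(ctx)).value = base


ctx = ContextPrefix()
ctx.P2Home = 0x2222            # arbitrary marker values for the demonstration
ctx.ContextFlags = 0x0010001F
ctx.MxCsr = 0x1F80

write_base_address(ctx, 0x00007FF8_12340000)  # made-up DLL base address

print(hex(ctx.P1Home), hex(ctx.P2Home), hex(ctx.ContextFlags))
```

Only `P1Home` changes; the neighbouring home slot and the fields that describe real CPU state keep their values, which is the property the direct call relies on.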
So now that we have a working injection method, or like three and a half, we can move on to the second part of this project, which is the hooking. We actually started this project by wanting to hook 64-bit API functions inside of WoW64 processes, so to do that we need a working hooking engine. We chose to use a method used by most hooking engines, called inline hooks. This method works by overwriting the prologue of the hooked function with a jump into a code buffer we allocated in the process, called the trampoline. This trampoline will forward the call to our detour function, found in the DLL we've just injected into the process. This detour function can basically do
whatever we want it to do, and will at some point call back into the trampoline. The trampoline will execute the missing instructions, the ones that we wrote over in the hooked function, and jump back into the hooked function to allow it to continue its normal execution. There are a lot of different hooking engines that use this technique, but none of those we inspected can actually hook 64-bit code inside of 32-bit processes. The main reason for that is that basically almost any code written for Windows makes use of some core Win32 DLLs that implement most of the Win32 API. These DLLs, such as kernel32, user32, kernelbase, et cetera, exist in almost every process, but their 64-bit version
is just not loaded into WoW64 processes, and because of different limitations we normally can't load them either. So we have to make do without them, which basically leaves us with just the native ntdll to work with. To create a hooking engine that can actually work in such an environment, we chose one of the engines we inspected earlier and started stripping off its dependencies other than the native ntdll. The first and major step was reimplementing Win32 API functions: we had to reimplement all API functions that the hooking engine used so that they only make use of functions exported from ntdll. Most of these were pretty easy. For example, here we have our implementation of VirtualProtect,
which is basically just a forwarder to NtProtectVirtualMemory, exported from ntdll, which means we can use it. Most of our implementations look something like this. Some were a bit more complicated, and then we had to either reverse engineer the missing DLL or go to the ReactOS sources and do whatever they did. After we implemented all of the APIs, we tried to recompile our modified hooking engine and were left with this. This list of errors looks kind of threatening, but most of these just require configuration changes to the project, like disabling some runtime checks and other things implemented in the CRT, so it wasn't such a big deal. I won't go
into details here because it's very technical and, generally, pretty boring. After we took care of all that, we managed to successfully recompile our hooking engine, implemented inside of a DLL, and generate a DLL that only makes use of the native ntdll itself. Then we again tested it on Windows 7 and it did great: we managed to hook any 64-bit API function we wanted, so we were really glad about that. Then we remembered what happened last time and tested it on Windows 10, where it didn't do so great. Actually, it failed to hook API functions completely. So what happened there? We took a look at these processes and saw that there is a slight
difference in the morula layout between Windows 7 and Windows 10 as you might remember Windows 7 and in wall 64 Pro says the whole memory above 4 gigabytes is reserved and we can't allocate or access it this means that all the DLL or all of our code is all found below v4 gigabyte boundary Windows 10 however this looks almost identical but the native ntdll which we want to hook was actually moved to a much higher address way above the 4 bond boundary other than that the rest of the memory about 4 gigabytes is still reserved so we can't use it why is that a problem because the jump instruction that we use to overwrite the hooked function in order to jump
into a trampoline is a relative jump, which receives a four-byte offset, meaning it can pass a distance of up to two gigabytes from the hooked function to the trampoline. On Windows 7 that's fine: we can just allocate our trampoline right next to the native ntdll, and we need to pass a distance of much less than two gigabytes. On Windows 10 we can't allocate our trampoline next to the native ntdll, since this memory is reserved, and the trampoline will always be allocated below 4 gigabytes. Since the total address space of 64-bit processes is 128 terabytes, this means that we would have a distance of way more than 2 gigabytes to pass, so our relative jump instruction is
just not good enough and would fail on all WoW64 processes on Windows 8.1 and 10. So we had to replace it with a different instruction, preferably one that can jump to an absolute 64-bit address. Okay, so at this point we just compiled a list of alternative branching instructions, with the generous help of McGill double, of course. So let's just go over them one by one and see what might fit and what not. The first option is actually the relative jump, which is already in use by most hooking engines. Just like Yarden said, it works on a DWORD-sized operand, which means it can only pass a distance of up to two gigabytes
in every direction, and like she said, two gigabytes are simply not enough for us. We need to pass a much greater distance, so we had to discard this option almost immediately. The next two options work by loading a register with a 64-bit absolute address, either by loading it with an immediate value or by reference to a memory location, and then branching to the address held by the register. And while these two are both perfectly valid solutions, they suffer from the same problem: they both dirty a register, which kind of makes things a little bit more complicated. We wanted to keep things as simple as possible, so we had to give up on these two as well. The
fourth option is actually quite interesting because it doesn't use the jump instruction. Instead, it builds an address onto the stack in two pieces and then just returns to it. Again, for the first iteration it seemed a little bit complicated, so we put it aside, and, quick spoiler alert, we will probably get back to it later. So for the first iteration this just left us with the final option, which takes advantage of an addressing mode called RIP-relative addressing and basically just branches to the 64-bit absolute address which immediately follows the jump instruction in memory. So we modified our hooking engine accordingly, bundled it into a DLL, injected it, and what we saw this time is
that while this solution works perfectly on Windows 10, it simply failed to create the hooks on older platforms such as Windows 7. So to get to the root cause of this, we took an ntdll version from Windows 7 and an ntdll version from Windows 10 and decided to do some sort of binary diffing and see if there is anything suspicious that pops up. What we found out is that in Windows 10 all the functions to which we wish to apply our hooks are significantly longer than their Windows 7 counterparts. So why is this a problem? Because what I haven't told you is that the branching sequence that we
decided to use is actually quite long: its binary encoding is something like 14 bytes. And what happens in Windows 7, for example, is that the functions are simply not long enough, so we don't have enough bytes to overwrite. This kind of took us back to the drawing board, and at this point we decided to take a closer look at the fourth option, the one which works by building an address onto the stack. As you can see here, this option is actually comprised of three distinct instructions. We first have the push instruction, which builds the low-order DWORD of the address. It is then followed by a mov instruction, which builds the high-order DWORD of the address, and
it then concludes with a ret instruction, which just pops an eight-byte value from the stack and branches to it. In its current form this sequence is again too long for us, it's actually longer, it takes up sixteen bytes if I recall correctly, and we tried to think how we can optimize this sequence. After a little bit of trial and error we found out that we can get rid of the mov instruction entirely and still be left with a working solution. So let's understand why this is possible. Like we have said several times during the course of this talk, in WoW64 applications we normally can't allocate memory above the four gigabyte boundary
because most of the address space there is simply reserved, and because the trampoline is itself a private memory allocation, it will always be below that line. Now, if we take a look at a 64-bit absolute address which, again, is below that line, we can see that its high-order DWORD will always be zeroed out. And what works in our favor is x64 assembly, because when you do a push of a 32-bit immediate value, what will actually happen is that the CPU will push an 8-byte value onto the stack, because the stack always has to remain 8-byte aligned, and the high-order DWORD will be zeroed out by default. So this is
exactly what we need, and after we get rid of the mov instruction we are left with a much shorter sequence which only takes up 6 bytes. This kind of makes this solution a universal solution, because it will work on all Windows platforms, starting from Windows 7 up to and including Windows 10. So again we modified our hooking engine, compiled it into a DLL, injected it, and this time it just worked. And if you, for example, disassemble the prologue of one of the hooked functions, you can see something like this: a push, which is followed by a ret.
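As a rough illustration of the size and reach trade-off described above (this is not the speakers' code), the Python sketch below encodes three of the branching sequences as raw bytes. The function names and both addresses are hypothetical, chosen only for this example. One caveat is assumed here: in 64-bit mode `push imm32` sign-extends its operand to 64 bits, so the sketch only accepts targets below 2 GB, where the upper DWORD is guaranteed to come out as zero.

```python
import struct

REL32_RANGE = 2**31  # a rel32 displacement is a signed 32-bit value


def rel32_jmp(src, dst):
    """Encode `jmp rel32` (E9 + int32); return None if dst is out of reach."""
    disp = dst - (src + 5)  # displacement is relative to the next instruction
    if not -REL32_RANGE <= disp < REL32_RANGE:
        return None
    return b"\xE9" + struct.pack("<i", disp)


def rip_indirect_jmp(dst):
    """Encode `jmp qword ptr [rip+0]` followed by the 8-byte absolute target."""
    return b"\xFF\x25" + struct.pack("<i", 0) + struct.pack("<Q", dst)


def push_ret(dst):
    """Encode `push imm32; ret`, the short sequence from the talk."""
    # Assumption: restrict to targets below 2 GB so that the sign-extended
    # immediate has a zero upper DWORD.
    if not 0 <= dst < 0x80000000:
        raise ValueError("target must be below 2 GB for push imm32; ret")
    return b"\x68" + struct.pack("<I", dst) + b"\xC3"


# Hypothetical addresses: a native ntdll function mapped far above 4 GB
# (as on Windows 10) and a trampoline allocated below 4 GB.
NTDLL_FUNC = 0x00007FFA12340000
TRAMPOLINE = 0x0000000012340000

print(rel32_jmp(NTDLL_FUNC, TRAMPOLINE))      # None: more than 2 GB apart
print(len(rip_indirect_jmp(TRAMPOLINE)))      # 14 bytes
print(push_ret(TRAMPOLINE).hex(" "))          # 68 00 00 34 12 c3
```

Under these assumptions, the relative jump cannot reach the trampoline at all, the RIP-relative form reaches it but needs 14 bytes of the hooked function, and the push/ret form needs only 6.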
Okay, so that's basically it. Let's do a quick recap of what we had here. We started by injecting our 64-bit DLL into a WoW64 process. We did that by first using a few previously known injection methods, and then improving our APC injector to make it work on newer Windows versions as well. Then we took care of our hooking engine: we implemented the Win32 APIs that it uses, we fixed its configuration, and we managed to get it to hook 64-bit APIs in WoW64 processes. We then verified that it actually works across all Windows platforms, so we won't mess up this time, and we saw it actually worked, it did exactly what we wanted, and then we
celebrated by going home before 12 a.m. Well, soon after, we actually published this research in a series of blogs, and we published the source code for our hooking engine. We will pretty soon publish these slides, so the links are here, and that's basically [Applause]