Turning (Page) Tables — Bypassing Advanced Kernel Mitigations Using Page Table Manipulations

Name: Turning (Page) Tables — Bypassing Advanced Kernel Mitigations Using Page Table Manipulations
Uploaded: 2018-09-20
Duration: 42 min 51 s

BSides Las Vegas · 2018 · 42:51 · 207 views · Published 2018-09

Speakers

Omri Misgav, Udi Yavo

Tags

Category: Research, Technical

Topic: Reverse Engineering, Vulnerability Research

Difficulty: Advanced

Team: Red

Research: Technical Deep-dives

Style: Talk

About this talk

Demonstrates a technique to bypass Windows 10 kernel mitigations (KMCI, CFG, and others) by manipulating page tables to make shared code pages writable in user mode, inject shellcode, and achieve privilege escalation without executing kernel code. The talk covers memory management, virtual memory translation, and proof-of-concept exploitation on the latest Windows 10 builds with VBS enabled.

Original YouTube description

Turning (Page) Tables - Bypassing Advanced Kernel Mitigations Using Page Tables Manipulations - Omri Misgav & Udi Yavo Breaking Ground BSidesLV 2018 - Tuscany Hotel - Aug 08, 2018

Transcript [en]

Hi everyone. I am Udi, the CTO and co-founder of enSilo. With me is Omri, enSilo's research team leader and a security researcher. What we'll talk about today is basically how to bypass the current mitigations in Windows 10, pretty much all the mitigations that are currently there. We'll start with a bit about current mitigations and the state of exploitation around them, and then we'll cover a bit of how memory management works and how virtualization comes in with VBS and KMCI, as it is required for the rest of the talk. Then we'll move on to the technique itself, which we call Turning Tables, show a demo, and discuss mitigations. So, over the past years Microsoft has been spending a lot of effort

improving its kernel mitigations, and doing quite a good job at that. As you can see in the table, quite a few mitigations were added and are constantly being added as Windows 10 progresses. They also improve some of the existing mitigations: KASLR, for example, improved significantly since Windows 8, and by now pretty much everything in the kernel is randomized to one point or another. There are now also features that depend on VBS, for example HVCI, which we'll cover more in depth later, and KCFG, kernel control flow integrity. The bottom line is that there is a lot of effort on that, and that led to many new exploitation techniques and a lot of talks over the past years that

cover how to bypass these mitigations. Last year at Black Hat there was a very good talk about how to bypass kernel mitigations. However, as far as we've seen, none of the bypass techniques presented so far also deal with mitigations such as KMCI and various other types of mitigations. So now, a bit about memory management. Pretty much every modern operating system nowadays uses virtual memory. Basically, virtual memory is just a translation from a virtual address to the actual physical address in memory, which is done by the MMU. There are a lot of uses for it; one of them is that it enables a very large address space

when the physical memory is actually much smaller, process isolation, and a lot of other things; we'll cover some of them in this presentation. So how does the translation actually work? If we take this 64-bit address, for example, and break it down, we will see the translation works pretty much like this. First you take the CR3 register, control register 3, which points to the base of the page tables, the PML4. You take the 9 bits starting from the 39th bit of the address and use them as an offset from that base. This is actually an entry within the PML4, which points to what's called the page directory pointer table, the PDPT.
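The index arithmetic of this walk can be illustrated with a minimal sketch. It assumes standard x86-64 4-level paging with 4 KiB pages (four 9-bit table indices followed by a 12-bit page offset); the function name `split_virtual_address` is hypothetical and not taken from the talk.

```python
# Sketch: split a virtual address into its 4-level paging indices.
# Assumes x86-64 4-level paging with 4 KiB pages; names are illustrative.

INDEX_BITS = 9                      # each table holds 2**9 = 512 entries
OFFSET_BITS = 12                    # 4 KiB pages -> 12-bit offset
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def split_virtual_address(va: int) -> dict:
    """Return the table indices and page offset encoded in a virtual address."""
    return {
        "pml4": (va >> 39) & INDEX_MASK,   # bits 39..47: index into the PML4
        "pdpt": (va >> 30) & INDEX_MASK,   # bits 30..38: index into the PDPT
        "pd": (va >> 21) & INDEX_MASK,     # bits 21..29: index into the page directory
        "pt": (va >> 12) & INDEX_MASK,     # bits 12..20: index into the page table
        "offset": va & OFFSET_MASK,        # bits 0..11: offset inside the page
    }


if __name__ == "__main__":
    # Build an address from known indices and take it apart again.
    va = (0x1ED << 39) | (3 << 30) | (5 << 21) | (7 << 12) | 0xABC
    for name, value in split_virtual_address(va).items():
        print(f"{name:>6}: {value:#x}")
```

Each index selects one of 512 eight-byte entries in its table, which is why every table fits exactly in one 4 KiB page.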

Then you take the next 9 bits and use them as an offset within that table, and from there it points to the page directory. Again, you take the next 9 bits and use them as an offset in that directory, and again in the page table, where we finally get to what's called the PTE, the page table entry, which points to the actual physical page, which can also be referred to by its PFN, the page frame number. As you can see, the lower 12 bits are not used in the translation; they are the offset within the page, which means that the page size is in this case 4K. Actually, there are also

large pages, but that's not really critical for understanding the rest of the talk, so I won't go into it. So we have PTEs. PTEs do not contain only the physical address; they also contain other bits that are used by the operating system to control permissions on the page and other things. For example, the lowest bit, it's a bit hard to see, is the valid bit, so if it's not valid it points to nothing. There is a write bit, which means that the page is writable, a no-execute bit, and all that. There are also bits reserved for the operating system's memory management: as you can see, from the 52nd bit the

bits are being used by the operating system to manage memory. Now, because two PTEs can have the same PFN, which means that they point to the same physical page, the same physical page can have different permissions at different virtual addresses or in different processes. This is essentially how shared memory works, and why it's possible to make the same page executable in one process while it's writable in another process. That is important to understand because we're going to talk about it a lot later on. So just a few words about prototype PTEs. Prototype PTEs are pretty much specific to Windows in a sense, and the only thing that's really important to

understand about them is that they are what Windows uses for shared memory. Basically, each shared PTE is a prototype PTE, which for example contains the number of times it is being shared. This can be used by the operating system to know when it can remove the page and generally to control it, and I'll talk about it more later on. The next thing I'm going to talk about is copy-on-write and pretty much how it works. In Windows, for example, every DLL that's being mapped is shared by default. So if you take kernel32.dll, which is mapped in pretty much all processes, the code, for example, starts out shared, so the processes point to the same

physical page, which makes sense from an efficiency point of view, because then you don't need to waste much physical memory for DLLs that are mapped in many processes. However, a process can change the code or read-only data in its own address space, and in these cases the operating system needs to make sure that the change doesn't cross processes, because that can definitely cause a problem: if you change the code in one process and it affects the code in another process, it can be the source of bad things. So what the operating system actually does when the process asks to change the permission on such a page, for example to make it

writable, is not to actually make it writable but to use one of the software bits I mentioned earlier in order to mark it as a copy-on-write page. Now, when the write actually happens, the operating system handles the fault that occurs because the page was not writable. It sees that it's a copy-on-write page, and then the page is copied, the PTE now points to the copy of the page, and the write happens there. So the change only happens in the specific process, in its private memory, and not globally across the operating system. Now, when we also have virtualization involved, there is another level of

translation, the second level address translation, and the goal is basically to make sure that virtual machines cannot tamper with each other's memory. I guess it's obvious why we want to keep virtual machines isolated from each other. So there is what's called a guest physical address, which is later translated by what's called an EPT to a machine physical address; it's just another level of translation. EPT is how it's called on Intel processors, or NPT on AMD processors. It also potentially adds another level of optimization, because if the hypervisor controlling those virtual machines knows that two pages are actually the same, it can point them to the same physical memory, so the optimization can

exist even across virtual machines and still keep the isolation intact. Now, as you can also see, the EPT entries also have permissions for those pages, so if EPT is being used, the actual page permission is determined at that level: if a page is not executable at the EPT level, it is not executable at all. Microsoft uses that, together with many other things, to create VBS, virtualization-based security. What they basically did is this: the hypervisor manages the kernels as virtual machines, and each kernel lives in its own VTL, where VTL stands for virtual trust level. The least privileged trust level is the lowest level, so VTL 0 is

less privileged than VTL 1. Basically, what you see when you work on the computer is the right-hand side of this diagram; it's where the regular kernel runs, along with all your applications, and that's VTL 0. The secure kernel, SKM, on the left-hand side is what actually holds the power and what manages the EPTs and the real permissions on the other side, the less secure world. That enables mitigations like HVCI. HVCI is hypervisor code integrity, which is actually split into two kinds of policies: one is KMCI, which is kernel mode code integrity, and the other is UMCI, which is the same thing for user mode.

And the idea is to ensure that, with KMCI enabled, the code in the kernel cannot be modified or even executed unless it was authorized, and authorized actually means that it was signed properly based on the policy that is defined. The DLL that does this is skci; it's what verifies the integrity. Whenever a new driver or any type of code is being loaded into the kernel, it must first be validated, otherwise it cannot run: it will not get the execute bit in the EPT. And once it gets the execute bit it cannot be modified, because, as I said earlier, the page would not actually be writable even if the operating

system marks it as such, unless it is marked writable in the EPT as well. OK, so to quickly recap all that: virtual memory management is done by both hardware and software. I mean, everything that Windows is doing must be supported by the hardware to work; for example, if virtualization is not enabled, clearly VBS cannot be enabled. It's also the basis for many things, like shared memory or the ability to do flexible physical memory management, but it is also the way the advanced mitigations work: HVCI, Credential Guard, secure memory enclaves, and many other things that Microsoft keeps adding to the operating system are based on the same things. OK.
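As part of this recap, the two layers of permissions can be illustrated with a small sketch. It assumes the x86-64 hardware PTE layout (bit 0 valid, bit 1 write, bit 2 user, bit 63 no-execute, PFN in bits 12..51) and the Intel EPT permission bits (bit 0 read, bit 1 write, bit 2 execute). The helper names `decode_pte` and `effective_permissions` are hypothetical, and the model is deliberately simplified: it only looks at the final-level entries and ignores Windows-specific software bits.

```python
# Sketch: decode a few PTE bits and combine them with EPT permissions.
# Bit positions follow the x86-64 / Intel EPT formats; helper names are illustrative.

PTE_VALID = 1 << 0
PTE_WRITE = 1 << 1
PTE_USER = 1 << 2
PTE_NX = 1 << 63
PFN_MASK = ((1 << 52) - 1) & ~0xFFF   # physical frame bits 12..51

EPT_READ = 1 << 0
EPT_WRITE = 1 << 1
EPT_EXEC = 1 << 2


def decode_pte(pte: int) -> dict:
    """Extract the fields of a guest PTE that matter for this discussion."""
    return {
        "valid": bool(pte & PTE_VALID),
        "writable": bool(pte & PTE_WRITE),
        "user": bool(pte & PTE_USER),
        "executable": not (pte & PTE_NX),
        "pfn": (pte & PFN_MASK) >> 12,
    }


def effective_permissions(pte: int, ept_entry: int) -> dict:
    """A page is writable/executable only if BOTH translation levels allow it."""
    guest = decode_pte(pte)
    return {
        "write": guest["valid"] and guest["writable"] and bool(ept_entry & EPT_WRITE),
        "execute": guest["valid"] and guest["executable"] and bool(ept_entry & EPT_EXEC),
    }


if __name__ == "__main__":
    # Guest marks a code page writable, but the EPT only grants read + execute,
    # which is the situation described for kernel code under HVCI.
    pte = (0x1234 << 12) | PTE_VALID | PTE_WRITE
    print(effective_permissions(pte, EPT_READ | EPT_EXEC))
```

The point of the model is that setting the write bit in a guest PTE is not enough when the second-level entry withholds write access, which is why the protection described here holds for pages the secure kernel controls.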

Now that we've covered the basics, we need to understand what we want to achieve: what is our general motivation here? If we look at every privilege escalation exploit that is out there right now, or at least most of them, at some point in their operation they run a payload in kernel mode. That now becomes a problem, because KMCI effectively prevents it: as we said, we cannot create new code in the kernel, nor can we modify existing code. So we kind of have a problem here. Most previous publications kind of assumed it wasn't enabled, mostly because it wasn't relevant up until now. Basically, the real goal here is to

be able to achieve code execution with the highest possible privileges, usually kernel mode. But for our intents and purposes it is good enough if we get to a state where we run as the SYSTEM user. The reason is that at this point Microsoft does not consider the step from administrator to kernel a defensible boundary. The quote is taken from a tool that was released, I think, yesterday, don't quote me on that, maybe the day before, by Alex Ionescu. He released a utility for whoever wishes to execute kernel-mode code from user mode without any driver being involved in the process, and the only thing you

need is basically an administrator account or a SYSTEM account. So we're going to cover our technique, which we named Turning Tables. First we need to understand what we actually need in order to execute it. Once we have found a vulnerability, it has to give us read and write primitives. If you look at it from a developer's perspective, it means that we have functions that we can supply with a virtual address and data, to either read from or write to kernel memory, and do that from user space. This is very common in pretty much every modern exploit, and that's all we need. So, to understand the concept: basically,

what we are going to do is use these primitives on a shared code page in user mode: we're going to flip a bit in its PTE and make it writable. It's a very simple operation, just flipping one bit, and in that way we are able to avoid the copy-on-write mechanism. Now, the shared code page needs to be one that actually runs in other processes, because it is shared, so we want to target code that is also being used in SYSTEM processes. Next, after we made the code writable in our own process, the origin process, we're going to change the code however we like. Then we just

need to wait, and then we're going to run as SYSTEM. It's a very simple concept, but nevertheless we still have some hurdles to overcome. The first one, as we mentioned earlier, is that page tables are now randomized, and we somehow need to be able to get their addresses in order to go through the PTEs themselves. We assume you can already leak the base address of the kernel with your read primitive; there's a lot of material on this topic online, so you can just look it up. And there is a very convenient way to find the PTE base address using an exported function in the kernel. Microsoft was very kind to us: you can

see that MmGetVirtualForPhysical is exported, and though it's undocumented we can still use it. The PTE base is marked there; it's a bit hard to see the value, but it's a constant value which changes every boot. So we can easily walk through the export table once we have a read primitive and the ntoskrnl base address, find the function address, and read this random value. An additional method is to go through MiGetPteAddress; this is an internal function, and this method was presented at Black Hat last year. So now that we managed to find the PTEs

for the code we want to change in user mode, we need to understand exactly which code we're going to change. First, we need to understand which process is going to be our target, meaning where, at the end of the exploit, we want to execute our code. There are many processes that run as SYSTEM: for example svchost, which is a generic host for services in Windows; winlogon and lsass, which are processes that handle login and authentication; and of course pretty much every AV out there, and out of the box on Windows 10 we get an AV, Windows Defender. Now, we don't necessarily have

to target a SYSTEM process from the start, but not doing so will only make our road to a complete escalation much longer. Another side effect that we might be able to benefit from is the fact that SYSTEM processes are sometimes excluded from the monitoring and protection of some security products, so we might be able to avoid detection afterwards. After we understand which process we're going to target, for instance svchost, we need to understand which specific DLL we're going to change. The target DLL cannot be one that is shared with VTL 1, because VTL 1 protects every piece of code that is being used within it. So, for instance, kernel32 is

not a very good candidate for that purpose. But because VTL 1 is a very restricted zone, since it has to be more secure and its attack surface should be narrower, it doesn't use any UI, it doesn't do complex parsing, and it doesn't use network capabilities. So all of the relevant DLLs for those subsystems can be very helpful for us. Preferably, we want the DLL to already be loaded in our origin process; it will just save us some hassle. From a quick search we did, the following DLLs appear in pretty much most processes and are not shared with VTL 1. Keep in mind the

first one; it's very interesting for us and we're going to continue talking about it in the next slide. After we decided which specific DLL we are going to change, we need to understand exactly how we can change it. We need to find a place that is shared but is not used, obviously, because if we make changes to code in use before we are really prepared, it will cause a crash, which is not a good thing when one tries to run a kernel exploit. For that purpose we have code caves. Caves in executable files are quite common: basically, at the end of each section of the executable there is unused space. It's usually because

there are differences in alignment between the page size in virtual memory and the sector size of the file on disk. These places are very good for us: if we look at the code section, we get a place that is already shared, unused, and executable. So it's great for us, and it makes it quite straightforward where we're going to put our first payload. If we continue looking at our targeted DLL, ole32.dll, which as I mentioned earlier is quite a good candidate, it has over half a page of code cave, which is quite a lot of space for an initial shellcode. So after we decided where to write

our shellcode, we need to understand how we can trigger it, because, as we said, it's in a place that is not used, so just writing it won't be enough. The target module needs to be used quite often in the target process, so the code that we write there will eventually be executed. We don't want it to be executed too often, because then overhead will be an issue for us. We can also consider places that may be triggered by the origin process, for instance via RPC: some operations in one process require communicating with a different process, which may lead to a code path that will eventually

execute our own code. And if we look at DLLs, which, as we said, are what we target, DLL entry points are very appealing. They are invoked on certain events, such as thread creation and thread exit, unless the DLL specifically opted out of it. On Windows 10, if you open Procmon for a couple of seconds, you can see that services constantly create many threads, so it's great for us. And specifically, because we target a DLL that Microsoft built with their own version of the C runtime, it's very easy to alter it in order to make the code reach our code cave: there is a small jump at the

end of the main block, and we simply need to change its target address to get to our own code in the code cave. So we are at the point where our code can run, but we need to understand exactly how to build it. Because we write to a shared page, that code can propagate to many different processes; we don't necessarily know which ones they are, and we don't control it. We assume that we run in a limited sandboxed process and we don't know the target process ID in advance. But since we already run code inside the target process, it's very easy to check the process name and to verify the

username, meaning under which account it runs, and then, if it's good for us, we can simply continue running. Now, since we also run in multiple processes and we want our payload to run only once, not in more than one instance, we can simply synchronize all of the different instances by trying to obtain a named mutex. The first one that is able to do that will just continue on to the main payload. Now, because the code cave is quite small and we need to be able to do some more complex operations, we want more code, which is usually larger than the code cave size, and we

can either map it from the origin process, directly read it from its memory, or just fetch it from a remote machine; now that we run in a privileged process, it's much easier. So, to quickly recap and walk through the technique with a concrete example: we have the hypervisor that protects the kernel, Hyper-V in particular. On the right we have an instance of an svchost process that is using ole32.dll, for example, and we have a sandboxed process, an instance of the Edge browser. Next, after we leverage the vulnerability that we found, hopefully, we're going to use the read and write primitives that we

established and make the shared code page writable in the origin process. Then we can write the initial payload to the code cave, which will propagate to the rest of the processes, and we're going to continue and manipulate the DllMain entry point in order to make the code execute at a certain point. Then we only need to wait: all the changes are already done, and we simply need to wait for a thread to be created in the target process, and the code will start running. Then it will load the rest of the payload and do whatever we like. And now for a very nice demo, hopefully. So we've got here a virtual

machine. It's the latest Windows RS4 version; it was updated, I think, last week, so it's the most current and up to date. Unfortunately for us, or fortunately, it kind of depends on how you look at it, we don't have an actual working exploit here, and there is no actual vulnerability that we can use, so we have to use our own driver in order to simulate the read and write primitives. That's why we put the system into test mode; you can see it on the side here. Now, if we look at the VBS features that are enabled, we can see that we have hypervisor-enforced code integrity on and

running. Now we're going to start our simulation.
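The core step that the demo simulates, locating the PTE of a user-mode page from the PTE base and setting its write bit, can be modeled with a small sketch. Everything here runs against a fake in-memory dictionary: `FakeKernelMemory`, `read_qword`, `write_qword`, and `make_page_writable` are hypothetical stand-ins for the read and write primitives, not a real driver interface. The address arithmetic assumes the usual x86-64 layout in which the PTE for a virtual address sits at `pte_base + (va >> 12) * 8`, and the base value used below is only an example of the fixed base that older Windows builds used before randomization.

```python
# Sketch: compute a PTE address from the PTE base and flip the write bit,
# using simulated read/write primitives over a fake memory model.
# All class and function names are illustrative assumptions.

PTE_VALID = 1 << 0
PTE_WRITE = 1 << 1


def pte_address(va: int, pte_base: int) -> int:
    """Address of the PTE mapping `va`: base + (va >> 12) * 8, limited to 48-bit VAs."""
    return pte_base + ((va >> 9) & 0x7FFFFFFFF8)


class FakeKernelMemory:
    """Stand-in for kernel memory reachable through read/write primitives."""

    def __init__(self) -> None:
        self._qwords = {}

    def read_qword(self, address: int) -> int:
        return self._qwords.get(address, 0)

    def write_qword(self, address: int, value: int) -> None:
        self._qwords[address] = value & 0xFFFFFFFFFFFFFFFF


def make_page_writable(mem: FakeKernelMemory, va: int, pte_base: int) -> int:
    """Set the write bit in the PTE for `va`; returns the previous PTE value."""
    address = pte_address(va, pte_base)
    old = mem.read_qword(address)
    if not old & PTE_VALID:
        raise ValueError("PTE is not valid; the page is not resident")
    mem.write_qword(address, old | PTE_WRITE)
    return old


if __name__ == "__main__":
    base = 0xFFFFF68000000000          # example (pre-randomization) PTE base
    page = 0x7FFE0000                  # example user-mode page address
    mem = FakeKernelMemory()
    mem.write_qword(pte_address(page, base), (0x1234 << 12) | PTE_VALID)
    make_page_writable(mem, page, base)
    print(hex(mem.read_qword(pte_address(page, base))))
```

Because only a single bit of the existing entry changes, the PFN is left untouched and the page remains the shared physical page rather than a private copy.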

OK, it's going to take a second. OK, so basically we only loaded ole32.dll; up until this point we didn't really do anything. Now, if we take a look at the address in the system information process that I'm already attached to, we can see it's empty. We're going to continue, and at this point the initial shellcode was supposed to be written and propagated into every process that also loads ole32.dll. We can see that it actually happened, and the code actually changed in a process that is different from our own. So basically, what we're going to look for now is what our shell

code does: it loads a payload DLL from the C: drive, so we're going to open Process Explorer.
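The empty address inspected in the demo is the code cave described earlier, the unused padding at the end of a code section. As a rough illustration, a cave of that kind can be located by measuring the trailing run of zero bytes in a section's contents. The function name `find_code_cave` and the `min_size` threshold are hypothetical, and the sketch assumes the padding is zero-filled; trailing zeros can in principle belong to real code or data, so a result is only a candidate.

```python
# Sketch: find the trailing zero padding ("code cave") of a section's bytes.
# Assumes zero-filled padding; the function name and threshold are illustrative.
from typing import Optional, Tuple


def find_code_cave(section_bytes: bytes, min_size: int = 16) -> Optional[Tuple[int, int]]:
    """Return (offset, size) of the trailing zero run, or None if it is too small."""
    used = section_bytes.rstrip(b"\x00")
    cave_size = len(section_bytes) - len(used)
    if cave_size < min_size:
        return None
    return len(used), cave_size


if __name__ == "__main__":
    # A fake 256-byte section: 101 bytes of code followed by zero padding.
    section = b"\x90" * 100 + b"\xc3" + b"\x00" * 155
    print(find_code_cave(section))
```

In a real image the input would be the raw contents of an executable section; reading those from a PE file is left out here to keep the sketch self-contained.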

It's a bit sluggish; the nested VM on VMware doesn't work that fast. OK.

Still no success, which means that the code still didn't run in the svchost process.

Let's try to help it a bit with user interaction that may cause a service to actually run. And we can see our code started: if we look at the specific process, you can see that svchost, which runs as SYSTEM, actually loaded our own DLL.
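The reason the payload shows up in only one SYSTEM svchost instance is the gating described earlier: check the process name, check the account, and let only the first instance that obtains a named mutex continue. The logic can be modeled in a few lines. This is a pure simulation: `NamedMutexRegistry` is a hypothetical in-process stand-in for operating system named mutexes, and `should_run_payload`, the default target names, and the mutex name are all illustrative assumptions.

```python
# Sketch: decide whether an injected stub should continue to the main payload.
# Pure simulation; the registry stands in for OS named mutexes.
import threading


class NamedMutexRegistry:
    """The first caller to claim a given name wins; later callers are refused."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._owned = set()

    def try_acquire(self, name: str) -> bool:
        with self._lock:
            if name in self._owned:
                return False
            self._owned.add(name)
            return True


def should_run_payload(
    process_name: str,
    account: str,
    registry: NamedMutexRegistry,
    target_process: str = "svchost.exe",
    target_account: str = "SYSTEM",
    mutex_name: str = "example-single-instance",
) -> bool:
    """True only for the first matching process running under the target account."""
    if process_name.lower() != target_process.lower():
        return False          # wrong process: the shared page reached it, but do nothing
    if account.upper() != target_account.upper():
        return False          # right process name, insufficient privileges
    return registry.try_acquire(mutex_name)   # only one instance proceeds


if __name__ == "__main__":
    registry = NamedMutexRegistry()
    for name, account in [("explorer.exe", "alice"), ("svchost.exe", "SYSTEM"), ("svchost.exe", "SYSTEM")]:
        print(name, account, should_run_payload(name, account, registry))
```

Every other process that maps the modified page simply falls through the checks, which matches the behavior of the shared code being present everywhere but acting in one place.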

OK, so let's compare how it works versus the mitigations. Basically, we've shown that bypassing page table randomization is not really an issue once you have a read primitive; it is maybe a bit of extra work, but not really hard to do. The other, more advanced mitigations, like kernel CFG, control flow guard, are just not relevant, because we don't really need to execute kernel code; we just want to run as SYSTEM in some other process. The same goes for KMCI: we just don't need to mess with it, we just keep on to running as SYSTEM. And if you compare it to other kinds of

techniques that don't require code execution, for example token stealing, there are several downsides to token stealing. For one, a lot of security products try to monitor it, and on RS5 Windows Defender System Guard is also going to try to monitor it. It also requires relatively a lot of operations: you need to traverse the processes in the system in order to do it, and doing it without kernel shellcode is also quite dangerous, in the sense that it may lead to a crash. As we've shown, the concept here is relatively simple; we don't do any complex operation in order to make it work,

it's just relatively few simple steps and it's running. Another thing is that usually, when you do some kind of remote exploitation, you start off in some kind of sandbox, either Chrome or whatever, and then you usually want to migrate to some other process, because the browser process is probably not going to live very long, it's kind of suspicious for a browser to be running as SYSTEM, and all kinds of other reasons. So usually a next step would be to migrate somewhere else, and using this technique you don't need to migrate, because you're already running in another process. From our tests, you can also target protected processes. A possible mitigation is to use UMCI, which

effectively means that the prototype pages of user-mode processes cannot be changed, because generally code cannot be changed; it's like KMCI, just in user mode. But this is not really a good solution, because, at least to our knowledge, there is not a single organization that can actually work with it, since it requires pretty much all running code to be signed. So we don't see that as an actual solution. Another possibility is to prevent modification of prototype pages. This should not be that hard to do, because VBS already does that for code that is shared with VTL 1. So at least for the way we implemented the technique in this case, it should be a good solution. So even

with all mitigations working, it's still possible to write a relatively simple bypass to current mitigations. We checked, and it also works on RS5 and the latest Insider build. That said, we think Microsoft is doing a good job in developing new mitigations; it definitely makes exploitation harder, but there are still holes there. We suggested some of the mitigations to Microsoft; we didn't get a response yet. We also think that it can be useful even when KMCI is not enabled, because you don't need to find useful kernel pointers or function pointers to finish exploitation or anything like that, and it makes things easier. And even if mitigations like CET, which is Intel's hardware-enforced control flow integrity,

were working, it would still work, because you don't really need to mess with the control flow: you just write your code and it will work. You can also use those tricks on other operating systems; it's not really limited to Windows. We just used it on Windows because it was the most challenging in terms of bypassing its mitigations. The same should work on Linux and OS X as well, for example. And that's pretty much it, so if there are any questions.

[Applause]

First, thanks for your talk, I enjoyed it a lot. Regarding the need for synchronization among instances: this isn't a mitigation, but wouldn't the use of OS-level synchronization primitives, IPC things, make it an easy target for an adaptive IDS to detect that the exploit was going on, with unexpected semaphores being created that weren't part of the pattern that was expected to be seen? No, just to detect that the system is being exploited and isolate it; intrusion detection, not mitigation. Yeah, not prevention. OK, other questions? There's one.

Why can a process running in user mode write to the page table entries and actually set them to do the copy-on-write? Shouldn't that be protected, so that only the kernel, during memory management, is able to modify them? Yeah, because otherwise things like this would cross security boundaries, so the operating system uses it to prevent such things. I mean, the copy-on-write is just done to preserve the memory optimization.

[Music]

Now I'm wondering if a viable detection from user space would be to detect that the page was being shared and was not write-copy, by maybe doing dummy writes in places where you know you should receive your own private copy in your process. I'm wondering if there's a way to detect this from user mode. Well, we didn't have any idea how to detect it solely from user mode. You can detect that code was modified, but the problem is to know what to check exactly, I mean, we used a specific DLL, but it can be any other

DLL, and you also need to know when to check it; there is no notification of some kind that you can use to know when to do it.

OK, thank you very much for your talk, guys, appreciate it.
