
BG - Meltdown's Aftermath: Leveraging KVA Shadow To Bypass Security Protections - Omri Misgav & Udi Yavo

BSides Las Vegas · 40:58 · 310 views · Published 2019-10
About this talk
BG - Meltdown's Aftermath: Leveraging KVA Shadow To Bypass Security Protections - Omri Misgav & Udi Yavo Breaking Ground BSidesLV 2019 - Tuscany Hotel - Aug 06, 2019

Hi everyone, we're happy to be here. Today we're going to discuss the research that we did shortly after the Meltdown vulnerability was published. Basically, what we did was examine the patches that were made, and we noticed that while they do mitigate Meltdown, they also open the door to bypasses of some other security protections, which we'll go through today. With me is Omri Misgav, the security research team leader at enSilo and a past speaker at multiple BSides conferences, and I'm Udi Yavo, CTO and co-founder of enSilo; I've also spoken at BSides several times and at Black Hat before. Okay, so just to outline the presentation: we'll start with a short introduction to speculative execution

and we won't dive too much into the details. Then we'll talk about Kernel Virtual Address Shadow (KVA Shadow) and its internals: what it is exactly and how it enabled us to bypass some security protections. We'll talk about the mitigations that were done, and also other things you can do with it. Okay, so as a quick intro: most processors have a rich instruction set, and each instruction is composed of micro-instructions, or micro-operations. Those micro-operations are what the processor actually executes, and you can also see it in the Intel manuals when you look into how each instruction is implemented: the breakdown of the micro-operations that are executed when an instruction is

executed. So, about speculative execution: the idea is that in order to improve performance, the processor can try to execute instructions ahead of time by guessing which branch is going to be taken, or which instructions are going to be executed. For example, if we have a conditional branch instruction, the CPU will try to guess which instructions will be executed and then execute their micro-operations ahead of time. Now, obviously the processor can be wrong and choose the wrong branch to execute. In those cases the actions that were done as part of the speculative execution will be discarded, and we will call these transient instructions for this presentation. Now, while the state of the processor, meaning

all the registers, will not change if something like that happens, it will still have an effect on the general state of the machine, by actually putting pages in the cache; at least, this is the cause of Meltdown. So Meltdown was a processor-level vulnerability, and still is, and the implication was that it allowed user-mode code, even totally unprivileged user-mode code, to read in an effective way the entire memory, including kernel memory, although it's not supposed to be authorized to access it. At the time it affected Intel CPUs and, based on publications, IBM processors and some ARM processors. The idea was to leverage a side channel that

was created due to the CPU's memory cache. If we take a look at the snippet below on the right-hand side, let's say that the RCX register points to some kernel address that's not supposed to be accessible. When it's broken down into micro-operations, the CPU may still fetch that address, and before it actually throws the exception it will try to execute the other operations. Now, if we look at the last line: in case the CPU executes this micro-operation, it may put in the cache one of the pages that were prepared ahead of time by the attacker in order to learn what was at that kernel address, and

by checking the time it takes to access each page, it is possible to know exactly which page was accessed, and from that to know what the data in the kernel was. Now, I know that's a very high-level explanation; to dive into the details you can go to the addresses below, but it should pretty much get us aligned on what happens exactly. So before diving into the mitigation, I'll just give a really short overview of the virtual memory layout. On 64-bit we have page tables that look something like this. The physical address of the base of the page table is found in CR3, and the first level is the PML4; we'll discuss

it more specifically in the rest of the presentation. After you break down the virtual address and walk each level of the page tables, you eventually get to what is known as the PTE. The PTE contains not only the physical address but also other bits that define the protection of the page: whether it's executable or not, whether it's user-mode or kernel-space, writable or not, and so on. One other thing to note is that there is a field called the page frame number, which is used to get to the actual physical address. We'll use PFN throughout the presentation to refer to physical addresses. Okay, so the solution at a high level was quite simple. If the

classic model of the operating system was to map the entire memory space in a single page table, both when user-mode code is running and when kernel-mode code is running, like on the left-hand side, then after Meltdown the idea is to remove most of the kernel space and map only the user portion of the memory. So even if something like Meltdown occurs, the pages are not mapped, they are not there, and so there won't be any leak. More specifically, to implement that there are now two page tables. One still has access to the entire address space and is used when kernel-mode code is executing, so you can see the driver listed on the

left-hand side. On the right-hand side there is what's called a shadow page table, which contains all the user-mode code and the code that is used for transitions between kernel and user mode. And now Omri will dive into the details of exactly how it is implemented.
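As a rough illustration of the page-table walk and PTE fields mentioned above, here is a minimal Python sketch. It only models the standard x86-64 4-level layout (9 index bits per level, 12-bit page offset) and a few well-known PTE bits; the function names are made up for this example and are not taken from the talk or from Windows.

```python
# Minimal sketch of x86-64 4-level paging arithmetic.
# Function names are hypothetical; bit positions follow the architecture.

INDEX_MASK = 0x1FF          # 9 bits of index per paging level
OFFSET_MASK = 0xFFF         # 12-bit offset inside a 4 KiB page
PFN_MASK = (1 << 40) - 1    # PFN occupies bits 12..51 of an entry


def split_virtual_address(va: int) -> dict:
    """Break a virtual address into its per-level table indexes."""
    return {
        "pml4": (va >> 39) & INDEX_MASK,
        "pdpt": (va >> 30) & INDEX_MASK,
        "pd": (va >> 21) & INDEX_MASK,
        "pt": (va >> 12) & INDEX_MASK,
        "offset": va & OFFSET_MASK,
    }


def decode_pte(pte: int) -> dict:
    """Extract the protection bits and the page frame number from a PTE."""
    return {
        "present": bool(pte & (1 << 0)),
        "writable": bool(pte & (1 << 1)),
        "user": bool(pte & (1 << 2)),        # clear means kernel-only
        "global": bool(pte & (1 << 8)),
        "no_execute": bool(pte & (1 << 63)),
        "pfn": (pte >> 12) & PFN_MASK,
    }


def physical_address(pte: int, va: int) -> int:
    """Combine the PFN from the PTE with the page offset of the address."""
    return (decode_pte(pte)["pfn"] << 12) | (va & OFFSET_MASK)
```

Calling `split_virtual_address` on a kernel address and `decode_pte` on a raw entry shows which index selects each table and which bits carry the user/kernel, writable and no-execute protections.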

Okay, so we have two page tables now. It means that every time we transition from user space into kernel space we have a new context switch, only for the memory of the same process. We have both tables, as we mentioned: the kernel page table, which is the full address space page table, and the user shadow page table. Those are the terms that we're going to use later on in the presentation. There are also variations of this mitigation on Linux and Mac, but we won't dive into their internals, though they are quite similar. So, the basic design: now that we have only a portion of the kernel mapped into both tables, it means that we have to put all the code for the

transitions in one single place, so it will be easier to map it into the two different page tables. Now all the code for the transitions, all the routines that handle entering and exiting the kernel, is found in a PE section named KVASCODE. As you can see on the slide, every transition routine now has two versions: the shadow one, which is present in that section, and the regular one, which is present in the .text section, the regular code section, in case you have a newer CPU that is not vulnerable to this attack. Now the second version of every transition routine, the one that is accessible in the shadow page table,

has the Shadow suffix added to it, as you can see. Basically, all this function does is check whether or not the shadow page table is currently being used, and if so, on the transition to the kernel, switch back to the kernel page table so that operation can continue as regularly as it did before. The exit functions from the kernel are also present in that same code section. When we transition from ring 3, from user space, into kernel mode, the CPU saves some data on the stack so that the transition can occur properly, and the OS

can later on use this data. Usually it's named the machine frame, which is part of the larger trap frame that the OS builds. Previously, without the mitigation, the transition worked like this: in user mode the thread has its own stack, and when transitioning to the kernel the same thread had a different stack in kernel space. Now, to simplify things, instead of that we first get a transition stack. This stack is the one that is used directly when we start executing in ring 0. It is used in the shadow functions that we mentioned on the previous slide, which also switch the stack to the thread's kernel

stack. Now, this transition stack is not per thread; it's per CPU, or per processor core if you want. Each virtual address points to the same physical address in both page tables of the same process, and this means that the code section for the transitions, KVASCODE, is placed at the same virtual address, so we can still use Meltdown to break KASLR. This is not something that was meant to be handled by the mitigation, and it seems to be some sort of compromise by Microsoft. The way the page tables work is actually by just copying the PML

4 entries of the user space from the full one to the shadow one, so you only have to keep a very small part synchronized, and it's easier to do so. It's very similar to how the kernel has been shared between different processes up until now. This kind of memory context switch introduces a pretty big performance issue. The reports back at the time were that it cost a penalty of up to 30%, if I remember correctly, and in order to minimize the impact some optimizations were made. The first one is utilizing hardware PCID. This is a mechanism that is available on

newer Intel processors, and it basically means that the software, in this case the operating system, can provide a logical tag for the memory address space. When cache entries (in the TLB) are built by the processor, it appends this tag to them, and then, when you operate in a specific context, it knows to use just the relevant entries in the cache. Microsoft decided to use two tags: one for the user address space of all processes, and the second for the kernel address space. If you have an older CPU that doesn't support this mechanism, the next optimization is meant to try and help with that.
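To make the PCID tagging a little more concrete, here is a small Python sketch of how a CR3 value is composed when PCID is enabled: the low 12 bits carry the tag, and bit 63 of the value written asks the processor not to invalidate the entries cached for that tag. The two tag constants are placeholders chosen for this example; the talk only says that one tag is used for user address spaces and one for the kernel, not which numbers they are.

```python
# Sketch of composing a CR3 value with a PCID tag (assumes CR4.PCIDE = 1).
# USER_PCID and KERNEL_PCID are hypothetical values for illustration only.

PCID_MASK = 0xFFF            # CR3 bits 0..11 hold the PCID
CR3_NO_INVALIDATE = 1 << 63  # set on write: keep cached entries for this PCID

USER_PCID = 1    # hypothetical tag for the shadow (user) page table
KERNEL_PCID = 2  # hypothetical tag for the full (kernel) page table


def make_cr3(pml4_physical: int, pcid: int, keep_cached: bool = True) -> int:
    """Build the value to load into CR3 for a page table and PCID tag."""
    if pml4_physical & PCID_MASK:
        raise ValueError("PML4 physical address must be page aligned")
    if not 0 <= pcid <= PCID_MASK:
        raise ValueError("PCID must fit in 12 bits")
    value = pml4_physical | pcid
    if keep_cached:
        value |= CR3_NO_INVALIDATE
    return value


# Entering the kernel: switch to the full page table without flushing.
kernel_cr3 = make_cr3(0x1AD000, KERNEL_PCID)
# Returning to user mode: switch back to the shadow page table.
user_cr3 = make_cr3(0x2BE000, USER_PCID)
```

With tags in place, the two switches per system call no longer have to throw away every cached translation, which is where the saving described in the talk comes from.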

Basically, in the PTE there is a flag that marks the page as global, meaning that it won't necessarily be flushed on every change of the CR3 register. Up until now the kernel was marked as global, but this was switched so that the flag is set on user address space PTEs. So now, every time we transition into the kernel and switch the page table, the user space won't be flushed out of the cache, and we'll save some performance on that. The last large optimization that was made is that elevated processes, or administrative processes, don't have this shadow applied to them. So even if your machine is vulnerable, those kind of processes

don't have these mechanisms applied to them. This is because they can gain access to the kernel in other ways: because they are privileged, they can just load their own driver to read and write in the kernel address space, so there isn't really any benefit from shadowing those kind of processes as well. So when the system starts, the initialization process is quite short regarding the mitigation. We start with the regular interrupt handlers, and then we get into the function that checks whether the mitigation should actually be applied; its name is KiEnableKvaShadowing. Once it does find that the mitigation needs to be applied, it

changes the interrupt handlers to the shadow versions of them in the code section that we mentioned earlier. Then, if it can, it opts in to PCID if the hardware supports it, and sets the global flag KiKvaShadow to indicate that the mitigation is active. Once this function is done, the MSR for the system calls will be set properly, to the version of the function that we need according to the global flag. Some kernel structures are worth mentioning that had some changes and additions to them. We start with the process structure, which basically represents a process in the kernel. There are three fields that were added to it: the UserDirectoryTableBase, which

is the base address of the shadow page table; AddressPolicy, which is a flag that indicates whether or not the current process is opted in to the mitigation; and the shadow mapping, which is the virtual address of the shadow page table that corresponds to the UserDirectoryTableBase. For the KPCR, some fields were added in order to enable the transitions. Those fields are aligned and padded into a single page, so only that page from the entire structure is mapped into the shadow page table and no other data will be leaked. The KernelDirectoryTableBase is the physical address of the full page table that we swap in at the shadow entry

point. The RspBaseShadow is the thread's kernel-mode stack pointer. UserRspShadow is the place we use to save the current user-mode stack pointer. And the ShadowFlags field indicates whether or not this mitigation actually applies to the currently running thread. The last thing I want to talk about, to wrap up the internals part of the talk, is how a process is created. It kind of surprised us at first sight, but when the mitigation is enabled on the machine, every process starts out with a shadow page table. Even before it executes, when it's being created, it is created with a proper

shadow page table. But because there was an optimization, when privileged processes start executing, meaning when their first thread actually starts running, the system will go ahead and remove the shadow page table. So now we're getting to the more interesting part. We talked a bit about internals and how the mitigation works and stuff like that; now let's talk about the security mechanisms that could be affected by it. We start off with PatchGuard, which is a pretty old mechanism. It's opportunistic in nature: it will try to protect the kernel by running every once in a while and checking the content of certain elements, such as hashes of code pages or

values in the SSDT or IDT, or values of MSRs, and if it detects some anomaly it will crash the machine. HyperGuard, which is a bit newer mechanism, is deterministic in nature. It is hardware dependent, though; it's not just software. It relies on the hardware and on a hypervisor to be enabled, and it also provides services not just for the kernel itself but also for the hypervisor. Its method of working is a bit different: it verifies actions; it doesn't check content after the fact. Because it's part of the hypervisor, it can opt in to certain events that cause VM exits and stuff like that. So when write operations

occur it can validate them immediately, when an attempt to execute code happens it can validate it immediately, and so on. So how can we basically bypass them? If we think about it, the limitations that they present are that we can't change any pages in the kernel, because PatchGuard will catch us, and we can't really change any MSRs, because HyperGuard will catch us much faster than PatchGuard. But if you think about it, what we have now is a new area in the kernel, in ring 0, with the transition code, but it doesn't have the kernel, so odds are it also doesn't have PatchGuard. So if we can figure out a way basically to

grab hold of this area, we'll be able to bypass those protections. Apparently we can do it pretty easily. All we have to do is make the pages in the shadow page table private to it, separating them between the full page table and the shadow page table. Now we own those pages and we can do pretty much whatever we like with them. So what can we actually do, and how can we achieve the bypass? First we have to locate that code section in order to switch it, and it's pretty easy: we can just parse the PE section table and then

allocate new pages for us to use. Now we need to build our own hook. It's a bit different this time. First, if you remember, what the shadow entry point does is check whether it's currently in the context of the shadow page table and, if so, switch it. Now this check is quite redundant: we know for sure that we are in the context of the shadow page table, since those pages are owned by us and are private to that page table. So we can just remove it; we don't need it anymore. Now we need to actually be able to run our own code. If we try to run it in the shadow page table, we'll

find that it's possible, but we have some issues. We can't really use any functionality that the OS provides. We also won't have the rest of the data in the kernel space there, because that's the whole point of the shadow page table: it is a filtered and reduced view of the kernel itself. Another issue, which will be a bit more problematic later on but pretty much seals the deal for us, is the fact that we have to handle the relocations ourselves. We don't use the OS loader here; we build the code ourselves, so we also have to apply the relocations ourselves. So that's pretty much not a valid option for us.

But what if we could actually go back into the kernel page table, but to our own code, before we get to the kernel's code? We can still facilitate this memory context switch, but we can use a ROP gadget beforehand and make sure that we'll go to our own code. Let's take a look at the binary of the shared code section, the original one. If we move to the disassembler view, we can look for the RET instruction opcode, which is the last byte in the snippet. If we look at how it looks in the view of the disassembler, we can

see the start of the original shadow entry point that we copied to those pages, and that will also be the content of the page after we switch back to the full page table. Now instead we can overwrite it with a basic ROP gadget that we build. We're going to push the address of our driver in the full page table and then facilitate the memory context switch by assigning the proper value to the CR3 register and using the same RET instruction, which is present in both views, as you can see now. The last thing we have to take care of is just

switching the stack, so we'll finish with the transition stack and go back to the thread's kernel stack, and from there we are much more free to do what we want. Applying the hook is also very simple. All we need to do is parse the page tables and be able to change the PFNs, so we avoid any dangerous write operations that those mechanisms can monitor, and that's how we can pretty much avoid them completely. Those page tables are accessed through physical addresses, so we can just do what forensics tools do in order to get to them: either use the physical memory device or do PTE

remapping on our own. We decided to go with PTE remapping because we thought that it's less likely to be detected that way. Once we get to our code in the full page table, the one last thing we have to do before we are pretty much completely free is to flush the hooked pages' view from the cache, so that we avoid any recursion. Because now there is no more check of what the current memory context is, the hook can just call itself over and over again if it's not

flushed out. And because we control the entire page, the same way we control the entry we can also control the exit. HVCI is another security mechanism. It's a bit outside the scope of what we aimed to bypass, but it did present some challenges for implementing this technique. We can no longer create our own executable pages at runtime, because it actually prevents that, and we also can't modify existing ones. So basically it means that we have to rely more and more on data pages, which is not a big deal: there is enough space to map data pages into the shadow page table near the shadow

entry points. Our code needs to be position independent; then we can compile it in advance and there won't be an issue. Our ROP gadget will look something similar to this: we need to get the code address, and then use the data page that we set up when we installed the hook in order to fetch the address of the driver's hook, and we're pretty much free from there. So if you followed up until now, we managed to bypass the protections that we wanted to bypass, particularly HyperGuard. But now we had one small issue. It's not that small, it's quite big: we can only

apply this kind of hook to, and control, only less privileged processes. And if we're honest, those are usually not of great interest to defenders or attackers. Usually we're more interested in the privileged processes, which don't have this shadow applied to them, so the hook doesn't apply to them either. So we tried to understand whether it was actually possible to force them into the mitigation and be able to control them as well, and we did manage to pull something off. We start with a newly created process. If you remember, when the process is created it does have a shadow page table.

So if we can find a way to prevent it from being deleted, maybe we can just use it later on. The code that manages the deletion of the shadow page table does one interesting check in its prologue. You can see on the lower right side that it checks whether the shadow mapping field value is null, and if so it just skips the entire function and won't clean up those resources. So if we monitor the creation of the process, which we do, we can just save those fields and reset them, and once the process has actually started running we can opt it back in by resynchronizing the

PTEs of the user space and restoring the values. Then all we have to do is resynchronize the state of the KPCR fields, which is very simple to do: we just cause a memory context switch by attaching to a different process and returning to our own. For cleanup, we don't really need to do anything. The system takes care of it for us, since it is the one that allocated all the pages and all the page tables, so there is no issue here. Now we get into the more experimental area. Converting a running privileged process is a bit more trouble, because the shadow page table was already deleted

and we didn't prevent it from being deleted, so we have to figure something out. But apparently it's not that complicated either. We allocate a new page on our own, and we then have to suspend the process so it won't run and do memory operations while we synchronize the shadow page table PML4 that we allocated. Again we use the same synchronization for the user space, but since we allocated the page table ourselves we also need to synchronize the kernel space for the shadow page table, which is pretty easy: we can just use the entries from the System process. We opt in the process again by setting the fields to what we

expect them to be when the mitigation is enabled, and then we continue. The difference here is that on termination we need to opt the process back out and clean up the page on our own; otherwise the system will crash, because it will try to free resources that it didn't allocate.
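The synchronization step for a freshly allocated shadow PML4 can be pictured with a toy model. The Python sketch below treats a PML4 as a list of 512 entries, copies the user half from the process's full table and the kernel half from a reference table (standing in for the "entries from the System process" mentioned above). It is only an illustration of the copying logic under those assumptions: the names are hypothetical, and the real work happens on physical memory inside a driver.

```python
# Toy model of rebuilding a shadow PML4 for a process that lost its own.
# A PML4 is modelled as a list of 512 integer entries; names are hypothetical.

ENTRIES = 512
USER_HALF = range(0, 256)      # lower canonical half: user address space
KERNEL_HALF = range(256, 512)  # upper canonical half: kernel address space


def build_shadow_pml4(full_pml4, reference_kernel_pml4):
    """Return a new shadow PML4 assembled from two existing tables.

    full_pml4: the target process's full (kernel) page table top level.
    reference_kernel_pml4: a table whose kernel half already contains only
        the reduced transition mappings (assumed to come from the System
        process, as described in the talk).
    """
    if len(full_pml4) != ENTRIES or len(reference_kernel_pml4) != ENTRIES:
        raise ValueError("a PML4 has exactly 512 entries")
    shadow = [0] * ENTRIES
    for index in USER_HALF:
        # User space must be identical in both views of the process.
        shadow[index] = full_pml4[index]
    for index in KERNEL_HALF:
        # Kernel space keeps only the reduced, shared transition view.
        shadow[index] = reference_kernel_pml4[index]
    return shadow
```

Because only top-level entries are copied, the lower-level tables stay shared with the originals, which matches the point made earlier that just a very small part has to be kept synchronized.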

Okay, so far we discussed how the mitigation enabled us to bypass PatchGuard, HyperGuard and such mitigations. Now, the mitigation that we recommended at the time was rather simple, and it was also what was eventually done. Basically, the only check that needs to be added to PatchGuard is to validate that the physical pages in the shadow page table match the physical pages in the full kernel page tables for those regions that are used for transitions. Then, indirectly, PatchGuard will validate the shadow pages while it validates the full kernel pages, because the first check makes sure that the physical address is the same, so it's the same

memory, and then the validation that the code didn't change makes sure that it's actually the original code. However, there is a single complication: this needs to be done for every process that is currently running, otherwise it is still possible to hijack specific processes and take them over. So this is exactly what it does: it goes over all the running processes and makes sure that the physical pages match for those transition areas. This was fixed in Redstone 5, Insider build 17655, and from then on it was no longer possible to make those kind of hijacks. However, this test only checks that the specific code that is used for transitions is not tampered with. It is still

possible to put code in the shadow page tables, execute kernel code, or hide data. Most security products don't look at that space at all, and forensics tools normally will not look at those pages, so it is possible to put parts of rootkits and hide data there and so on. One possibility is to just improve PatchGuard so it will validate that those pages are empty and that the page table only contains the transition code that it's supposed to contain. It also makes sense to add plugins to tools like Volatility and Rekall to scan those pages as well. Basically any code or any data in this page table that is not part of the transition code is

suspicious and should not be there. And some final notes. As we all know, Meltdown and Spectre had a lot of impact; they caused a lot of changes around the world in the way we view protections and stuff like that. PatchGuard being bypassed, which is what we found, was not the only issue. There was actually a very significant issue in Windows 7 called Total Meltdown, which was even much worse than Meltdown itself: by mistake, the page table of the kernel on Windows 7 was accessible to user mode, and this pretty much means that you can do whatever you want on the system. This was also patched shortly after it was discovered. The

point is that when making such large-scale changes, you should review as much as possible, even things that are not directly related to the architectural change. Also, it's probably going to be smart to take the shadow page tables into account when doing forensics and stuff like that, because as of today it is possible to hide stuff over there. Okay, we're pretty much done. Questions?
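The two checks suggested in the talk (the transition pages must map to the same physical pages in both tables for every process, and nothing else should be mapped in the kernel part of a shadow table) can be sketched over already-extracted mappings. The Python below is a hypothetical illustration, not a Volatility or Rekall plugin: it assumes some other tool has produced, per process, a dictionary from virtual page address to PFN for each of the two page tables.

```python
# Sketch of the consistency checks over pre-extracted page mappings.
# Input shape and function names are assumptions made for this example.

KERNEL_SPACE_START = 0xFFFF800000000000  # start of the upper canonical half


def find_hijacked_transition_pages(processes, transition_pages):
    """Report transition pages whose PFN differs between the two tables.

    processes: {name: {"full": {va: pfn}, "shadow": {va: pfn}}}
    transition_pages: virtual page addresses of the transition code.
    """
    findings = []
    for name, tables in processes.items():
        for va in transition_pages:
            full_pfn = tables["full"].get(va)
            shadow_pfn = tables["shadow"].get(va)
            if full_pfn != shadow_pfn:
                findings.append((name, va, full_pfn, shadow_pfn))
    return findings


def find_unexpected_shadow_pages(processes, allowed_kernel_pages):
    """Report kernel-range pages in a shadow table that are not expected."""
    allowed = set(allowed_kernel_pages)
    findings = []
    for name, tables in processes.items():
        for va in sorted(tables["shadow"]):
            if va >= KERNEL_SPACE_START and va not in allowed:
                findings.append((name, va))
    return findings
```

Running both functions over every process, not just one, reflects the complication raised in the talk: a single unchecked process is enough for a hijack.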

Are virtual cores treated the same, and are they as vulnerable and mitigated in the same way as physical cores? [Music] [Music] Yeah, okay.

Any other questions? Hi, thank you for the presentation. So the version that you said the fix is in, is that out yet, or is it still an Insider preview? So basically the production version of Windows is still vulnerable to this? No, no, it's already applied to the production version; the Insider build was just where it appeared first. There was another vulnerability announced today that uses the SWAPGS instruction. It's related to Spectre, so it's another Intel vulnerability that was patched today, and it uses the SWAPGS instruction to basically bypass all the previous mitigations for Spectre and Meltdown, including KPTI and all that. And the question would be: apparently Microsoft did some silent

patching back in July, and it was only today that they released an advisory for it, because this was the disclosure date. Have you had time to look into that, or do you plan to look at the mitigations for that? Because there are some new mitigations there. We did not review it yet, and yes, we will, but we didn't have time to do it yet. Okay, thank you.

Anyone else? All right, thank you so much. [Applause]
