No Code Execution? No Problem! — Living The Age of Virtualization-Based Security

Name: No Code Execution? No Problem! — Living The Age of Virtualization-Based Security
Uploaded: 2022-10-23
Duration: 50 min 11 s

BSides KC · 2022 · 50:11 · 1.5K views · Published 2022-10

Speakers

Connor McGarr

Tags

Category: Technical

Topic: Reverse Engineering, Vulnerability Research

Difficulty: Advanced

Team: Red

Research: Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

CrowdStrike Falcon

Platforms

Hyper-V

Frameworks

Cobalt Strike

Concepts

Intel Control-Flow Enforcement Technology

Vendors

CrowdStrike

About this talk

Windows 11 enables virtualization-based security mitigations by default, including Hypervisor-Protected Code Integrity (HVCI), which prevents unsigned kernel-mode code execution through the hypervisor. This talk examines how HVCI and related mechanisms work internally, how they fundamentally change the kernel exploitation landscape, and demonstrates practical attack techniques that adapt to these new constraints—including return-oriented exploitation chains and documented API abuse that bypass traditional shellcode-based approaches.

Original YouTube description

No Code Execution? No Problem! - Living The Age of Virtualization-Based Security - Connor McGarr

Windows 11 saw the default enablement of some of the most powerful exploit mitigations on the market - many of them falling under the purview of Virtualization-Based Security, or VBS. These exploit mitigations are instrumented through Microsoft's hypervisor, Hyper-V, which provides a "higher root of trust" than the Windows kernel itself. With the advent of the default enablement of these mitigations - simply put - the "old" way of doing things won't suffice when it comes to kernel exploitation. Hypervisor-Protected Code Integrity (HVCI), one of these hypervisor-based mitigations, works by outright preventing any malicious, unsigned shellcode from running within the Windows kernel. Does this now mean "game over" for attackers? This talk investigates how these new, modern mitigations work and how today's attackers must and can adapt to the new bar set by these exploit mitigations.

Specifically this talk addresses:
- Historical Windows exploitation
- Hyper-V/VBS/HVCI internals
- Revamping kernel-exploits to deal with HVCI
- Augmenting HVCI with Intel Control-Flow Enforcement Technology (CET)

Connor McGarr (Software Engineer at CrowdStrike)
Connor is a software developer at CrowdStrike, working on EDR features to detect and prevent the latest in-the-wild attack techniques. He previously worked on the CrowdStrike client-facing red team. Connor is extremely enthusiastic about anything low level, C, Assembly, and Windows internals related. In his free time, he likes to contribute to his blog about his latest research - as well as spending time with his family, and studying history.

Transcript [en]

Thanks for the intro. So today my talk is about some of the new virtualization-based security features that are enabled by default on Windows 11. Really cool technology, probably my favorite feature within the Windows OS. With that, we'll go ahead and get into it. My name is Connor. I'm a software engineer at CrowdStrike. I just like tinkering with low-level stuff: low-level operating system internals, C, assembly, etc. Today we'll talk about what Windows exploitation looks like right now, and we'll talk about the security features and how they affect exploitation. So with that, we'll go ahead and get into the overview. Today, attackers have a few different options when it comes to taking advantage of memory

corruption vulnerabilities. Attackers currently prefer to exploit vulnerabilities by leveraging a payload like shellcode, which we can also refer to as unsigned code execution. It's very extensible: shellcode is just a blob that lets you set up function arguments and call into malicious functions, and it's the path of least resistance. Normally, if you've analyzed any kind of exploit or proof of concept, you'll see shellcode detonation. With that being said, on most modern operating systems today, initial access targets such as web browsers are sandboxed, so exploits today are chained together from multiple exploits. Usually what happens is a browser vulnerability is taken advantage of that

gives the attacker execution in the context of a browser process, for instance. Well, that's sandboxed these days, so you have to pair that with some other exploit that breaks you out of that sandbox; otherwise you can't do things like write files to certain directories, etc. And usually the Windows kernel, since it's native, is a target for attackers to escape sandboxes, and it's what we'll be focusing on today. On our operating systems today, memory is not completely read-write-execute everywhere, thank goodness. Essentially, this forces attackers to take a three-step approach. Attackers today usually start by writing a final payload to some writable part of memory. Well, as

I said, memory is not fully executable, writable, and readable, so usually the writable memory where they can place the shellcode is not executable. So then they use some kind of first-stage payload to mark that region of shellcode as executable, and now, with executable memory, they hijack the control flow of a program. They figure out: how do I force this exploit target to call into my now-executable shellcode? Since we're talking more about the kernel side of things, this is how this three-step approach looks at a high level. If we look at the memory address in red in the middle of the screen, it's ffff780, sign

extended. We can see some assembly instructions, which represent shellcode, and we see this acronym on the bottom, PTE, which stands for page table entry. That describes the memory, essentially, and right now it has a K, which means it's a kernel-mode address or page, and it's writable, for W, and V for valid. So the attacker has done the first step in that three-pronged approach we've talked about and has written shellcode somewhere. Using a vulnerability or a first-stage payload, an attacker can locate in the kernel the page table entry that corresponds to the shellcode address, and then they can corrupt the metadata, so anytime the CPU goes to execute or

interpret this memory address, it now looks at how it's described, and it says it's kernel mode, for K, still; now it's executable, for E, as we can see; and it's valid. So now we have shellcode in the kernel that's kernel mode, writable, and executable. Then the last thing an attacker will do is the hijacking of control flow. On the right-hand side we have an array of function pointers. As the name implies, each one points to a function to be executed. An attacker can locate those and overwrite a pointer with their own memory address, in this case the address of the shellcode we previously wrote.
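The three steps described so far (write the payload, flip the page to executable, redirect a function pointer) can be sketched as a toy model. This is only an illustration, not code from the talk: the bit positions are the architectural x86-64 PTE bits, while the addresses, the page frame number, and the names `describe`, `clear_nx`, and `dispatch_table` are hypothetical.

```python
# Architectural x86-64 PTE bit positions; everything else here is a toy model.
PTE_VALID = 1 << 0   # "V": entry is present/valid
PTE_WRITE = 1 << 1   # "W": page is writable
PTE_USER = 1 << 2    # set = user-mode page, clear = kernel-mode page ("K")
PTE_NX = 1 << 63     # set = no-execute, clear = executable ("E")


def describe(pte: int) -> str:
    """Render K/W/E/V style flags similar to the ones shown on the slide."""
    return "".join([
        "U" if pte & PTE_USER else "K",
        "W" if pte & PTE_WRITE else "R",
        "-" if pte & PTE_NX else "E",
        "V" if pte & PTE_VALID else "-",
    ])


def clear_nx(pte: int) -> int:
    """Step 2: corrupt the PTE so the CPU treats the page as executable."""
    return pte & ~PTE_NX


# Step 1: the payload sits in a kernel page that is writable but not executable.
shellcode_va = 0xFFFFF78000000800  # hypothetical address
shellcode_pte = (0x1234 << 12) | PTE_NX | PTE_WRITE | PTE_VALID  # hypothetical PFN
before = describe(shellcode_pte)

# Step 2: flip the no-execute bit in the PTE.
shellcode_pte = clear_nx(shellcode_pte)
after = describe(shellcode_pte)

# Step 3: overwrite one entry in a table of function pointers (hypothetical values).
dispatch_table = [0xFFFFF80012340000, 0xFFFFF80012350000]
dispatch_table[1] = shellcode_va

print(before, "->", after)
```

Nothing here touches real memory; the point is only that all three steps operate on data the kernel itself owns, which is why a kernel-resident mitigation cannot protect it.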

Anytime this function pointer is called, what's going to execute is the now-executable kernel shellcode. So that's the approach attackers take currently. We're talking about kernel mode, but in user mode this kind of approach is mitigated with a mitigation known as Arbitrary Code Guard, when it's enabled. Basically, Arbitrary Code Guard is a user-mode mitigation that enforces the principle of write XOR execute: memory can either be writable or executable, but never both at the same time. As we saw in our example, our shellcode was first writable, then we used some kind of vulnerability to make it executable, and that violates the basic

tenet of ACG. But ACG is a user-mode mitigation, so we can put the mitigation itself, the infrastructure, in the kernel, because there's a security boundary: user mode can't just arbitrarily access kernel mode. An attacker first needs some kind of kernel vulnerability in order to disable the mitigation, so there's a security boundary, and an attacker already has to have some kind of kernel-level access. Well, we're talking about kernel exploitation. If we assume that the kernel is the most privileged entity of the OS, and we put a mitigation in the kernel to enforce write XOR execute, but our threat model is that we're trying to stop kernel attacks, well, the kernel can't

defend against itself. So previously Windows didn't have a way to defend against these sorts of attacks, but we'll talk about how the hypervisor now becomes a higher security boundary and how that's leveraged to implement the principles we've talked about. Now we'll get into Hyper-V, virtualization-based security, and another acronym, HVCI, which we'll talk about shortly. Virtualization-based security (VBS) is a suite of security features provided by Hyper-V, which is Microsoft's hypervisor. Since we're investigating VBS and it relies on Hyper-V, to look at it further we need to see how Hyper-V works. So this is a 101, I guess. Hyper-V uses this concept of partitions for

virtualization purposes. When the operating system boots, that's called the root partition, essentially, and it takes up the whole physical address space of the computer. Then, anytime a virtual machine is created, what gets created is a child partition, which hosts all the infrastructure for the VM; you can refer to it as an instance of a VM, essentially. As I mentioned, the root partition takes up the physical address space, and anytime a child partition is allocated, it is allocated from that root partition. Here's a brief example from the Microsoft website, from the MSRC blog, that just shows how this looks.

In this example we only have one VM running, which takes up a child partition; we have a root partition; and they both run on top of the Hyper-V hypervisor. So, a child partition and a root partition: we can infer that a child partition has its own physical address space, essentially, and it's isolated from other child partitions and the root partition. Well, how does this isolation work? On modern CPUs there's a technology called second level address translation (SLAT), and on Intel processors the implementation is known as Extended Page Tables (EPT). What SLAT does is basically allow the CPU to intercept memory access. VMs kind of don't know that

they're not the only OS running on a machine, so they just act on memory as if they're the only OS, and behind the scenes a higher entity, the CPU, can facilitate making sure VMs access the memory they need to. Here's a quick example of this; I'll draw on the screen for a second. Basically, we have a child partition, outlined in red on the right-hand side, and then we have the physical memory on the computer. Take the child partition over here, at 0x2000 for instance. The VM is like an

OS: it's just running Windows, let's say, and it needs to access the memory address 0x2000. Well, it doesn't know it's a VM, and 0x2000 actually exists at 0x5000 on the physical computer. But the VMs don't know about that, so there needs to be some sort of entity that can say: hey, I see you're accessing memory address 0x2000; what you actually need to access is 0x5000 in the physical memory on the computer. So that is how that works. The way this works is through extended page tables, referring to the SLAT technology we talked about. As the name implies, there's an additional set of page

tables, and those page tables contain all of the necessary information to perform that translation from what the guest, or the VM, thinks it's accessing into the actual memory on the host. Virtual machines don't actually access physical addresses; they access what we would call guest physical addresses, which would be that 0x2000 address. That gets intercepted by the CPU and then translated into a system physical address, the actual physical memory on your computer. Each VM is associated with a set of page tables, so you can think of it like an entry in a dictionary, for instance, that contains all the necessary information for all the VMs running to

perform that translation. Here's another diagram of how this looks. On the left-hand side we see a child partition. It operates just like a normal operating system: a modern OS has user virtual memory, and that virtual memory gets translated into physical memory. So we just run; we act like nothing's happening and we're the only OS running. The VM goes to access some memory. Well, that memory, as we know, doesn't actually exist at that address on the actual host operating system: 0x2000 in the VM doesn't exist at 0x2000 on the host. So once the VM goes to access that address, the access can be intercepted, as we see with the arrow, and gets converted

into the actual memory that the VM needs to access. The way this is done is that each VM is associated with a virtual machine control structure, or VMCS, which contains a pointer to the page tables that correspond to each VM. So, as I mentioned, each VM has a VMCS structure, and each VM also has a set of page tables that contains that necessary information. Now that we've talked a little bit about the virtualization infrastructure, we can get back to VBS. VBS just uses the same principles we talked about, and it can isolate memory basically the way we isolate VMs: VMs can only access the memory inside of their VM, and the

CPU is responsible for fetching the actual physical memory. We can use these same principles, but instead of creating virtual machines and child partitions, we split up the OS into two virtual trust levels, or VTLs.
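The guest-physical to system-physical translation described earlier, with one set of extended page tables per VM, can be sketched as a dictionary lookup, following the speaker's own "entry in a dictionary" analogy. This is a simplified model under stated assumptions: real extended page tables are multi-level structures walked by the CPU, and the `Partition` class and `translate` function are hypothetical names.

```python
PAGE_SIZE = 0x1000


class Partition:
    """A VM (child partition) with its own guest-to-system page mapping."""

    def __init__(self, name, ept):
        self.name = name
        # guest physical page number -> system physical page number
        self.ept = dict(ept)


def translate(partition, gpa):
    """Translate a guest physical address into a system physical address."""
    page, offset = divmod(gpa, PAGE_SIZE)
    if page not in partition.ept:
        # Stands in for the fault raised when no mapping exists for the guest.
        raise MemoryError(f"{partition.name}: no mapping for GPA {gpa:#x}")
    return partition.ept[page] * PAGE_SIZE + offset


# The example from the talk: guest address 0x2000 really lives at 0x5000.
vm = Partition("child", {0x2: 0x5})
spa = translate(vm, 0x2000)
spa_with_offset = translate(vm, 0x2ABC)
print(hex(spa), hex(spa_with_offset))
```

A guest can only reach pages that appear in its own mapping, which is the isolation property VBS reuses for VTLs.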

If you've read Windows Internals or heard of it, the seventh edition, part one, this is taken from there. Basically, your typical user mode and kernel mode, what you interface with when you boot your computer, is located in VTL 0, and that's what we call the normal world. Now there is a higher entity called VTL 1, which contains the secure kernel. This may seem a little confusing, but just imagine that we can treat VTL 0, where we assume malicious attackers are, as a VM, basically. As we saw earlier, the CPU can gate memory access by intercepting the VM's memory access, and we can use those

same principles to gate memory access from there. So a VTL is basically a VM, but it doesn't have a virtual hard disk, networking, or any of that infrastructure. The hypervisor basically isolates the OS, as I mentioned, into VTLs, and those VTLs, just like VMs, can't access each other. This is actually what allows Hypervisor-Protected Code Integrity, which is the main mitigation we'll talk about, to work. HVCI is a mitigation that falls under the purview of virtualization-based security. We can think of VBS as an umbrella that has all these cool mitigations, like Credential Guard, if you've ever heard of that. HVCI is one of those; it falls under

Device Guard, essentially, which is part of that. And HVCI is the answer to our question from earlier, so we've gone full circle here: how do we block unsigned code execution in the kernel if the kernel is the highest entity? Again, the kernel can't have a mitigation that defends against an attacker who is already in the kernel; the attacker can compromise the integrity of that mitigation if you're assuming they have access there already. So how does this work? Well, we now have the hypervisor, a higher entity than the kernel. When VBS is enabled, both VTL 0 and VTL 1 are placed in the root partition. As we talked about earlier, your computer boots

and the root partition takes up the physical address space, and a VM is allocated from that root partition. Now we shove both of those VTLs into the root partition. You may be thinking: what's the benefit of that? Why do we need all that translation we talked about earlier? Well, since VTL 0 and VTL 1 theoretically reside in the same address space, we don't really need to use the extended page tables for translation purposes. In some cases they are used that way, but we're already in the same address space. What they're actually used for now is to create an additional set of page

tables with an additional set of permissions, and those permissions are managed by the hypervisor, so they can't be touched by the kernel. These extended page tables are configured by VTL 1, which as we saw earlier holds the secure kernel, so this is the trusted entity in the Windows OS. VTL 1 can configure memory permissions how it sees fit. We can say: hey, this memory shouldn't be accessible by the kernel, or this memory should only be writable. It basically creates those permissions, and they're managed by the hypervisor. When VTL 1 creates these permissions, they're stored in the hypervisor, and that's a higher security boundary than the kernel, so the kernel

can't directly compromise that. We'll show an example here. In this example we see a familiar acronym, PTE, or page table entry. Right now it describes this page as readable and writable. We also have an extended page table entry, which is the hypervisor's view of the memory, essentially, and it also says readable and writable. Now let's assume an attacker uses that same vulnerability to locate the page table entry, which is basically the metadata that describes that memory. As an attacker, we can locate it because it's stored in the kernel, and we're going to corrupt it to make it

executable. So the kernel thinks this page is writable, readable, and executable, but the hypervisor actually says no, it's not; it's readable and it's writable. So when the attacker goes to execute this page, they think it's executable, but actually it's not, because the hypervisor says otherwise. Remember, VTL 1 can configure the permissions how it sees fit, and then they're immutable from the kernel after that. So in this case the attacker will crash, and you're not actually able to create executable memory. You may be thinking there probably has to be some legitimate interface for this, since the page table entries are the traditional way of describing memory.
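The PTE versus extended-PTE example can be modeled as two permission sets where execution requires both views to agree. This is a hedged sketch of the idea only: the `Page` class and `can_execute` function are hypothetical, and real hardware enforces this during address translation rather than through a function call.

```python
from dataclasses import dataclass


@dataclass
class Page:
    pte: set          # kernel's view; an attacker with kernel read/write can change it
    epte: frozenset   # hypervisor's view, configured by VTL 1; immutable from VTL 0


def can_execute(page: Page) -> bool:
    """Execution is allowed only if both the PTE and the extended PTE permit it."""
    return "X" in page.pte and "X" in page.epte


# A data page holding the attacker's payload: readable and writable in both views.
payload_page = Page(pte={"R", "W"}, epte=frozenset({"R", "W"}))

# The attacker corrupts the kernel-owned PTE to add execute permission.
payload_page.pte.add("X")
attacker_can_run = can_execute(payload_page)

# A page of signed driver code that VTL 1 marked as executable.
driver_page = Page(pte={"R", "X"}, epte=frozenset({"R", "X"}))
driver_can_run = can_execute(driver_page)

print(attacker_can_run, driver_can_run)
```

Using a `frozenset` for the extended PTE is a deliberate stand-in for "the kernel cannot modify this": there is simply no operation available to change it from the attacker's side of the model.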

There is. It's called a hypercall, which is similar to a syscall. We're not going to get into it in this talk, but a legitimate interface to do that does exist. As I mentioned, this is done by using the same concept we talked about earlier: when a VM goes to access memory, the CPU can intercept that and perform the translation to access the actual memory under the hood. We can use the same principle to treat VTL 0, where the normal people live, your typical everyday users, as a VM, essentially. Anytime they go to access memory, well, not every time, but let's say that they go to access some

executable memory, for instance, we can validate that through the hypervisor and say: hey, I see you're going to access some readable, writable, executable memory; does the hypervisor actually say that memory is readable, writable, and executable? If it doesn't, we can mitigate it, essentially, and not let the execution occur. So here's the summary. VTL 1 can set up the proper permissions for what memory should be, and they're immutable in the context of the kernel. The way it does this is through the extended page tables; those are managed by the hypervisor, and the hypervisor, again, is immutable in the context of the kernel. So an attacker, even with the kernel-mode vulnerability we showed earlier,

cannot just arbitrarily access the extended page tables, which actually contain what we would call the ground truth for how memory should be defined. Now that we know all of that, let's get into how exploitation changes with HVCI. So far we've talked about HVCI being used for enforcing immutable permissions, and we've talked about those in terms of a page being executable or non-executable. Well, HVCI can also be used for gating additional sensitive items in memory, such as the kernel Control Flow Guard bitmap. What is kCFG? kCFG is the kernel-mode implementation of CFG, Control Flow Guard. If you're not familiar, it's a mitigation that validates anytime an indirect function call

happens, so any call like a call through a register or a call through a function pointer. A bitmap is created at compile time of all of the known valid call targets, essentially all of the functions we can expect to call. Every time an indirect call happens, it gets inspected: is this function pointer one I know about? Does it exist in the bitmap? If it doesn't, well, it must be malicious, because we didn't know about it at compile time, and we crash the process. Earlier, how did our exploitation work? We can make readable, writable, executable memory all day long, but we need some way to execute it. So what we did was we

found a function pointer, we overwrote it, and the next time that function pointer was called, it called into our malicious shellcode. Well, what did we have to do? We had to overwrite a function pointer. What does kernel Control Flow Guard inspect? Function pointer calls. So we're not able to do this anymore. But again, kernel Control Flow Guard has this bitmap, or this dictionary, of all the known call targets, and we run into the same conundrum: kCFG is protecting against kernel control-flow hijacking, so we assume the attacker has access to the kernel. Well, if we implement the infrastructure for the mitigation in the kernel, and we assume that the attacker already has

access to the kernel, again the kernel is trying to defend against itself. We need some higher entity, some higher privilege. Otherwise that would render Control Flow Guard useless, because we could corrupt the bitmap before we hijack control flow. We have kernel-mode read and write, so we can just corrupt that bitmap to say that all memory is a valid call target, and then we've rendered it useless. Well, since we have a higher security boundary, HVCI can basically say, on the extended page table entry that corresponds to that bitmap: actually, no, the bitmap is read-only, and the kernel can't touch it at that point. So in this example, at the

bottom of the screenshot, it says memory access error when we try to overwrite guard_icall_bitmap, which is the symbol that corresponds to the bitmap. So even in the debugger, when we use the command that edits memory, we can't do it, because the hypervisor says this page is read-only, even though the page table entry on the right-hand side says it's K, kernel, and W, writable. We can't do that because PTEs are the kernel's view of memory, not the hypervisor's. So now that we've talked about all this, here's where we stand. Oh, and another interesting thing I want to talk about for a second: this is the same thing that Credential Guard uses, if

you're familiar with it. Credential Guard can do this same thing: it can say that these secrets, like in LSASS, for instance, which attackers can dump, we can actually mark those pages as not valid in the kernel and make them hypervisor-only pages, and you can't even map the memory into the kernel. So Credential Guard uses this same kind of infrastructure, which probably more people are familiar with at the enterprise level. So here's where HVCI puts us, the boat we're in, essentially. We can gladly write to writable memory all day long with our shellcode, but we can never make it executable.
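The kCFG idea covered above, a bitmap of valid indirect-call targets that the hypervisor keeps read-only, can be sketched as follows. This is a toy model rather than how the Windows kernel implements it: the class and function names are hypothetical, the addresses are made up, and the real bitmap is a packed bit array, not a Python set.

```python
class ReadOnlyBitmap:
    """Stand-in for the kCFG bitmap once its extended PTE marks it read-only."""

    def __init__(self, valid_targets):
        self._valid = frozenset(valid_targets)

    def is_valid(self, target):
        return target in self._valid

    def mark_valid(self, target):
        # Models the memory access error seen when writing to the bitmap.
        raise PermissionError("bitmap page is read-only in the extended page tables")


def guarded_indirect_call(bitmap, table, index):
    """Check the call target against the bitmap before 'calling' it."""
    target = table[index]
    if not bitmap.is_valid(target):
        raise RuntimeError(f"kCFG: {target:#x} is not a known call target")
    return target


# Hypothetical addresses of two legitimate kernel functions.
bitmap = ReadOnlyBitmap({0xFFFFF80012340000, 0xFFFFF80012350000})
table = [0xFFFFF80012340000, 0xFFFFF80012350000]

legit_target = guarded_indirect_call(bitmap, table, 0)

# The attacker overwrites a function pointer with the payload address.
table[1] = 0xFFFFF78000000800
try:
    guarded_indirect_call(bitmap, table, 1)
    hijack_blocked = False
except RuntimeError:
    hijack_blocked = True

# The attacker then tries to whitelist the payload address in the bitmap.
try:
    bitmap.mark_valid(0xFFFFF78000000800)
    bitmap_write_blocked = False
except PermissionError:
    bitmap_write_blocked = True

print(hijack_blocked, bitmap_write_blocked)
```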

We can't corrupt the metadata to trick the CPU into thinking this is actually executable, because that's managed by the hypervisor, and we can't access the hypervisor. Even if we could make executable memory, though, Control Flow Guard in the kernel is going to inspect anytime any function pointer hijacking occurs. Even if we have executable memory, we need to execute it somehow; we need the kernel to somehow execute this memory, but we can't overwrite any function pointer to trick the kernel into executing it. So essentially we can't execute shellcode, or unsigned code; that's not possible. But if we get more meta, I guess

with it, as you could say: what does shellcode actually do? If we think of a C2 framework, like Cobalt Strike for instance, it has some shellcode that calls Windows APIs to open up connections, read files, download files, etc. These are all documented functions in Windows; C2s just abuse them. Like in the last talk we heard, there weren't really any vulnerabilities, but we abuse tools in ways they weren't intended to be used. That's what C2s do. So if C2s only use shellcode to call Windows APIs to do things, what if we could come up with some HVCI-compliant way to call Windows

APIs, which is what shellcode is used for anyway, but without using shellcode? That's great, but the main issue is: how do we gain execution? Well, kernel Control Flow Guard has a limitation: it only inspects calls and jumps, essentially. So we can ask: are there other ways to transfer control flow in a program? If you're familiar with assembly a little bit, what happens anytime you do a call? There's usually a return, which returns back to the function that called it. That is a control-flow transfer: a return address gets pushed onto the stack, and it is eventually used to know where to return execution. So what if we could somehow leak a

kernel-mode stack, locate a return address, and overwrite it, doing the same thing as function pointer overwriting but on a return address? Because, again, kernel Control Flow Guard doesn't inspect those. Well, that's perfect, because there's an undocumented Windows API called NtQuerySystemInformation that can be called from user mode by a medium-integrity process, which is normally what a process is launched as by default on Windows, and that allows you to leak a kernel-mode KTHREAD object for a given thread. So what we can do is create our own thread by calling CreateThread from user mode, a perfectly documented function. We can create it in a suspended state

and call it a dummy thread, and then we can call NtQuerySystemInformation from user mode, which allows us to retrieve the kernel-mode thread object associated with that thread. You may be thinking this doesn't make sense: this is a user-mode thread, so how are you able to get something from the kernel? Well, all threads are actually kernel-mode objects. Even if you have a user-mode thread, it's still represented as a kernel-mode object. If all threads were managed in user mode, well, all users are in user mode, and they could compromise the integrity of all those threads. So all the metadata, all the structures, all the infrastructure, for the most part, is

located in the kernel. So what we can do is leak the KTHREAD object associated with a thread we control. If we look on the right-hand side of the screen, a KTHREAD object is a structure that has two members of interest, StackLimit and StackBase; that's the beginning and the end of the stack. So basically we can leak a kernel-mode thread object, and we can leak the stack from that kernel-mode thread, by calling a documented API in user mode. When this happens, we can inspect all the return addresses on the stack, which, as I mentioned earlier, is what we're interested in, and we need to pick one of these to

overwrite, essentially. We can think of these as function pointers, but really they're return addresses. We can overwrite one of them with our own address, and when that return address gets executed, what will it do? It'll transfer execution to our controlled memory. In this case, we're going to use the KiApcInterrupt address on the stack, which is located in the middle. I'm not going to get into APCs, because this is not an APC internals talk, and to be quite frank, you can read more about it elsewhere. But basically an APC is an asynchronous procedure call. Take a suspended thread, which is what we did with the thread we

can control; we created it in a suspended state. Anytime that happens, there's an APC queued to that thread that basically tells the thread to do nothing, which is why the thread is able to sleep, essentially, or stay put. What this actually does is put the thread in a wait state for an object, and it looks at the suspend count. Anytime the suspend count is zero, that indicates the thread can be resumed. So if we create a thread that's suspended, that puts the suspend count at one. That makes total sense: if the suspend count is one, that means this thread is suspended, and to decrement it down to zero we call ResumeThread.
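The suspend-count bookkeeping just described can be illustrated with a small model. This is an assumption-laden sketch, not Windows code: `ToyThread` is a hypothetical class, and the only behavior borrowed from the real `SuspendThread` and `ResumeThread` APIs is that each returns the previous suspend count and that a thread runs only when the count reaches zero.

```python
class ToyThread:
    """Minimal model of a thread's suspend count."""

    def __init__(self, create_suspended=False):
        # Creating a thread in a suspended state starts the count at one.
        self.suspend_count = 1 if create_suspended else 0

    @property
    def runnable(self):
        return self.suspend_count == 0

    def suspend(self):
        previous = self.suspend_count
        self.suspend_count += 1
        return previous

    def resume(self):
        previous = self.suspend_count
        if previous > 0:
            self.suspend_count -= 1
        return previous


dummy = ToyThread(create_suspended=True)
runnable_before = dummy.runnable      # still parked, waiting to be resumed
previous_count = dummy.resume()       # analogous to calling ResumeThread once
runnable_after = dummy.runnable

print(runnable_before, previous_count, runnable_after)
```

The window between creation and the resume call is what the technique in the talk relies on: the thread stays put while its kernel stack is inspected and modified.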

Pretty straightforward: make a thread that's in a suspended state, and we can resume it with the ResumeThread API, which is callable from user mode. An APC is basically like a thread in that it can execute code, but it's not one; it's used to execute code in the context of a particular thread. So the reason I talk about this, like what the heck is this for: anytime we create a suspended thread, we know that an APC is going to be queued to that thread to tell it: hey, don't do anything, wait. Well, the way APCs are queued or issued is through a kind of software interrupt, so we can infer that when we

make a suspended thread, the return address for KiApcInterrupt, which is the function used to dispatch this, should be present, because that function needs to be called. Long story short, a suspended thread is going to have a return address on its kernel-mode stack for KiApcInterrupt, so we know a reliable return address we can always overwrite. Since we've leaked the stack, we can use a kernel-mode read vulnerability to locate that return address on the stack. Basically, we have the start and the end of the stack from leaking the KTHREAD object, so we can scan the stack, look for the return address we're interested in, and overwrite it, and we can

resume execution through ResumeThread. Right now the thread is suspended; we locate the return address, we overwrite it, and when we go to resume the thread, eventually that return address will get executed and we'll control execution. I haven't used a lot of diagrams up until now, but here's what this looks like. If we look at the top, we can see that, using a kernel-mode read vulnerability, we can scan the stack and find the stack address that contains our target return address. The return address is going to be used to return control flow somewhere else in memory. And if we look at the bottom, we've

overwritten it with a dummy value of 41414141. If you're familiar with any kind of binary exploitation, that's the de facto value we use to validate a proof of concept. In this case we get an access violation, because when the return instruction happens, as we can see at the top, what are we trying to return to? The return address is 41414141. That's not a valid memory address, so we crash, thus proving we can control the return address and where we return to. So we can hijack control flow to control where the kernel is going to execute code from.
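The scan-and-overwrite step can be sketched against a simulated stack buffer. This is a toy model under stated assumptions: the stack is a local `bytearray` rather than kernel memory, the `stack_limit` value and the `target_return` address are invented, and the 64-bit filler value simply extends the 0x41 pattern mentioned in the talk.

```python
import struct

QWORD = 8


def find_return_address(stack, stack_limit, target):
    """Scan the stack one qword at a time; return the address of the slot holding `target`."""
    for offset in range(0, len(stack) - QWORD + 1, QWORD):
        (value,) = struct.unpack_from("<Q", stack, offset)
        if value == target:
            return stack_limit + offset
    return None


# Hypothetical stack bounds, standing in for the leaked StackLimit and StackBase.
stack_limit = 0xFFFFA00000001000
stack = bytearray(0x100)
stack_base = stack_limit + len(stack)

# Hypothetical return address into the function of interest, placed mid-stack.
target_return = 0xFFFFF80011223344
struct.pack_into("<Q", stack, 0x80, target_return)

# The read primitive is modeled by scanning the local buffer.
slot = find_return_address(stack, stack_limit, target_return)

# The write primitive is modeled by overwriting the slot with a recognizable dummy value.
dummy_value = 0x4141414141414141
struct.pack_into("<Q", stack, slot - stack_limit, dummy_value)
(overwritten,) = struct.unpack_from("<Q", stack, 0x80)

print(hex(slot), hex(overwritten))
```

In the talk's scenario, resuming the thread afterwards makes the CPU return to the dummy value, and the resulting crash confirms control over the return address.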

basically we can force the kernel to execute anywhere in memory, but remember, we can't execute our own code; we can't create dynamic executable code because of HVCI. So what we do is mimic shellcode behavior using an exploitation technique known as ROP, or return-oriented programming. So we'll talk about ROP here. Now, basically, we saw we can corrupt a return address on the stack, correct? We corrupt a return address, the kernel will execute that return address, and we're all hunky-dory; we can control execution, right? Well, what if we could flood the stack with a bunch of fake return addresses, so every time those return addresses are executed, we control what those are,

right? Each of those is known as a ROP gadget, because each one of those return addresses basically points to some kind of interesting sequence of instructions that ends in a return instruction, and we can chain those together in what's known as a ROP chain to call into a Windows API, just like shellcode does, which is the whole point of what we're trying to do. A C2, for instance, calls OpenProcess to get a handle to a process to read some of its memory; that's done through a legitimate interface, the Windows API. Well, that's what shellcode is normally used for, but we can't do that, so we need some other way to go about it.
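To make the chaining idea concrete, here is a toy model in Python. It is only a sketch of the control flow, not exploit code: the gadget addresses (0x1000, 0x2000, 0x3000), the register names, and the `fake_api` function are all made up for illustration. Each "return" is modeled as popping the next address off the stack and running whatever existing snippet lives there.

```python
# Toy model of a ROP chain: only the control-flow idea, nothing Windows-specific.
# All addresses and gadget names below are hypothetical.

def make_machine():
    return {"rax": 0, "rcx": 0, "stack": [], "log": []}

def pop_rcx(m):
    # models "pop rcx ; ret": load the next stack slot into rcx
    m["rcx"] = m["stack"].pop(0)

def mov_rax_rcx(m):
    # models "mov rax, rcx ; ret"
    m["rax"] = m["rcx"]

def fake_api(m):
    # stands in for an existing, legitimate function reached purely by returning into it
    m["log"].append(f"api called with rcx={m['rcx']:#x}")

# "Already executable" code: we never add to this table, we only pick from it.
GADGETS = {0x1000: pop_rcx, 0x2000: mov_rax_rcx, 0x3000: fake_api}

def run(m):
    # Each loop iteration models one `ret`: pop an address, continue executing there.
    while m["stack"]:
        address = m["stack"].pop(0)
        GADGETS[address](m)
    return m

m = make_machine()
# The flooded stack: gadget address, data for that gadget, then more gadget addresses.
m["stack"] = [0x1000, 0x1234, 0x2000, 0x3000]
run(m)
print(m["log"])
```

The only thing the "attacker" supplies here is the list of values on the stack; every instruction that runs already existed in the gadget table, which is the property that lets this approach coexist with a rule like "no new executable code".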

so here's an example ROP chain. On the left-hand side we have the stack, and we flood the stack with return addresses from the .text section of a portable executable, an executable or the kernel, for instance. Why do we do this? The .text section is what contains all the executable code for a given application or the kernel, right? So basically we can reuse memory that's already executable and that ends in a return. Well, why do we do this? What a return actually does is it takes the value the stack pointer points to and loads it into the instruction pointer, which is what determines the next instruction to execute. So if we can control the stack, we basically

can use a code snippet to execute some code, or ROP gadget, let's say mov rax, rbx, that does some action and ends in a return. What does that return do? It goes to the stack and picks up the next ROP gadget to execute. Why? It takes the value at the stack pointer and puts it in the instruction pointer. So as you can see, each one of our ROP gadgets naturally will execute the next one. And what we've done in our example is we've called into a malicious... well, it's not a malicious function, it's a documented Windows API located in KernelBase, which basically holds most of the functionality that used to be in kernel32, and we can use fake return addresses to

set up the correct function arguments to call into that function and change the permissions of memory, for instance. So that's an example of what a ROP chain can be used for: we can basically call Windows APIs using pure return addresses. So let's look at an example of this. Why would we want to do this? What can an attacker do with this? If you're familiar with Microsoft, or excuse me, Windows Defender, the antimalware service process is known as MsMpEng.exe, and it actually can't be terminated from user mode. It's what's known as a protected process light, or PPL, if you're familiar with it. Again, this is not a PPL internals talk; there's much better material than I could ever

give on this, but basically a PPL is a special kind of process, in this case an antimalware process, that prevents other user-mode applications or processes from tampering with it. So if you want to terminate a process on Windows, this would be a good one to terminate. You can't do this even with a user-mode handle as an administrator. The way you talk with processes on Windows in user mode is through a handle, which is basically an intermediary object that can communicate on your behalf, in certain capacities, with a kernel-mode object. So even if we're an administrator, we can't get a handle to this process with the necessary

permissions to terminate it. But we have a kernel vulnerability, for instance, so what we can do is get kernel-level access to terminate a PPL. As I mentioned, it's not possible even with administrative access in user mode, even with a vulnerability there, essentially. So we need to get our handle to this process from the kernel, and we can accomplish this using our exploit primitive to arbitrarily invoke any kernel-mode API. Historically we would have done this in the form of shellcode, as we saw earlier: we would have called OpenProcess, or in the kernel, ZwOpenProcess, obtained the handle, and it would have been that easy. Well, we can't do that anymore, because we can't

create dynamically executable code in the kernel; the hypervisor prevents us from doing so. So we saw an example ROP chain earlier. This one may be kind of hard to see, but basically this, using pure return addresses, can call ZwOpenProcess, which is a way to get a handle to a process. So what we would do is use that same primitive to locate a return address and flood the stack with a bunch of fake return addresses to call ZwOpenProcess and get a handle, right? Well, after we execute those ROP gadgets, we've completely corrupted or smashed the stack, all of the information on the stack, right? Those return addresses, that's not

just there for no reason; those are legitimate return addresses that are needed to return execution to a given part of the kernel so we don't crash. If you call a function and it infers, well, I know I'm going to go back to this function through this return address, you've corrupted that tenet, basically, and you're going to cause a crash, because any exception or trap, anything like that in the kernel, causes a blue screen of death, right? So from an exploit perspective, once we've done the exploit work and smashed the stack, we need somehow to recover from that. So what we can actually do is, in the next slide,

what we can actually do is append another ROP chain to the first one. So we call ZwOpenProcess to get a handle to the Defender process so we can terminate it; again, we have to do this in the kernel because it's a PPL. Then we can call ZwTerminateThread, right? Again, we would do this in shellcode normally, but we can't; we need to do it in ROP. So with ZwTerminateThread we basically pass in the handle to our dummy thread where we're doing this exploitation work, and the kernel will handle all the cleanup for us. So basically we call whatever we want, whatever API, then we end that call in ZwTerminateThread,

and then that will gracefully terminate that thread. And here's what this looks like: again we call ZwTerminateThread, which is a Windows API documented by Microsoft, which again is what attackers will use, and we flood the stack with fake return addresses to call into this function, and then we resume the thread, which kicks off all the execution and eventually will perform the action we would like. So basically I've shown that we can do all the exploitation that we would like, but there's another caveat, two caveats actually. So we can get a handle from the kernel using a kernel-mode API call through a

kernel vulnerability, and we can get that handle to the process we want to terminate, right? But the issue is that handles are actually stored in a per-process handle table. So if we think about this, we're doing this exploit work in, theoretically, an application called exploit.exe, right? When we open the handle to the Defender process, where is that handle going to get stored? It's going to get stored in exploit.exe's handle table. But if we're getting a kernel-mode handle, or excuse me, first I'll talk about... so yes, it's stored in a per-process handle table, but additionally Microsoft Defender registers some kernel-mode callbacks, and what that is, basically, it's a notification, almost, in

the kernel, so that anytime some action happens, any entity that's registered a callback will get notified. For instance, there's a process-creation kernel callback, so anytime a process is created, any application or driver, for instance, that registers a callback will get notified: hey, this process was created. Well, there's one for object handle creation, and Defender actually registers kernel callbacks that inspect anytime a handle is created. And anytime you try to open up a handle to Defender, even if it's in the kernel, with the necessary permissions to terminate the process, it will strip the access rights, so you're not able to open up a handle

to the Defender process, even if you're doing it from a kernel-mode API call, in context of a user-mode process, basically because Defender knows, hey, this is weird, this shouldn't be happening, and they're probably going to try to terminate my process. This is issue one, for the reason I just mentioned: as we can see, handles aren't allowed to be opened to that process with the necessary permissions. But the bigger issue is the per-process handle table, right? So even if we're able to open up a kernel-mode handle... so there's a difference between a kernel-mode handle and calling the kernel-mode API to get the handle. You have to set some specific

parameters to say, hey, this is a kernel-mode handle, right? So let's say we were able to get a kernel-mode handle. Remember, there's a per-process handle table; all of the kernel-mode handles are stored in the System process. We are the exploit.exe process, so anytime we try to use that handle from exploit.exe, let's say we call TerminateProcess to terminate the Defender process, we have to pass in a handle to say, hey, here's the process we want to terminate. Well, exploit.exe has a handle table with all of its handles; it only knows about its handle table, so the lookup will go, hey, where's the handle that this user passed in? Well, the

handle is a kernel handle, so it's stored in the System process, not in exploit.exe's handle table. So even if we get a kernel-mode handle, it's stored in another process that our current process can't touch. I know that's a mouthful, but basically we can think of it as: the kernel handle is somewhere else we can't access. So let's start with issue one: we can't use a user-mode handle. Again, there's a difference between a kernel-mode handle and getting a handle in context of the kernel. So in this case, when you call ZwOpenProcess, there is an OBJECT_ATTRIBUTES structure that you have to supply, and within that OBJECT_ATTRIBUTES structure there is

an Attributes member, and there's a few values you can select, one of those being OBJ_KERNEL_HANDLE. So when you call ZwOpenProcess, this tells the function, basically, hey, we want this to be a kernel-mode handle. So that's our first issue; we've completely solved that, and here's an example of what that looks like: we set the value here. So we have the kernel handle, great; Defender will not do anything when we open up a full-permissions handle to that process in order to terminate it. But it's inaccessible from our current process, because it's stored in the System process. So what if we could force our process to think it needs to look up

the handle in the System process handle table? Well, there is a way we can do this. In the KTHREAD object, which describes a thread, basically the kernel-mode representation of a given thread, there's a member called PreviousMode. What PreviousMode does is indicate, when you make a system call, for instance, where this call originated from. So if PreviousMode is set to one, this says, hey, this call came from user mode. If it's zero, we say, hey, this execution originates from the kernel. So we know we can leak KTHREAD objects.

so what if we could use our kernel-mode vulnerability to corrupt this? Why would we want to do that? Well, if we actually look in IDA at the NtTerminateProcess call, which is the syscall that happens as a result of TerminateProcess, it captures our current thread and it captures the PreviousMode value, and it uses those in an argument to this function called ObpReferenceObjectByHandleWithTag. When we get into this function, there's a few checks. The first check is: is this a kernel-mode handle? Well, obviously yes, it will be. Then it does a test; it says, is PreviousMode zero? And if PreviousMode is zero

and if the handle is a kernel-mode handle, the table we use to look up the handle is ObpKernelHandleTable, which contains all of the kernel-mode handles. So the TL;DR on this is: if PreviousMode is zero, the handle lookup is going to happen in the System process. So we can do this in our exploit, for instance: we can open a handle to the current thread, and once we've got a handle to the current thread, which is going to do the exploit work, we leak the KTHREAD object associated with that thread. So now that we have the KTHREAD object, we can use our kernel-mode vulnerability

to corrupt the metadata of that KTHREAD object and set PreviousMode to zero; we can clear it out, and this kind of tricks the kernel into thinking, oh, this system call, TerminateProcess, is originating from the kernel, let me look up the handle in the kernel handle table. So here's what our exploitation looks like now: we would use ROP to call ZwOpenProcess to get a handle to the Defender process; we then set up a second ROP chain after that call that calls ZwTerminateThread to gracefully exit from that thread; we also use the kernel-mode vulnerability to corrupt PreviousMode to zero; and then using that same thread we call TerminateProcess, and when we

call TerminateProcess using our kernel handle, the kernel will look up the handle in the proper handle table where it's stored, and we'll be able to terminate the process. So here's an example of what this looks like.
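Before the demo, the table-selection logic described above can be summarized as a small Python model. This is a simplified sketch, not the real ObpReferenceObjectByHandleWithTag: the `KERNEL_HANDLE_BIT` marker, the handle values, and the dictionary-based tables are assumptions made for illustration. Only the PreviousMode values (0 for kernel, 1 for user) come from the talk.

```python
# Simplified model of which handle table a lookup uses.
# PreviousMode values as described in the talk: 0 = kernel mode, 1 = user mode.
KERNEL_MODE, USER_MODE = 0, 1
KERNEL_HANDLE_BIT = 0x80000000  # assumed marker that flags a handle as a kernel handle

class Thread:
    def __init__(self, process, previous_mode=USER_MODE):
        self.process = process
        self.previous_mode = previous_mode

def reference_object_by_handle(thread, handle, kernel_table, process_tables):
    """Return the object a handle refers to, or None if the lookup fails."""
    is_kernel_handle = bool(handle & KERNEL_HANDLE_BIT)
    if is_kernel_handle and thread.previous_mode == KERNEL_MODE:
        table = kernel_table                     # the System process's handle table
    else:
        table = process_tables[thread.process]   # the caller's own per-process table
    return table.get(handle)

# Hypothetical state: the kernel handle lives in the kernel table only.
kernel_table = {0x80000004: "MsMpEng.exe"}
process_tables = {"exploit.exe": {0x4: "some unrelated object"}}

thread = Thread("exploit.exe")                   # normal user-mode caller
before = reference_object_by_handle(thread, 0x80000004, kernel_table, process_tables)

thread.previous_mode = KERNEL_MODE               # models the corrupted PreviousMode field
after = reference_object_by_handle(thread, 0x80000004, kernel_table, process_tables)
print(before, after)
```

With PreviousMode left at 1 the lookup goes to exploit.exe's own table and finds nothing; with it cleared to 0 the same call resolves through the kernel table, which is the behavior the talk relies on.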

Oh goodness. Oh, here we go. So in this example we run the exploit; it's doing all the work we talked about, and on the right-hand side we can see that the process terminated. It went by very quick, but basically if we look at the output of the exploit, we do all the things we talked about: we locate a stack address that contains our target return address, we corrupt it, we then open a handle to the Defender process using all of the vulnerability primitives we talked about, we clear PreviousMode, and we let execution happen. But what happens when execution happens? We've corrupted the return addresses, so when they're executed, they'll return into our code.
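The first step in that output, locating the stack slot that holds the target return address and overwriting it, can be modeled on a plain byte buffer. This is a sketch under assumptions: the buffer stands in for memory reached through the read and write primitives, and the addresses and the `find_return_address` and `overwrite_slot` helpers are hypothetical names invented for this example.

```python
import struct

def find_return_address(stack_bytes, stack_limit, target):
    """Scan 8-byte slots; return the address of the first slot holding target, else None."""
    for offset in range(0, len(stack_bytes) - 7, 8):
        (value,) = struct.unpack_from("<Q", stack_bytes, offset)
        if value == target:
            return stack_limit + offset
    return None

def overwrite_slot(stack_bytes, stack_limit, slot_address, new_value):
    """Models the write primitive: replace one 8-byte slot."""
    struct.pack_into("<Q", stack_bytes, slot_address - stack_limit, new_value)

# Hypothetical values standing in for a leaked stack range and a known return address.
STACK_LIMIT = 0xFFFFA00000001000   # lowest address of the modeled stack region
TARGET = 0xFFFFF80012345678        # stands in for the return address being searched for

stack = bytearray(64)
struct.pack_into("<Q", stack, 24, TARGET)   # the "real" return address sits in slot 3

slot = find_return_address(stack, STACK_LIMIT, TARGET)
overwrite_slot(stack, STACK_LIMIT, slot, 0x4141414141414141)
print(hex(slot))
```

In the talk the dummy value is what triggers the access violation on return; here it simply replaces the slot so the effect of the overwrite is visible in the buffer.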

so basically we're able to arbitrarily call kernel-mode APIs even with HVCI enabled. Again, this is what shellcode is normally used for, but we can't directly use it; we have to use some other means, as we can see very convoluted means. It would have been much easier to allocate some executable memory, hijack a function pointer, call it, and then let the shellcode do it all for you. We can't do that; we have to reuse existing code. So now let's talk about augmenting HVCI with control-flow integrity, and these are the last few slides that I have, so bear with me here. What did our exploitation infer or rely on? It

inferred we could control the stack and flood it with fake return addresses, correct, using a ROP chain. Well, there's a hardware mitigation now called Intel CET that makes this no longer possible. Intel CET stands for Control-flow Enforcement Technology, and it's a hardware mitigation that basically protects the integrity of the stack. So what happens is, anytime a call instruction happens... which again, how do return addresses get on the stack? Someone called something, and that return address was pushed onto there so when the call is done we know where to return, right? Normally that only gets pushed on the stack, but with CET there's a shadow stack now, which is only accessible through hardware,

well, I guess it's accessible through software, but not directly, essentially, and it's a hardware stack, basically. What happens is, any time a return happens, the known-good copy of the return address on the shadow stack, which is not directly accessible from software, is compared with the one on the normal software stack. And as we can see, if we corrupt the normal stack but the shadow stack has the preserved copy, it sees there's a mismatch and it will crash. Basically it's kind of like HVCI: it does a double check from a more immutable entity, right? So recall we had to use ROP because we can't directly execute shellcode with HVCI, so basically we had to find another way to do it using ROP. Well,

that's not possible with CET. So the conclusion here is, when you have a mitigation enabled, it usually needs to be coupled with another mitigation, right? HVCI is probably the most powerful mitigation that Windows has; it's an amazing technology, but it relies on some notions that you can't hijack control flow. So we could overwrite function pointers, we could overwrite return addresses. Well, when you couple forward-edge checks on calls and jumps, and you also have CET to check the return control flow, when you couple that with HVCI, that will greatly raise the bar for exploitation, because you basically force people into data-only attacks instead of corrupting or hijacking control flow.
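The shadow stack check described above can be sketched as a small Python model. This is an illustration of the idea only, not how the hardware is implemented: the `Cpu` class and the `ControlProtectionFault` exception are hypothetical names, and real shadow stacks live in specially protected memory rather than in a list.

```python
# Toy model of a shadow stack: every call records the return address twice,
# and every return compares the two copies before using the address.

class ControlProtectionFault(Exception):
    """Raised in this model when the normal stack and shadow stack disagree."""

class Cpu:
    def __init__(self):
        self.stack = []          # normal stack: writable by software
        self.shadow_stack = []   # shadow copy: in this model, only call/ret touch it

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        address = self.stack.pop()
        expected = self.shadow_stack.pop()
        if address != expected:
            raise ControlProtectionFault(
                f"return to {address:#x}, shadow stack expected {expected:#x}"
            )
        return address

cpu = Cpu()
cpu.call(0x1000)
print(hex(cpu.ret()))            # untouched return address: the return proceeds

cpu.call(0x2000)
cpu.stack[-1] = 0x41414141       # models overwriting the return address on the normal stack
try:
    cpu.ret()
except ControlProtectionFault as fault:
    print("fault:", fault)
```

The overwrite that worked in the earlier stack-corruption step still succeeds here, but the return that would consume the corrupted value faults instead, which is why the talk treats backward-edge protection as the missing complement to HVCI.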

so Windows actually only uses the shadow stack portion of CET. CET also comes with indirect branch tracking, which, kind of like Control Flow Guard, checks calls, but since Microsoft already has Control Flow Guard and a new and improved version called eXtended Flow Guard, they only use the shadow stack portion. So basically Windows checks return addresses through CET and checks forward-edge calls through Control Flow Guard. So you would need to basically bypass Microsoft's CFG or eXtended Flow Guard to keep a technique like this alive, because you're not going to bypass CET that easily. And your Windows machine has a lot of cool mitigations like this for free that greatly raise the bar for exploitation,

so I would highly recommend that you enable them. So that's the end of my talk. Thank you, and if there's any questions, I'm running short on time, but I'm happy to take them outside as well. Thank you. [Applause]
