
Keep Your Return Address Close and Your Enemies Closer

BSides PDX 2025 · 48:58 · 41 views · Published 2025-12 · Watch on YouTube ↗
About this talk
João Moreira and Rick Edgecombe discuss how a security researcher and kernel engineer collaborated to identify and resolve a critical design flaw in Linux CET Shadow Stack support just before release. The talk explores the trade-offs between security, performance, and compatibility that emerged during this late-stage security review, and emphasizes how early collaboration between security researchers and systems engineers produces more reliable outcomes.
Original YouTube description
Keep Your Return Address Close and Your Enemies Closer. How a kernel engineer and security researcher collaborated to tighten up Linux shadow stack - Joao Moreira & Rick Edgecombe Intel's CET Shadow Stack is a CPU feature aimed at preventing control-flow hijacking shenanigans by implementing a redundant copy of the process stack, which can be verified for integrity throughout program execution. Supporting CET shadow stacks in Linux applications took a long, long time to implement and deploy, and given the magnitude of changes required both in the kernel and in the toolchain, there was a reasonable chance that security details could be missed in the process. In this talk we'll cover the interactions between a kernel engineer and a security researcher regarding a last-minute security finding that ended up surfacing an intricate trade-off discussion around safety, performance and compatibility. These discussions led to redesigns of the shadow stack support at the brink of its release, and are still relevant as new feature designs still stumble on the gritty details of these trade-offs. Beyond the technical scope, this talk aims to emphasize how collaborations between software engineers and security researchers can be fruitful, fun and crucial to achieving more reliable security outcomes. João Moreira is a systems security researcher passionate about compilers, OS internals, and digging deep into low-level bugs. At Microsoft, he works on securing cloud infrastructure by reviewing service designs, building secure architectures, and developing defenses against emerging threats. Prior to Microsoft, João worked at Intel, SUSE Linux, and spent time in academia, where he focused on low-level systems topics like control-flow integrity and binary live patching. His research was presented at conferences such as Black Hat Asia, the Linux Plumbers Conference, and the Linux Security Summit.
Every now and then, João contributes to open-source projects like the LLVM compiler and the Linux kernel. More recently, he’s been trying to figure out this AI thingy — but he still struggles to write short conference bios with the help of chatbots. Rick is a Linux kernel engineer who works on security-related features, virtualization and memory management. --- BSides Portland is a tax-exempt charitable 501(c)(3) organization founded with the mission to cultivate the Pacific Northwest information security and hacking community by creating local inclusive opportunities for learning, networking, collaboration, and teaching. bsidespdx.org
Transcript [en]


[music] [applause] Morning, folks. It's really exciting to be here. I've been living in Portland for a while and always aimed to speak at BSides PDX, and this year the dream came true. So yeah, really cool — and especially to be up here with Rick, who's a co-worker I had the pleasure to work with back at Intel when I was still there, and we did some cool things together which we're going to talk about right now. I also need to mention it's an honor to know that you guys decided to be here instead of going to the fair to get a nice lunch. That means a lot. Thank you for being here.

So let's go. First, some disclaimers for the sake of Rick's employer, but let's jump straight into it. The thing we're going to talk about comes from the control-flow hijacking kind of problem. Raise your hand if you're a low-level person, if you understand memory corruption and all of that. Okay, we have a good amount of people here. Cool. So we're talking about control-flow hijacking, and I might get a little nitpicky on this, because some people may not know what we're talking about, so I'll try to give you the background about this.

So if you're writing code in C or C++ and all of that, you have bugs that sometimes allow attackers to manipulate memory in ways that were not intended in the first place. You have problems with pointers, array bounds checking, buffers, and all of that. Let's say you have strings which are badly manipulated. Eventually an attacker might figure out a way to write into a buffer right beyond its limits and corrupt extra data that might be lying there. And in C you also have this thing that we call code pointers. You have function pointers, you have return addresses, which are ultimately data that is used to redirect the control flow of your program. Let's say you have a function pointer that's supposed to call a function foo, but it could eventually end up with an address that calls the function bar instead of foo, and things like this. And because you have these function pointers and you have the memory corruption bugs, attackers might exploit those, find a way to overwrite your code pointers, and then redirect the control flow of your program to wherever they want it to go. And there's a bunch of bad things that might come out of that.

This is a really old problem. I think it was first figured out in the '80s or something. And because of that, researchers came up with mitigations, attackers came up with new techniques to bypass the mitigations, then you have more mitigations and more bypasses, and so on. So it's kind of a cat-and-mouse game, always chasing. The first thing you have is write-xor-execute (W^X) memory. Imagine you're able to redirect the control flow of your program, and you have variables inside your program that the attacker is able to manipulate. Let's say you have a description field that the attacker is able to put whatever he wants in. If you have memory that is writable and that at the same time might also be executable, this means the attacker might just be able to put whatever code he wants into that field he can write to, then corrupt a code pointer, redirect the program there, and execute whatever he wants. So people figured out it was a bad idea to have executable memory that's also writable, and they came up with this special bit in the hardware that says: hey, this memory is executable, but it's not supposed to be written to while the program is running, so do not allow it to be written — if the CPU or the program tries to write here, just throw a fault or something.

You also have another thing called ASLR, which is basically randomizing where things are in the memory layout. As you can expect, if you're working with code pointers, you need to know where things are so you can appropriately point to them. With ASLR, whenever you run the program it's going to load things in different areas, and it makes it harder for the attacker to know where he needs to point to.

So it's sort of an obfuscation technique. And because of these mitigations, people came up with a bunch of different ways of bypassing them. You have code reuse, which is basically: you don't need to inject code into the memory address space, you can just use things which are already there. You have functions that, when called out of context, are powerful enough to let the attacker do a bunch of stuff. You also have memory disclosure, where you figure out a way to corrupt a pointer so that instead of writing to a place it reads from a place, and it allows you to read from the memory address space and put that on the screen somehow. The attacker then figures out what those bytes are and what they mean, and basically reconstructs the entire memory layout of your process. By doing that he figures out your ASLR and bypasses that technique, because he's able to infer where things are in memory.

And the latest technique, which is probably the most powerful one, is what we call ROP, or return-oriented programming, which is basically the idea of reusing executable code. Imagine you're able to, let's say, write to some place in memory, and you're able to put a fake stack there. By having this fake stack in place, you are able to chain pieces of code together, as long as these pieces of code end with a return instruction. I'm going to give you an example of that. But the idea is: you put this fake stack there, and then you somehow find a way to corrupt the stack pointer so that you're now pivoting into the fake stack. Now your program is going to run through that stack, and that stack has the addresses of the instructions that you want to be executed.

And the cool thing here is that you can jump into the middle of a function — you don't necessarily need to go to the beginning of a function. You can jump in and execute, let's say, just the last two instructions, and by chaining the last instructions of each function you can build your own attack payload and do whatever you want, in a way similar to what you could do if you were actually able to inject code. Another cool thing, depending on the architecture: x86 runs unaligned instructions, which basically means you can point into the middle of an instruction and execute it. So sometimes there are instructions inside the binary which were not intended to be there. If you go looking into the binary and evaluate each string of bytes on its own — not looking at an instruction from its beginning, but jumping one offset in and looking again — it might now be a different instruction. That basically allows you to find a bunch of instructions which were never intended to be there, and which might allow you to do a lot of malicious stuff. So these

like pieces of code that we chain through the fake stack — we call these the gadgets. Let's see how it goes. Imagine this is your fake stack. We're able to inject it into the memory address space of the process. Keep in mind this is data, not code, so it's usually really easy to inject a fake stack into the process. Pivoting to the fake stack is a different kind of thing, but it's also doable — especially if you have memory corruption bugs, it gets a little easier. So imagine you injected this fake stack, and you have your normal program running, and eventually it's going to run into a return — but now the fake stack has been pivoted to. What happens is that the return is going to use the address on top of the stack to return to. So it's going to return into a place we found in the code that runs a pop rdi instruction followed by a ret. What pop rdi does, on x86, is look at the top of your stack, take the value which is there, and put it into the RDI register — which is the register usually used for the first argument whenever you're calling a function, as defined by the ABI. I'm not going to go into too much detail there, but basically, whenever you call a function, your code is usually generated in a way that puts the first argument into the RDI register, and with that setup in place the called function will assume the first argument is in RDI.

So when this runs, we grab the thing which is on top of the stack. The gadget address is not on top of the stack anymore, because when we executed the return there was an adjustment to the stack pointer — the stack pointer has moved one entry along. So we grab this address of a string — a string that we were also able to place into the memory address space — and we put the address of that string into the RDI register. RDI is now pointing to a string we control, which in this case is the "/bin/sh" string. As some of you might know, that's a shell — an application that lets you run your own programs, your own scripts, your own everything. So if you're able to turn a specific program into an execution of /bin/sh, you basically own the whole thing with the permissions it has.

So we're now pointing to that string, and then there's a return again. The return is going to use the address on top of the stack — we're now one entry lower, because the pop also adjusted the stack pointer. And when we run this return, what we actually return into is the function system — which is defined by, I'm not sure if it's POSIX, right, by the API — and which runs a system command on your Linux. system takes one argument, and that argument in this case is RDI, which is pointing to /bin/sh. So basically we're running system to spawn the shell for us, and that's how you might get owned in a hypothetical scenario. That's more or less how a ROP chain goes. We're going to have questions at the end, but it's really important that you get

this concept. Does anyone have a question they want clarified before I move forward? No? That's great, because we don't have a lot of time. Thank you for not having questions. So, because of this, there was this new mitigation idea, which is control-flow integrity. It turns out I happen to have been working on this thing — I don't know why — for a very long time, because it's super annoying. But it's also super cool. Control-flow integrity is this thing where, okay, high-level idea: we have a program, and this program is going to have a bunch of indirect branches, so we depend on addresses written in memory to know where we should go in specific contexts. What if we were able to limit where these indirect pointers can point to? Let's say you have a function pointer: you're not going to allow it to point anywhere in the address space of the process. You're actually going to say: hey, these are the set of addresses considered valid for this specific indirect call or for this specific return.

For forward edges, you can basically use compiler heuristics to do this. It's hard to do this super precisely, because if you've studied computational complexity you know this is not a tractable problem — points-to analysis is not a solved problem. So it's really hard to figure out all the valid targets for a specific function pointer. But this talk is about shadow stacks, so I'm not going to go into detail here. If you want to know details about this, Kees is sitting there — he's the expert on this, so he can give you all the details, just ping him.

On the return side of things, we're going to talk about shadow stacks. It's in a sense a harder problem, but in a sense also a better problem, because here we are able to define the exact target for a return address — when we're talking specifically about using runtime information. If you're calling a function — say you have a function A calling a function B — you have a return address which is going to be the instruction right after the call instruction in function A. If you look at that during runtime, you're able to say: whenever B returns, it needs to return to this specific instruction after the call inside function A. So it's something where you can tell precisely where you're supposed to return to, rather than having a set of targets you're allowed to return to. This is computed during runtime.

And the thing I mentioned — oh, thank you for reminding me about the time. So this was an academic proposal. I think it dates from 2005, originally from a guy at Microsoft — there was also earlier work before that — and it became popular in academia through this guy from Microsoft, Martín Abadi, and it started spreading out. Now you have this supported in the Linux kernel, you have it in a bunch of different places. And the idea is that you're supposed to be able to protect against a control-flow hijacking attack even in the face of arbitrary writes and reads. This is a really, really high bar for a threat model, because it's really hard to protect against arbitrary reads. The academics, when they came up with this idea and started studying and proposing it, said: this is supposed to resist arbitrary writes and arbitrary reads. And we're going to see in this talk why, in practice, it's so hard to set such a high bar, and why it is so complicated to achieve. So, speaking specifically about shadow

stacks now, just before I switch to Rick. Basically the idea is the following: you have two stacks — your program stack and a shadow stack. Whenever you do a call, you put the return address onto the regular stack, but you also put it onto the shadow stack, and then you continue executing your program. The shadow stack is memory which is not normally writable — it's only writable by call instructions. So the only instruction that's actually able to write there is a call instruction, and it doesn't take any argument; the only thing that gets written there is the actual return address. It's not completely write-proof, because there is an instruction that can write to it, but that's not currently supported, so I'm not going to get into too much detail here — Rick's going to talk about it. But my point is: in terms of architecture and design, it's just something that mimics the actual stack, where you keep a value that you can check before you return. Whenever function B returns to function A, it's going to use the address which is on the regular stack, but before returning it's going to check the shadow stack and see whether both of them match. If that's the case, fine — you're able to return, all good. If they're different, this means your actual stack was corrupted, so please stop the execution of the program, because something nasty is taking place. It should work pretty automatically, right? Rick's going to tell you otherwise.

>> So, yeah, I'm going to talk a little bit about what was happening in the Linux kernel enabling for this feature, to try to give some context for the security research we're going to talk about later. So, yeah, like João said, you call and it pushes to the shadow stack; you return and

it pops and verifies from the shadow stack. And this pretty much works automatically when you have a normal program that's just calling functions and returning from functions. But it turns out that user space has a bunch of more exotic, rare things it does with the stack that can confuse the shadow stack implementation. And this is not something that's defined by the hardware at all — it's something where software has to decide whether to support these operations, because some of them are very close to the ROP that João was explaining. So whether they should be supported, or how they should be supported, is kind of up to software to decide.

A couple of examples of that. The main example, I think, is user-level threading. This is where you have an operating system thread with a stack, and user space can sometimes create a new stack just by allocating memory, swap to it, and start calling functions. So while that operating-system-level thread is scheduled, you can actually have more, sort of, software threads that are switching back and forth. And obviously, if you switch to a new stack and start calling and returning, you might be returning from a place you didn't just call from, and the shadow stack is going to get confused.

Another example is longjmp. This is an API where you can save a point in your execution, then go off and do more computing — maybe you jump to a new stack, maybe you're far away doing something completely different — and then you can longjmp back to that place you saved: basically, "remember what I was doing here." Obviously this is going to be confusing for the shadow stack when all of a sudden you completely reset the stack back to some previous location.

There's also sigaltstack, which is a Linux kernel feature where you can configure the kernel to handle signals on a different stack. So your program might be executing along, and all of a sudden it takes a signal and jumps to a new stack. This is something the kernel does, so the program doesn't even know — it just suddenly finds itself executing on a new stack. And then lastly, as a sort of catch-all: JITs can do a lot of weird stuff with the stack, and the compilers and all the kinds of tools where you might try to fix these things up don't have a lot of visibility into that.

upleveling a bit from the technical issues, I'll talk a little about what people at the beginning were looking to use shadow stack for, and we had a lot of interest from distros. The distros have kind of an interesting way they do compiler hardening. There's a bunch of features in the compiler that you can turn on, and they'll harden the program that's being compiled. These compiler hardenings work pretty much automatically — often they work, but not 100% — but they work close enough that distros can actually turn them on in a project's build file. So someone might have an open source project, and a distro wants to turn it into a package, and they'll go in there and sort of optimistically turn on these features. And this works well enough: there are some rare breakages, but those get reported as bugs, and then the distro can say, okay, for this package we'll just turn off the hardening that was the problem. So the distros wanted to use shadow stack in the same way. They wanted it so you could just turn it on optimistically and it should just work, and you'd have shadow stack across your whole distro.

So you can maybe imagine where this is going: there's a bunch of things that don't work automatically, and then there's a use case which is "we want this thing to basically work automatically." So this was kind of a conflict. The glibc direction for this — and this was not work I did, so I'm kind of representing some other people's outlooks here — how they went about it was: glibc has a pretty extensive test suite that exercises a lot of the libc APIs that do this stack manipulation stuff, and they reckoned that if they could pass the test suite, they'd have enough coverage of the way programs tend to behave to do this kind of distro-wide enabling. So they went and did a bunch of special shadow stack implementations — for example, longjmp behaves differently when shadow stack is enabled. And then there was a bit in the ELF headers of the binary, so a distro could mark "this binary works with shadow stack" or "this binary doesn't," and the kernel or the loader could decide whether to turn shadow stack on.

So there were a couple of trade-offs here, and there was kind of a push-pull between the Linux kernel community, the glibc community, and the distros, all wanting to say "we want a little more of this" or "we want a little more of that," which helped us tease out exactly where we wanted the solution to land. Compatibility, obviously, was a very important trade-off, because that was going to enable this use case of distro-wide shadow stack. But performance comes into play too, because some of these operations — longjmp, for example — are more of an O(1) type operation, where you just say "reset to this point," the registers get reset, and pop, you're back where you started. But with shadow stack, some of the implementations that were done for longjmp involved unwinding the stack in a sort of O(n) way. So if you had an application that depended on longjmp being fast — and a few of them do — it could suddenly be surprised by these long unwinding operations in its fast switching.

And then lastly, security, obviously, is pretty important — it's a hardening feature, so that's one of the main goals. But it depends on how you gauge security. You can imagine if we had a solution with a 99% hardening level that only worked for 1% of apps — that's not going to be very widespread. But if you had a hardening solution that was, say, a 70% hardening level but worked for 99% of apps, then you'd have, in an area-under-the-curve sense, more protection. So with all these trade-offs, there wasn't a real simple answer for what to do, and we were constantly debating: is this important, or is that important? And

this is around the time that I got involved in the enabling, and when I was evaluating the whole solution, I found this one particular corner, which was the implementation around ucontext. This is glibc's user-level threading API — the thing I talked about earlier where you're switching between software stacks. And the implementation said: okay, if you're going to have a new software stack, you're going to need a new shadow stack to go with it. So, for the hardware — and I'm going to talk generically about the hardware, because there are actually, I think, up to three shadow stack implementations now in different hardware architectures, and they all work pretty similarly in this regard — to switch to a new shadow stack, you have a token that sits on the shadow stack. It's a special value, and there's an instruction that says "I want to switch to the shadow stack at this point." The hardware checks the token, and if it passes, it allows the shadow stack pointer to be set to that shadow stack. So to actually start using a shadow stack, you need to have the shadow-stack-permission memory, and you also need to have this special token value on the shadow stack.

The way the kernel design was at the time, there was a PROT_SHADOW_STACK. This was a memory permission — like read-only or executable, just like the normal memory permissions. Linux has some syscalls, like mmap or mprotect, that let you create memory or change memory to different permissions. And the way glibc used this was: it would first mmap some writable memory, then it would write the special token value, using a normal write, to the memory that was intended to be shadow stack, and then it would mprotect — the syscall that lets you change memory permissions — it would mprotect this region to be shadow stack, and then you had shadow stack memory with the token you wanted. And when I looked at this, I was wondering about the window of time when the memory was writable to normal instructions, because glibc was writing the token it wanted. And I wondered: well, that's the good write to the shadow stack, but what if there was some other thread or something that

had the ability to write at that time, and could write bad stuff to the shadow stack? And, you know, with this PROT_SHADOW_STACK design — I talked about the compatibility trade-offs earlier — it really gave user space a nice safety release: if it got itself into some sort of bind where it needed to reach some compatibility goal, it could always toggle the shadow stack to writable, fix it up however it wanted, switch it back to shadow stack, and then go about its business. But at the same time, we didn't know whether this was a real issue — sometimes there are hypotheticals that don't really come up. But I knew João — there's this unaffiliated group of people who care about kernel CFI, and João is certainly one of them — so I went to him with a question and said, basically: is this a real thing? Is this exploitable? Because I wanted to know whether this was an important trade-off to make, since we were going to take some cuts on the other trade-offs.

>> I then started looking into the thing Rick was working on, and said: hey, can I come up with at least a proof-of-concept implementation of why this might be a problem in the future, or why this is not as strong as the threat model is supposed to be? I started by looking at this thing they call makecontext and swapcontext. That's the glibc-supported API for creating different runtime threads within your process. Don't think about parallelism here — just think that you have your program running, and eventually you want a different context of execution that you can, say, jump to and then jump back from while your program is running. Say you want to implement coroutines or things like that; this is useful for that. So because you're creating a new context, you're going to need a shadow stack for this new context, and I was looking at how the shadow stack was allocated and all of that. How does it work? The way that makecontext — the function that's actually going to create the entire data structure for your new context — works is basically: it has this ucontext struct where it's going to store all the data. Whenever you

call the function, you first need to allocate the actual stack, the regular stack for the for that specific uh uh new context. You do a maloc. You allocate some memory there and then you pass the pointer to this new uh chunk of memory to make context and uh then make is like going to start like uh prepping everything and like put the the data into the the context t and eventually it's going to allocate uh the memory for the the the new shadow stack and it's going to like uh make it like a writable uh memory because it needs to put the token into there. the token that he was just talking about. So here locates like

a a writable chunk of memory, writes the token in there and then it causes this call to transform it into a uh into a uh shadow stack memory. So it cannot be used as a shadow stack before you turn it into an actual shadow stack page and then it's going to like uh do some tricks with like call instructions to make sure that it puts the start context there. So imagine that you're running a context and uh eventually your context finishes executing. It needs to return somewhere. uh but there's like nothing below it. So it's going to like return into this uh start context function which is basically the function that handles the the the context kind of like

ending or finishing its execution. So basically like it put everything in place and uh makes everything that it's kind of ready to run and uh beyond that you have like the swap context function which is actually like the function that you're running your program and eventually like your program wants to jump into the different context. is going to use this swap context thing to jump to the different uh uh execution thread and uh basically on return whenever it like finishes running the context it's going to use the p start like I just mentioned so if you take a look at the start allocation inside the make context function that's more or less like the the steps that you have

there first allocates a writable uh memory then it's going to like write the shadow stack token inside of there then it's going to turn the page into a shadow stack page then it's going to like save the uh the shadow stack uh pointer into the context t The shadow stack pointer works similarly as the the return stack pointer the the the RSP inside the the regular stack. So it's going to like point to the top of our shadow stack. It's going to put it like into this U context C structure. Then it's going to like uh pivot into the new stack and into the new shadow stack. It's going to do some tricks. Uh basically it's going to like run a call

instruction in front of like a jump instruction to make sure it puts the jump instruction as the return address. And the jump structure is going to like jump into the start context. It's not super relevant but anyways. And uh then it's gonna like pivot back into the original stack and the original shadow stack and gonna like continue running until until swap context is now called to execute this new context we just created. So this is all good, right? It's supposed to work. It's beautiful. It's cool. Uh life's great. But not really cuz there's actually like a race condition here if you were paying attention and if you're like a a reader, attention attention reader. Uh I mean if

you have like another thread running in the same process and now I'm talking about the parallel thread indeed. uh this the the that that window when the shadow stack is actually like writable that basically means that a different thread could just come and write into the shadow stack before it's actually like a shadow stack. So we're able to put like a bunch of uh uh trash into the shadow stack before it's actually like usable. And uh that's basically on those seven steps I gave you. That's basically like where the race condition is. So like the first three steps uh you can basically like get a different thread attacking it. Okay. But is this like really dangerous? Is this like really a

problem? Because if you think about the stack uh before you use a return address uh you actually uh need to write to it right so you first do a call and the call is going to write to your stack and then you're going to do a return and then the return is going to read from the stack. So all the data that's being uh read was actually like written for written into the shadow stack before the shadow stack was already like unwritable. So this basically means that whatever we write there is not going to be uh really used because it's going to be like overwritten by the the chain of calls uh before it's actually like used.
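That argument, and the pointer-corruption trick the talk turns to next, can be sketched with a toy model. This is a pure simulation, not real CET or glibc code — the names (`Machine`, `call`, `ret`) are illustrative. A call pushes the return address to both stacks; a return pops and compares. Values injected during the writable window sit below the shadow stack pointer, so honest calls clobber them — unless the saved shadow stack pointer itself is corrupted.

```python
class ControlFlowViolation(Exception):
    pass

class Machine:
    """Toy model: one regular stack, one shadow stack, a shadow stack pointer."""
    def __init__(self, ssp=0):
        self.stack = []           # regular stack (attacker-writable memory)
        self.shstk = [0] * 16     # freshly provisioned shadow stack memory
        self.ssp = ssp            # index of the next free shadow stack slot

    def call(self, ret_addr):
        self.stack.append(ret_addr)
        self.shstk[self.ssp] = ret_addr   # hardware writes the shadow copy
        self.ssp += 1

    def ret(self):
        addr = self.stack.pop()           # may have been corrupted
        self.ssp -= 1
        if self.shstk[self.ssp] != addr:  # hardware compares the two copies
            raise ControlFlowViolation(hex(addr))
        return addr

# Case 1: attacker races the writable window and scribbles gadget addresses
# into the not-yet-active shadow stack, but the SSP is honest. Every later
# call overwrites the injected slots before any return can consume them.
m = Machine(ssp=0)
m.shstk[0:3] = [0xbad, 0xbad, 0xbad]      # written during the race window
m.call(0x1000); m.call(0x2000); m.call(0x3000)
assert m.ret() == 0x3000 and m.ret() == 0x2000 and m.ret() == 0x1000
# nothing evil survived

# Case 2: the attacker also corrupts the saved SSP (it lives in ordinary
# writable memory, the ucontext_t), so execution starts one slot above the
# injected frame and the deepest injected slot is never overwritten.
m = Machine(ssp=1)                        # corrupted pointer: skips slot 0
m.shstk[0] = 0xbad                        # injected gadget address
m.call(0x1000); m.call(0x2000)
assert m.ret() == 0x2000 and m.ret() == 0x1000
m.stack.append(0xbad)                     # matching write on the regular stack
assert m.ret() == 0xbad                   # both copies match: hijack succeeds
```

In the second case the final return (the one that should land in `__start_context`) consumes the injected slot, both copies agree, and the check passes even though the target is attacker-chosen — the mechanics behind the bypass described below.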

So can we really use this let's say in like an exploit scenario? Is it like really something that we should be concerned about or is this just like you know kind of security paranoia? [gasps] Well except that there's like the stack pointer which is also stored in the U context uh um strct and the context strct is not into the shadow stack. It's like in regular memory. So that basically means that we're actually like able to corrupt the shadow stack pointer and by doing that we might be able to point above that amount of thresh that we just injected there which means that uh we might make our value survive uh even after the the shadow sack was

started like being used. So how would that happen? So let's take a look. So imagine like we're now trying to raise the shadow stack and put like our own rob chain inside there. So you have like your regular stack, you have your regular shadow stack, and you have the context uh strct which has like this the shadow stack pointer there. So we erased it. We wrote like a bunch of uh rob chain there like something really evil and that's going to take the world for us. And uh we also like set uh the the address of the push start like described into the the stack and into the shadow stack. We also like have the token there

which was written written by the the make context function and we have the the shadow stack pointer pointing to the top of the stack. All good here. And then uh you have a function A call in a function B. You wrote to the to the regular stack. You wrote to the shadow stack. Uh basically we this this chain I mean did override our uh our rob chain that was there. The red thing. Uh then the B is called C. I mean our addresses our our evil stuff got overwritten again and it didn't work out. And then when things start returning start like comparing uh the stack with the shadow stack and things were matching. So we were not like

really able to do anything meaningful here, right? But now with corrupting the shadow stack pointer which is in writable memory, the thing that we can do is basically we corrupt the shadow stack, right? We put like our our rob chain there, but we also corrupt the shadow stack pointer. And now we're not pointing to the top of the stack. We're talking we're pointing like one frame above it. And what happens is that whenever we do have a function A call in the function B, uh the function A is uh the address the return address in the function A is actually going to be written one frame on top of the things that we wrote there. Which basically

means that now our contents are surviving this this uh series of calls. So uh you exploit like a different memory right you override the regular stack you override the the the address of the the push start in the regular stack. And now you have like this function A call function B. And whenever like B returns to A, it's going to compare the two addresses of the function A. It's going to work out fine because these match. But now whenever the context stop running and it's supposed to return to the push start function, it's actually like going to return to the place to that we control. So basically what that means is that we're actually like able to bypass the

shadow stack policy and the thread model that is supposed to be enforced by this uh just by like these amount of rights. So that's that's pretty much like how you uh break the shadow stack or used to. Um so keep in mind this is a PLC. It we use like pretty strong uh uh primitives to write this PLC. We assumed that we had arbitrary rights and uh what we were trying to to say is that hey I mean are we like really being uh loyal to the to the thread model here or not? And the thing is uh I mean basically what we show is that the thread model is not like being capped. we had like the

the arbitatory rights, we were able to bypass the policy. Uh if you if you take this and bring it to the to the field of like writing exploits, you can keep in mind that it's going to be like really hard to write an exploit based on this for servers. But if you're talking about let's say consoles or like mobile devices and this kind of things that we have control over and that we're able to manipulate things over and you want to eventually unlock and things like that, this becomes like a really serious problem because it might not be that hard to implement an exploit based on this. uh for implementing this PLC we basically like uh did like a brute force

thing. So we wrote like a loop that was like kind of running the program running the program and trying to uh write the right addresses uh in the right time and make sure that we are able like to kind of bypass the thing in the right moment. Uh we had like to keep it like adjusting the timing because timing is like really hard and for race condition you need to keep trying until you you you write to the you write to that specific memory in the right time when that specific thing is actually like readable. if it's like before or after that you're going to like end up with a crash. But we just like kept

trying trying and honestly did not like take uh many many minutes uh uh before I mean we were able to achieve this. Uh basically what it proves is that the academic CFI was not being kept at this moment and because of that we had a couple of discussions say hey I mean perhaps the JBC uh way of implementing this is not the way we want to move forward and we might like want to think about a different way of like not having this uh race condition there waiting to be a disaster. We looked at this PC and we said, you know, like Josh said, it's um it was short of like a full application that was exploited, but it

was like enough of an example that we could say, okay, this is like not a completely hypothetical thing. We uh there so we ended up changing the the kernel design actually to get rid of the pro shadow stack and we added a new a new SIS call. So we I talked earlier about how there's the M map and M protect SIS calls for working on memory and so we added a new SIS call map shadow stack and it lets you go map shadow stack with a specific value sort of pre-provisioned in it. So you could put the token in it um and it pops up with the token already. It never goes through the writable the writable stage.
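The difference between the two designs can be sketched in a toy simulation — again not real kernel code; `Page`, `old_flow`, and `new_flow` are illustrative names, and the real interface is the `map_shadow_stack()` syscall with a flag (`SHADOW_STACK_SET_TOKEN` on x86) asking the kernel to pre-write the restore token before user space ever sees the mapping.

```python
TOKEN = 0xC0FFEE  # stand-in for the shadow stack restore token

class Page:
    """A page that records who wrote to it while it was ordinarily writable."""
    def __init__(self):
        self.perm = None      # "writable" or "shadow"
        self.data = []        # (writer, value) pairs observed

    def write(self, who, value):
        if self.perm != "writable":
            raise PermissionError("fault: not ordinary writable memory")
        self.data.append((who, value))

def old_flow(racing_thread=None):
    """glibc's original scheme: map writable, write token, mprotect."""
    page = Page()
    page.perm = "writable"            # step 1: ordinary writable mapping
    page.write("glibc", TOKEN)        # step 2: glibc writes the token
    if racing_thread:
        racing_thread(page)           # <-- race window: still writable here
    page.perm = "shadow"              # step 3: mprotect() to shadow stack
    return page

def new_flow():
    """map_shadow_stack(): kernel pre-provisions the token; the page is
    never writable to ordinary stores at any point."""
    page = Page()
    page.perm = "shadow"              # born as shadow stack memory
    page.data = [("kernel", TOKEN)]   # token placed by the kernel itself
    return page

# In the old flow, another thread can sneak a write into the window:
evil = lambda page: page.write("attacker", 0xbad)
assert ("attacker", 0xbad) in old_flow(evil).data

# In the new flow there is simply no writable window to race against:
p = new_flow()
try:
    p.write("attacker", 0xbad)
    raced = True
except PermissionError:
    raced = False
assert not raced and p.data == [("kernel", TOKEN)]
```

The design point is that the race is eliminated structurally rather than by narrowing the window: there is no instant at which the future shadow stack is ordinary writable memory.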

And the syscall lets you ask for certain specific values the kernel knows how to write. At first it was just adding the token, but the ARM solution added a few more flags, so it's kind of grown since then. So that's what we ended up doing as a result of all this.

One of the things that was kind of the genesis of this talk: João and I were talking about how this was maybe a bit of an interesting technical story, but also maybe a good example of how security researchers and engineers can work together well. So I'll try to pull some lessons from my side. One big lesson, obviously, is that it's good to engage security researchers early, during the design. Because this stuff wasn't upstream yet, we could just adjust it — much easier than changing it after it's upstream in the kernel, because the kernel is a stable API. If something wasn't a good design, we'd still have to support it, even if we made a new solution.

Another thing: I think it helps to ask researchers to probe the design, and to ask them for specifics. I sometimes see people talk to security people and say, "Oh, here's my really complicated design. Can you take a quick look and tell me if it's secure or not?" And it's almost funny, because for a security researcher to evaluate something like that, they need to actually spend time analyzing it and probing it. So I thought it helped, instead of going to João and saying "hey, is this secure or not," to say "hey, can you exploit this? Show me the best exploit you can do against it, and we'll take a look and see whether it seems reasonable or not."

And lastly: if you're a person who knows about security and you're a super genius, you might think you could just do this stuff yourself. But if you ask someone else to look at your design, they bring more of an adversarial perspective than you can bring yourself, no matter how good you are. So it really helps to have a dedicated security researcher give this stuff a hard look. You're going to get better analysis than you could do on your own.

>> Okay, and from my side, as a security researcher — real quick, because I have just two minutes. First: be available to support others. It won't be possible for engineers to engage you early in development if you're not available — if you need a big chunk of time, or need to gather a bunch of requirements before you jump into something. So if you're there, make sure you have the time to engage with them, to put time into the project, so you can actually do all these things and help them. That's really our job. I like to think that if we were in an MMO game, we'd be playing support — not tank, not DPS. We are the support; we need to help them build this stuff.

Second: follow the product design closely and early. If you can get in early, at the design stage, you avoid a ton of problems. Fixing things after they've been deployed is a mess, because you have products in the wild actually using them, and it gets really complicated. So get in early — I always prioritize looking at things before they're actually a product, before they're out there.

Also: keep the threat model in mind. As a security researcher, the threat model is your north star — that's where you're supposed to be looking. Don't overthink attacks, don't overthink exploits, don't overthink all the things you'd need in place before you can say whether something is secure or not. Just think about the threat model. Is this thing supposed to resist an arbitrary write? No? Then that tells you how it's supposed to behave. That makes our work much, much simpler, in the sense that I don't need to find an actual vulnerability and build a full exploit just to say that this thing is not the best design. If you keep the threat model in mind, you make the work much simpler, much more compartmentalized, and much easier for other people to digest too.

And finally: understand the trade-offs across all the requirements. We're security people — we have security in mind a lot of the time, maybe all the time. But there are other requirements besides security that have to be met for everything to work. It's not worth having a 100% secure product that doesn't work. So with that, we're done a few seconds late — thank you so much for being here, and I guess we're open for questions. [applause]

>> What's the state of the art in terms of defeating shadow stack, for Linux versus Windows? You discussed ROP as one of the ways to defeat DEP and ASLR, and it seems like shadow stack is fairly new — within the last five or so years.

>> Yeah, that's a good question. I think, like you said, it's new, and like João was saying, there's a cat-and-mouse game, and shadow stack hasn't been explored as much yet. But also, when you talk about "defeating" it — it's intended to be a hardening mechanism. And I think as security researchers start looking at it more, there are things we could do on the kernel side to adjust those trade-offs a little. I think there's going to be more research, and more changes like this one, in shadow stack's future.

>> Yeah. From my side, I tend to think that CFI — regardless of how tightly it holds to the threat model — is still quite strong, in the sense that it imposes a lot of cost on attackers writing exploits. And I think that's a great thing regardless of how loyal it is to the threat model. But I've seen research toward breaking CFI, mostly exploiting the fact that page tables are not particularly protected at this point, which makes them an easy target. I think there was a talk at Black Hat, maybe two years ago, where a guy broke forward-edge CFI in the kernel by writing to the page tables — basically he messed with the translation of virtual addresses, and by doing that he was able to build whatever chain he wanted. So that was a cool thing. If you think about the specific thing we just showed: we're not breaking the shadow stack policy itself — the policy is still very tough — we're breaking how it was implemented. If you have the thing working properly, I think it's really good, especially shadow stacks, which give you a one-to-one matching for returns. So we basically need to explore sideways vectors to get at the shadow stack and find a way to break it. And something else people have been talking about a lot is non-control-data attacks: sometimes you can do a ton of damage to a program just by corrupting data structures, without actually doing the control-flow hijacking at all.

>> João, you had said that servers might be less vulnerable than client systems. Could you maybe describe why you felt that way?

>> Because for this specific attack you need three arbitrary writes, right? First you write into the shadow stack, then you write into the ucontext, then you write the actual stack return address. I'm not a super experienced exploit writer — I've written a couple, but not many — but in my experience it's really hard to actually get three arbitrary writes in the wild and pull something like this off in an environment you don't control. And even if you do, there are things like ASLR to bypass, and in the wild that's harder — it happens, but it makes things more complicated. And then you'd have to time all three writes correctly in a natural production environment. I don't want to say it would never happen — I'm not saying I think it's super safe; I think it could happen. I just think it's a much harder thing to weaponize in the cloud-server scenario, compared to having access to the machine, where you can probe things, get addresses, and all that.

>> How many tries did it take in the PoC? In terms of number of attempts?

>> I think it was around 10,000.

>> 10,000. That's noisy.

>> Yeah. Basically, I had my PoC running in a loop. It took around two to three minutes of running until it broke through, and maybe two of those minutes were spent self-adjusting the timing, since everything else running on the machine influences that. So usually, within two to three minutes on the local machine, I could bypass it. But keep in mind I had the three arbitrary writes and I knew the addresses I had to write to, which made the PoC much simpler to implement.

>> Hello — I found this talk incredibly interesting. Just one question: you were talking about makecontext() and swapcontext(), and earlier in the talk you mentioned longjmp. If you create that coroutine but then want to go back to your primary stack, does that use longjmp, and was there any investigation done there?

>> Yeah. So longjmp has had a couple of different attempts at making it more compatible, but I think the upstream implementation today can still run into problems. There's an instruction on x86 — the other architectures have something similar — called INCSSP, which basically lets you unwind the shadow stack. So if you're on the same stack, the longjmp implementation can use INCSSP to unwind the shadow stack back to the place you were. If you're already on another stack, there was a scheme in mind — I don't know if it actually landed upstream — to look at the stack you're jumping to and search, using a normal read, for a token. I didn't talk about this, but when you leave a stack, you can leave a token behind so you can swap back to it later. So you can search for that token, with the expectation that if you left the stack, you must have left one there — software has to actually make sure that happens — and then INCSSP back from there. But even that has problems: if you swapped near the end of a stack, you could take a signal while you're unwinding and overflow the shadow stack.

Actually, I have a backup slide for this. There's another feature here, and this is where I think it could be interesting to watch as shadow stack evolves. Someone asked about the state of attacking the shadow stack; I also think about the state of shadow stack compatibility once this thing is used in more and more applications. There's an optional feature — on x86 it's called WRSS, and the ARM version is GCSSTR — that gives you a specific instruction that can write to the shadow stack. It's like a privileged instruction, and the idea is that if you enable it, you place these instructions only in special places, like longjmp, and then you can massively simplify the schemes I was describing, where you search for tokens and all that — which gets fairly complex and slow. With these instructions you can just write a token and switch right back. And we don't really know — we were actually debating this on the mailing list just the other day — how safe this is. From the compatibility side it's a huge win, because you can write tokens wherever you need them. But does it matter for security? We don't know, and more research could guide that: if someone says, "hey, I tried to attack this and didn't find any problems with it," that would be a reason to think, huh, we could enable even more apps with shadow stack if we rely on it. Does that answer your question?

>> Somewhat, yeah. I was wondering whether makecontext() or swapcontext() possibly use longjmp.

>> Oh — no, they don't. They do something similar, but with their own assembly, I think. And we've been talking about glibc specifically — that was the first libc to get shadow stack support. But other libcs could do things differently; it's not defined by the kernel how you have to do this. The libcs can do it however they want — they could use a longjmp if they wanted to.

>> Okay, thank you.

>> Just to add a little to your question: setjmp and longjmp use the regular stack, whereas makecontext() actually creates an entire new context for the thing to run in, so it's a little bit different. But to give you a pointer here: there's a paper from, I think, four years ago called CHOP, from an academic conference — I won't remember exactly which one — where they describe the problem of exploiting shadow stacks in the C++ exception-handling context. It's a similar kind of problem: you need to unwind the shadow stack to handle it properly. If you take a look at that paper, you'll sort of find all the questions you have, how that works, and what the bypasses are.

>> All right folks, thank you very much. Please give a great warm round of applause to Rick and João! [applause] [music]