
Eliminating an Entire Class of Exploits

BSides PDX · 2018 · 25:43 · 166 views · Published 2019-02
Category: Technical
Style: Talk
About this talk
Ravi Sahita (@rsahita) Return-oriented programming (aka ROP) techniques have been used for many 0-day attacks. In this talk, we will describe an approach to eliminate this class of exploit using CPU instructions, including what else is critically needed in software to take advantage of this defensive mechanism. We will also discuss what other exploit classes must be addressed broadly. Ravi is Principal Security Researcher @ Intel Labs.

Good afternoon. I'll try to cover a topic that hopefully a lot of you have heard about over the last five to seven years. It's also good to be back at BSides. I was at a different BSides four or five years back, when we started working on this problem. It takes quite a while to do some of these things, where we try to eliminate a whole class of vulnerabilities or exploits. So I'll try to go over some of those experiences and what we proposed in this space to address

control flow attacks. I'll start with the summary. There are really two topics I'm going to cover. First, I'll briefly talk about how control flow attacks work: how do they actually achieve the goals of the adversary, the software attacker in this case, to do privilege escalation or otherwise break the intentions with which the software defenses were built? Then I'll introduce a new capability that is proposed for future processors, where we hope that, with the right software ingredients in place, it can address this exploit class completely and hopefully move

software into a much better place in terms of being able to defend at both the kernel level and the application level. So let's look at the evolution of attacks. For people who have been in the software security space for a while, we know this has always been an arms race. There have been approaches where people would do fairly trivial attacks, like stack smashing or executing data off of the stack, and CPU designers looked at it and said, "Well,

that's kind of dumb. I mean, that should just not be allowed to begin with, right?" And capabilities were put in place to stop execution off of the stack. That addressed a lot of the data execution problem in general, when a single bit was added into the OS page tables to block execution from data regions. Data should never really be executable, unless you're doing some really wacky tamper-resistant programming or something like that, and even there you need to explicitly convert your data to executable code. Code injection attacks were then blocked

by people doing additional things: adding safe exception handling, address space randomization, code integrity checks, things like that, where it made sense to know what you're executing. So people would measure the code when it was being loaded off of the disk. Operating systems added driver signing and things like that, where you would validate the code before you would execute it. So pretty standard sanity checks that should be performed. But as I was saying, this is an arms race, and attackers keep evolving as well. They're not

standing in one place. As defenses are being put in place, somebody will figure out, I mean, people are creative, hackers are creative, they'll figure out, "Hey, what can I do that breaks this new set of defenses?" So control flow attacks were the new realm that people moved to. Let's look at a brief introduction to how control flow attacks really behave. When you write code, you may write it in any language, high-level or low-level, but it eventually ends up as some sort of machine code that the machine executes. And the programmer always writes

code in a certain way where, even though you use a compiler and it gets translated, there's still some intention that the programmer has about what order the instructions should execute in. Well, an interesting property that attackers can use is that they don't really have to follow those rules that the programmer set in place. They can execute code from any place they want, once that code is executable on your system. That is the main property that attackers use in this class of exploits called return-oriented programming. If you see this byte sequence, the yellow instruction is what the programmer

intended. It's moving some value into a register in the CPU. It could be reading that value. But if the attacker jumps into the middle of that code sequence, starting at the hex 41 bytes, the machine starts executing from there, and what the machine interprets as the instructions from that point on is completely different from what the programmer originally intended. What the attacker executes in this case is essentially a move of a register into memory at that offset, followed by a return. C3 is a very interesting byte pattern in

the x86 architecture because it transfers control to some different location. It's called an indirect branch because the program execution goes through a value that's not really under the control of the programmer at that point. It's dynamically changing. So let's see how that gets used. By the way, the term for that in the terminology we're going to use is a gadget, because this sequence is very interesting: you can execute one or more instructions followed by a return to the next block of code. So let's see how the

attacker can use that return. Let's say I give you a body of code, say 50,000 bytes of machine code, and you scan through that code and find all the places where there is this byte pattern C3. That's an interesting pattern, because now you can work your way back from that instruction and find all the interesting gadgets that are present in that body of code. So how can an attacker use it? If you recall, we already took care of one problem, where we made the stack not executable.
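The scan described here can be sketched in a few lines of Python. This is a minimal illustration, not a real gadget finder: the function names and the 7-byte `SAMPLE` buffer are made up for this example (they are not the bytes from the slide), and the sketch only lists byte windows ending in C3 rather than disassembling them.

```python
RET = 0xC3  # one-byte near-return opcode on x86


def find_ret_offsets(code: bytes) -> list[int]:
    """Return every offset in `code` where the byte 0xC3 appears."""
    return [i for i, b in enumerate(code) if b == RET]


def candidate_gadgets(code: bytes, max_prefix: int = 5) -> list[tuple[int, bytes]]:
    """List byte windows that end in 0xC3.

    For each 0xC3, emit every window that starts up to `max_prefix`
    bytes before it. These are only candidates: a real tool would
    disassemble each window and keep the ones that decode cleanly.
    """
    gadgets = []
    for end in find_ret_offsets(code):
        for start in range(max(0, end - max_prefix), end + 1):
            gadgets.append((start, code[start:end + 1]))
    return gadgets


# Hypothetical code buffer, chosen for illustration only.
# Decoded from offset 0: B8 89 41 08 C3 = mov eax, 0xC3084189
# Decoded from offset 1: 89 41 08 = mov [ecx+8], eax ; C3 = ret
SAMPLE = bytes.fromhex("b8894108c390c3")

if __name__ == "__main__":
    for start, window in candidate_gadgets(SAMPLE, max_prefix=3):
        print(f"offset {start}: {window.hex(' ')}")
```

Note that the first C3 in `SAMPLE` is not a return at all when decoded from offset 0; it is the last byte of an immediate operand. It only becomes a return when execution starts one byte later, which is the misaligned-entry property the talk describes.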

But remember that the stack is still readable and writable. So the attacker can use some logic flaw in some program or in the kernel and load up the stack with a set of return addresses. Combine that with the way the stack behaves: when you execute a return instruction, the processor takes whatever address is on the stack and transfers control to that location in the code. Now take these two pieces together. You have the ability to load up the stack with a bunch of addresses of the attacker's choice, and you have a priori analyzed the code

that you're trying to attack, so you know where these interesting gadgets lie in that code. You can put those two pieces together and write your own program on top of the code that was already executable. That's the paradigm, or model, that attackers use, which is called return-oriented programming. The closest analogy is a ransom note. If you look at the way a ransom note is constructed, the old-style ones you would see in old movies, where somebody wants to send a ransom note, they would copy letters from different magazines and newspaper articles and stick them together, and then you couldn't tell who wrote it, because you didn't know the handwriting.

It's a similar analogy in the digital space, where you have code that's executable. If you can analyze the code offline and exploit a vulnerability in that program, then you can write your own code on that system. You can literally build your own program at run time. That's pretty scary, because now the attacker can just bounce off these gadgets and execute completely different code. So any of our previous techniques, code signing

and all these things are out the door, because you verified your code, but it doesn't really help: this is a run-time exploit, and it uses your verified code to begin with. So this technique is called return-oriented programming, or ROP. And unfortunately, there's not just one type. There are actually three types out there, because there are several types of indirect branches, and wherever you have indirect branches, you can mount this type of attack. There are three indirect branches in Intel architecture, and there are similar indirect branches in other architectures like ARM as well. So this is a general

problem. The other variants are called COP and JOP, based on the other two types of indirect branches: call-oriented programming and jump-oriented programming. They work essentially the same way, but instead of using the stack, they use corrupted memory in some pointer table and have the call use that corrupted data to go to a similar kind of gadget. So how do attackers use ROP? Typically, the exploits are broken down into two stages. Once you figure out that there is some sort of buffer overflow or some exploitable flaw

that you can use in some logic, you write a first-stage ROP. You can see a lot of these ROP constructions, by the way, in Metasploit and other tools, if you want to see examples. The first-stage ROP will typically invoke critical system calls that open the doorway to dropping the defenses, to take your machine back to the stone ages, effectively. If you can execute this first-stage ROP, you can start doing either second-stage ROP or just rely on shellcode from that point on, because you can open up memory to

become executable again. You can make the heap executable again, and things like that. Sometimes people use a second-stage ROP as well. Once you've done those two stages, the result is that you can take privileged control of that thread, whether it's a browser, some other application, or even the kernel in many cases. A lot of tools have been created by researchers and analysts to understand how you can make ROP easier. Anytime people find a particular attack class, just as people wrote Metasploit and things like

that, people have also written compilers for ROP. You can take a body of code, say some version of Linux or some version of the C library, upload it to a website, and it will dump out all the gadgets for you. You can tell it, "Search for gadgets up to size five," and it'll dump all those gadgets for you. Then you can combine that with other tools, and there are many versions of that. The other thing we found, and why we started working on this problem, was that we said, "Hey, if you look at the type

of attack this is, it's really a fundamental attack that software can't really defend against." And we saw that with browsers and many other zero-days using this kind of ROP attack. So our key observation was that pure software mitigations are falling short in this space, because if you have software depending on software for security, it's an arms race you can't win. At some point your defenses are going to break down, because you can't rely on the software to ensure that your instructions are executing in the right way. There are a couple of different

other artifacts to consider in why this attack works. We don't have any forced instruction alignment in the architecture. It's a very dense instruction set, so you have a lot of combinations you can search for. We have a one-byte return instruction, the C3 that I showed earlier, which we have to maintain for compatibility. We cannot say, "Hey, let's use a 15-byte return instruction." That might solve the problem, but then you break software all over the place. So we have to make sure this works with existing software. And we can't change the ABI. We couldn't say, "Hey, let's just switch over and use a completely different stack," and change the programming model, because that would break the Linux ABI, the Windows ABI, and everything else. Those were some of the constraints that we had to work with.

So we came up with this architecture called CET. It has been documented in a public spec for about a year or so now; we updated the version last year. The idea behind CET was to take the explicit information that compilers have about how the programmer intended the program to operate and encode that into an instruction set that is usable to

enforce those properties at run time. The first intrinsic is a new set of instructions called ENDBRANCH: ENDBR64 and ENDBR32. You can think of these as landing pads or landing tags. When you see an indirect branch happen, which takes your execution from point A to point B in the program, this instruction lets you enforce that when the indirect branch occurs, you actually land on an ENDBRANCH instruction. It effectively arms the machine when an indirect branch happens and disarms the machine when it sees the ENDBRANCH. And if the machine doesn't see

the ENDBRANCH in the execution sequence, it generates an exception. It's a really simple construct, and that's why it's very low cost in terms of implementation, performance, and things like that. But it's very powerful, because now, for the first time, software can enforce policies inline that cannot be bypassed by simple circumvention. Today, let's say I have AV software, and the AV software wants to hook certain APIs to do some sort of filtering of parameters. I'm checking

the parameters that are being passed in. Today an attacker can very easily circumvent that check, and nothing on the system will stop them. I can read the memory at the hook point, at the start of the function. If I see that the location has a jump in its prologue, then I know the API is hooked, and I can just jump past that jump, and everything will just work from there. But now, with things like ENDBRANCH, you can enforce those checks inline, because the machine will generate an exception if your indirect

branch, when you called into a DLL and so on, did not have a landing tag on the other end. This is a compiler-inserted capability, so obviously it requires a recompile. It has to be an explicit opt-in capability: somebody who wants to harden their code can recompile with a version of the compiler that emits this intrinsic. The second one, which is the more interesting one and which actually prevents the ROP exploit I described earlier, is a shadow stack. Shadow stacks have been discussed in academia for many years

now. So the problem we solved here was more from the perspective of the Intel ISA: how do we add the right set of instructions that, as I described earlier, don't break the ABI? We want to provide the right intrinsics to software so that it can manage the shadow stack correctly without breaking the existing software infrastructure, but still enforce the security property that attackers should fundamentally not be able to use return-oriented programming. I'll describe how the shadow stack works in an animation, since we're running out of time. But first let me show you intuitively what ENDBRANCH does.

So if I have this code, and it's compiled, what ENDBRANCH looks like is those landing tags that I've marked with the stars at the beginning of the exported functions. Note that you don't have to put ENDBRANCH in functions that are not indirectly called. If the functions are directly called, those targets are fixed statically in your non-writable code to begin with. It's only interesting to enforce ENDBRANCH for functions that you're exporting, that you expect to be the targets of indirect branches, because those are the ones that can get attacked.
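The arm/disarm behavior of ENDBRANCH can be modeled as a tiny two-state machine. The sketch below is a toy Python model under stated assumptions, not the hardware implementation: the instruction names in the trace (`call_indirect`, `jmp_indirect`, `endbr64`, and so on) are plain strings invented for this example, and the class and exception names are hypothetical.

```python
class ControlFlowViolation(Exception):
    """Raised when an indirect branch does not land on an ENDBRANCH."""


class EndbranchTracker:
    """Toy model: an indirect branch arms the tracker, ENDBRANCH disarms it."""

    IDLE = "idle"
    WAIT_FOR_ENDBRANCH = "wait_for_endbranch"

    def __init__(self) -> None:
        self.state = self.IDLE

    def step(self, instr: str) -> None:
        """Feed the next executed instruction to the tracker."""
        if self.state == self.WAIT_FOR_ENDBRANCH:
            # The first instruction after an indirect branch must be the landing tag.
            if instr != "endbr64":
                raise ControlFlowViolation(f"expected endbr64, saw {instr}")
            self.state = self.IDLE
            return
        if instr in ("call_indirect", "jmp_indirect"):
            self.state = self.WAIT_FOR_ENDBRANCH
        # Everything else, including a stray endbr64, has no effect when idle.


def run(trace: list[str]) -> str:
    """Run a dynamic instruction trace and return the final tracker state."""
    tracker = EndbranchTracker()
    for instr in trace:
        tracker.step(instr)
    return tracker.state


if __name__ == "__main__":
    print(run(["mov", "call_indirect", "endbr64", "mov"]))  # lands on a tag
    try:
        run(["mov", "call_indirect", "mov"])  # lands in the middle of code
    except ControlFlowViolation as exc:
        print("violation:", exc)
```

Direct calls never arm the tracker in this model, which mirrors the point above that only targets of indirect branches need a landing tag.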

So if this sort of transition happens, the machine says everything is fine. This is a normal indirect call going to an ENDBRANCH, so it continues normally. But if you see some indirect call going to a location where there is no ENDBRANCH present, and CET is enabled for that process context, you will get a control flow violation exception, which is a new exception reported to the OS. The operating system will stop the process at that point, and it's a fault-like exception, so the OS can look at the scenario, see what's going on, and either allow the app to continue or stop it, because obviously

if it was a CET-enabled app, then it's a violation that it wants to explicitly disallow. The other one is the shadow stack. In terms of the software construct, think of it as a regular stack, but it's access-protected, and it only holds the return addresses. This is how we could maintain the ABI and not break the parameter-passing techniques on the regular data stack. When the CET shadow stack is active, you have both the program stack and the shadow stack, and effectively what CET does when shadow stack is enabled is check, on a return instruction, on

that C3 instruction that I showed earlier, the return address from the data stack against the one on the shadow stack. If there's a mismatch, it generates a control flow violation, and if it's correct, it lets execution continue as normal. The other important property to note is that for this construct to work correctly, the shadow stack obviously has to be completely non-writable by software. Otherwise, this access control mechanism doesn't really work. That's a key property of the shadow stack, enforced through new page table mechanisms that go with this architecture. When the OS creates a data stack in linear address space, it also

creates the shadow stack, marks it as non-writable, and turns shadow stack on. From that point, you get this property enforced. The nice thing about shadow stack is that, for the most part, the application doesn't need to be recompiled. It may need to be relinked with new runtime libraries, so that things like OS exception handling know that the shadow stack is active and work correctly, but for the most part your application code doesn't need to change with shadow stack.

There's a final point here. Whenever we add these kinds of new instructions that try to enforce new kinds of properties, a lot of software potentially breaks. So something we have to keep in mind is how to create a transition path, so that we can say, "Hey, you can enable CET for specific targeted applications, or even for applications that may not be completely ready for CET," because you may have legacy binaries, libraries, or DLLs in that process address space that have to be supported. So there's a notion of what we call a legacy compatibility bitmap. It's sort of a bad idea in the architecture that

we don't really like, but it's a reality that we have to deal with. If you really want your application to be secure, you do not want to turn on the legacy compatibility bitmap for your application. It has to be fully CET-compiled and enabled. But the reality is we might see a transition path where certain applications have to load libraries that are not fully CET-compatible. This is a way for the operating system to say, here are the libraries or code that are not fully CET-compatible, so if you do see a transition into that path, look up this bitmap to decide whether to

generate an exception or not. It gives the OS a flexible path to manage that. So let's look at some of the key results. I'm going to run out of time, so I'll go through this quickly and then take questions. We did a security analysis of CET using simulators. One thing we were thinking about was: is shadow stack by itself enough, or do we also need ENDBRANCH to be supported? The architecture allows you to enable one or the other, or both. What we realized in our analysis of

Linux is that a good software implementation that's using CET should turn on both capabilities, because if you block ROP, it's very easy for an attacker to pivot over to COP or JOP and use the same gadgets as is. And we showed this with some code that we found in Linux, for example, that essentially loops over an indirect call, which is essentially COP. So if I blocked ROP and the attacker was forced to use COP, and I did not compile my kernel with ENDBRANCH, they could simply use this loop here to invoke

this indirect call and invoke literally the same gadgets. So those two capabilities are important to take together. One thing you might be wondering is, what am I going to pay for this? What's the cost of turning this capability on? The analysis we've done so far is all on simulators, because we don't have hardware for this yet. We've analyzed three cost aspects of CET. One is the performance cost for shadow stack, and we are aiming for that to be less than 2%. The second is the power impact, which is important for large data centers that

are turning this on at scale. And the third is the code size growth, because if you have really small kernels or embedded kernels, you may not be able to afford your code becoming bloated from adding ENDBRANCH instructions. What we found is that code growth with ENDBRANCH has a geometric mean of about 0.2% with Intel compilers, and about 0.4% with GCC. That seems reasonable for that kind of defense capability.

A last couple of things to end. As I said, the spec is online, and it would be great if this community could look at the spec and give us feedback. It's always good to have a really focused audience look at the spec, especially for these kinds of defense mechanisms, because it's harder to build defense mechanisms and much more straightforward to build one attack mechanism. There's a bunch of software enabling in place. GCC support for this capability is in GCC version 8 right now, and you can turn it on with the -fcf-protection flag. The Intel C/C++ compiler supports it as well. And

there's a tool called SDE, which is an emulation vehicle that has CET capability in it. You can compile your code for CET, run it on SDE, and make sure it works correctly. So you can get a lot of your code transitioned over to CET even before the CPU shows up. And finally, there are OS patches for Linux that are in review right now. So if we say we've addressed ROP, are we done? Are the attackers going to say, okay, we're also done, we're going to close shop and go home?

That's not really going to happen. So we are looking, from the research perspective, at how attacks evolve. We're seeing a new class of attacks called data-oriented programming that seems to be emerging right now. It's at the research stage. There were a couple of talks at Black Hat this year, and a couple of recent papers have been published from a couple of good universities. So the research continues, but I think by blocking ROP exploits, we have at least shaved off a bunch of exploit types that use that technique. And the attackers will have to start looking at much more

advanced techniques, like data-oriented programming. The general idea of data-oriented programming is that by corrupting data, you can use full functions and cause the code to execute differently from what the programmer intended, even though you may not be violating the control flow integrity properties. So a data-oriented programming attacker is going to try to attack a program even when your code is CET-compiled. I'd like to end with that. As I said, a lot of the tools are online. Folks who are interested can download the emulators, download the latest compilers, compile their code with CET, and evaluate what it takes to harden their

code against those types of exploits. Thank you.
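For readers who want an intuition before trying the emulator, the shadow-stack check described in the talk can be modeled in a few lines. This is a toy Python model under stated assumptions, not how the hardware or any OS implements it: the class and method names are hypothetical, addresses are plain integers, and the "protection" of the shadow stack is represented only by the convention that nothing except `call` and `ret` touches it.

```python
class ControlFlowViolation(Exception):
    """Raised when the two stacks disagree about a return address."""


class ShadowStackMachine:
    """Toy model of a data stack paired with a shadow stack."""

    def __init__(self) -> None:
        self.data_stack: list[int] = []    # readable and writable by software
        self.shadow_stack: list[int] = []  # in this model, only call/ret touch it

    def call(self, return_address: int) -> None:
        """A call pushes the return address onto both stacks."""
        self.data_stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self) -> int:
        """A return pops both stacks and compares the two addresses."""
        from_data = self.data_stack.pop()
        from_shadow = self.shadow_stack.pop()
        if from_data != from_shadow:
            raise ControlFlowViolation(
                f"data stack says {from_data:#x}, shadow stack says {from_shadow:#x}"
            )
        return from_data


if __name__ == "__main__":
    machine = ShadowStackMachine()
    machine.call(0x401000)
    print(hex(machine.ret()))  # normal call/return pair

    machine.call(0x401000)
    machine.data_stack[-1] = 0x41414141  # attacker overwrites the return address
    try:
        machine.ret()
    except ControlFlowViolation as exc:
        print("violation:", exc)
```

The model also shows why the data stack layout, and therefore the ABI, is left alone: parameters and locals would still live only on `data_stack`, while the shadow stack holds nothing but return addresses.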