ROP - From Zero to Nation State In 25 Minutes

Name: ROP - From Zero to Nation State In 25 Minutes
Uploaded: 2020-07-20
Duration: 25 min 27 s
Description: Omer Yair - ROP - From Zero to Nation State In 25 Minutes BsidesTLV - Tel Aviv - July 2nd, 2020

BSides TLV · 2020 · 25:27 · 801 views · Published 2020-07

Speakers

Omer Yair

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

PINJECTRAL, x64dbg


Transcript [en]

Our next speaker is Omer Yair. He's going to talk to us about nation-state attackers and how anybody can get the same tools and capabilities that nation states have in 25 minutes. So "From Zero to Anti-Hero" could maybe be the title for that. Omer manages the endpoint team at Javelin, a security company acquired by Symantec, and what he does all day is manipulate operating system internals. In his free time, he revives historical photographic processes, which sounds fascinating. And here's a fun fact: at the age of seven, his father taught him how to format floppy disks, a decision his father soon regretted when the C drive was formatted as well, very quickly after that. So, are we ready? Are you ready? Take it away, the stage is yours.

Thank you very much, Karen. Thank you, everyone, for joining me. Today we'll talk about ROP and how we can take it from zero, from a simple ROP, to a nation-state weapon in less than 25 minutes. As Yossi said before, all you need is time and motivation, and we'll show you how much time you need. So let's start. We don't have time for an introduction; if you want to follow me or get to know me, follow me on Twitter at @yair_omer. Today we'll talk about process injection, we'll see how we can use ROP for process injection, and then we'll just do demos all day long. That will be a fun presentation.

So let's start. Process injection is our goal today. That's the weapon we want to develop, and it will be the main goal for each round. We will also have secondary goals, because we are a nation state and we want to make a really good weapon: we want to avoid detection, we want a stable injection method, and we want the method to have as few requirements as possible.

For round one, we will use basic process injection. All we need to do is call VirtualAllocEx to allocate memory in the target process, and it has to be PAGE_EXECUTE_READWRITE. Then we write our shellcode to that address. Lastly, we call CreateRemoteThread to execute that code in the target process. It's a very simple, very basic and effective way to inject code, but because it's so basic, all endpoint protection products monitor calls to VirtualAllocEx and specifically look for PAGE_EXECUTE_READWRITE, and when you add a CreateRemoteThread into that process, that's a sure conviction that there is malware on the machine. So, to summarize the first round: we managed to run code in the target process, but on detection we completely blew it. We failed on that part, but the method is pretty stable and has no requirements at all.

If you want to know more about process injection, there is a great talk by Itzik Kotler and Amit Klein from the last DEF CON and Black Hat called "Process Injection Techniques - Gotta Catch Them All", where they presented a new tool called PINJECTRAL, which I will use today; I extended it for our demos. They also showed a new way, the latest and greatest injection technique, called stack bombing, and we'll use it for round two. If you look at the PoC code for stack bombing, you will see this, and if you are scratching your head about what this thing does, don't worry.

I scratched my head for you, and this is the pseudo-code for the injection. First, you call SuspendThread on the target thread in the process you want to inject into. Then you call GetThreadContext, which gets the state of the CPU, all of its registers, at that specific moment in the thread. Then they use a clever method of writing into process memory using APCs to overwrite the stack, and not only the stack: they overwrite the return address, and this is how they hijack execution from the target thread. Now, it's important to know that because they use APCs and overwrite the return address, the technique has a limitation: the instruction pointer of that thread has to be in one of five specific locations for the injection to work. Lastly, they call ResumeThread, and because they changed the return address, when the thread resumes, execution is hijacked into their own ROP, and their ROP runs.

So does it work? Of course it works, but it has some limitations. In this round we did manage to get code running, but it was only a ROP; we didn't manage to inject shellcode. On the other hand, we completely bypassed detection, because no one detects this kind of injection. There is a small issue with stability, which I'll touch on later in the presentation, and, as I said before, there is a requirement that the instruction pointer be in one of five specific locations for the method to work. From this moment on, we will try to improve stack bombing and turn it into a nation-state weapon.

So let's talk a little bit about return-oriented programming. If you can inject code into a target process, the code you inject controls the flow, the logic of what happens: each instruction follows the previous one. But what happens when you can't inject code, or don't want to inject executable code? You can manipulate the stack and hijack execution instead. You do that by looking for snippets of existing code in memory, which we call gadgets. For example, if you want logic that copies a value into memory, you will look for the following gadgets, and those snippets of code need to end with a ret opcode; we'll see exactly why. With pop eax; ret, you fill the EAX register with 0xDEADBEEF, and then the ret opcode fetches the return address from the stack, which lets us continue execution wherever we want: here, the next gadget, pop ecx; ret. Now we fill ECX with the address 6123, and ret takes us to the next gadget. Again, we control the stack, so we control execution, and the next gadget is mov [ecx], eax, which moves EAX into the address ECX points to. Now we have 0xDEADBEEF exactly where we wanted it in memory.

So now we'll take the simple ROP from the stack bombing technique and turn it into a ROP that can actually execute code. Let's do it in round three. We will use a mechanism called a file mapping, or a section in Windows, which is a way to share memory between processes. We'll split the usual file mapping flow into two steps. In the injecting process, we only create the section and fill it with shellcode, but we don't map it from the injecting process, because we want to fool the EDR and make it harder to connect the injecting process with the target process. Instead, we modify the ROP so that it gets a handle to that section, maps the shellcode into its own process, and executes it. I like to write pseudo-code for ROPs because it makes it easier to understand what happens inside the ROP. So the ROP will get the handle to the section that we want to run.
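The pop eax; ret / pop ecx; ret / mov [ecx], eax chain from the ROP primer above can be modeled in a few lines of Python. This is a minimal sketch, not exploitation code: the gadget addresses are hypothetical labels, memory is a dictionary, and the example values (0xDEADBEEF and the address 0x6123) are assumptions standing in for the ones on the slide. It shows the property that matters: every gadget ends in ret, so the attacker-controlled stack decides what runs next.

```python
# Toy ROP machine. Gadget addresses are hypothetical stand-ins for
# existing instruction snippets; each gadget ends in `ret`.
POP_EAX_RET = 0x1000       # pop eax; ret
POP_ECX_RET = 0x2000       # pop ecx; ret
MOV_PECX_EAX_RET = 0x3000  # mov [ecx], eax; ret
END = 0                    # sentinel that stops the simulation


def run_chain(stack):
    """Run a chain laid out on `stack` (index 0 is the top of the stack)."""
    regs = {"eax": 0, "ecx": 0}
    memory = {}
    sp = 0

    def pop():
        nonlocal sp
        value = stack[sp]
        sp += 1
        return value

    ip = pop()  # the hijacked `ret` pops the first gadget address
    while ip != END:
        if ip == POP_EAX_RET:
            regs["eax"] = pop()
        elif ip == POP_ECX_RET:
            regs["ecx"] = pop()
        elif ip == MOV_PECX_EAX_RET:
            memory[regs["ecx"]] = regs["eax"]
        else:
            raise ValueError(f"unknown gadget {ip:#x}")
        ip = pop()  # the gadget's trailing `ret`: the stack picks what runs next
    return regs, memory


chain = [
    POP_EAX_RET, 0xDEADBEEF,  # eax = 0xDEADBEEF
    POP_ECX_RET, 0x6123,      # ecx = destination address
    MOV_PECX_EAX_RET,         # [ecx] = eax
    END,
]
regs, memory = run_chain(chain)
print(hex(memory[0x6123]))  # 0xdeadbeef
```

Note that no new instructions are ever written: the chain is pure data, and the existing code does the work.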

We'll need a place to save the variables. We call the function NtMapViewOfSection, which maps the section into memory. Its first parameter is the section handle. The second parameter is the process to map it into, which is the current process, and luckily, in Windows, the value -1 is a pseudo-handle for the current process. We also pass a variable that receives the address where the memory was mapped, because we can't know that address in advance, and we will use it later. Also note that we only need to map the memory as PAGE_EXECUTE_READ, which is much less alarming than PAGE_EXECUTE_READWRITE, because the shellcode is already in that memory. Lastly, our ROP has to call the payload.

So let's see how that works. As I said before, we've injected and hijacked the return address. In the top window you can see the assembly code. In this area you can see the stack: the left column is the address in memory, and the right column is the actual value on the stack. In the lower-right panel you can see the registers: the stack pointer, the instruction pointer, and the rest.

So, as I said, we hijacked the return address, and now we go to the first gadget, which we call the stack pivot. The stack pivot is a pop rsp, so it changes the stack pointer to the beginning of our ROP. Now our ROP starts, and you can see that the stack has changed. Because we are in 64-bit, we need to make sure the stack is aligned. Now we're in what we call the pop gadget, which fills the registers for us; it just pops the registers. In 64-bit, when you call a function, you pass the first four parameters in RCX, RDX, R8, and R9, and this pop gadget prepares us for the call to NtMapViewOfSection. You can see that we filled RCX with a handle to the section we want to map. RDX, the second parameter, is -1. And R8 holds the address of the variable that receives the address the memory was mapped to. Note that we didn't choose a random address: we chose an address on our own stack, and I'll show you in a moment why.

So let me continue into NtMapViewOfSection. The actual opcode that goes to the kernel and maps the memory is this syscall opcode. You can see that after we run it, the address we chose is filled with the address of our shellcode. You can also see that RAX, the return value, is zero, which means we succeeded, and that the shellcode is mapped at that address. If we continue: in 64-bit, you need to clear the stack yourself after each call; it doesn't happen for you. So we use a gadget that adds to RSP to clear the stack. And because the mapped address was saved on our stack, the ret takes us straight into our shellcode. If I let it run, you can see that we got "hello world" here.

So let's summarize the third round. We managed to turn the ROP into a viable way to inject code into processes.
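The third-round chain can be sketched the same way. This toy model assumes hypothetical gadget addresses, a made-up handle value and mapped base, and list indices in place of stack addresses. It mirrors the walk-through: a pop gadget loads RCX (section handle), RDX (-1, the current-process pseudo-handle), R8 (a pointer to a slot on the ROP stack) and R9; "returning" into NtMapViewOfSection writes the mapped base into that slot; an add rsp gadget skips the cleanup area; and the final ret lands on the freshly written shellcode address. The real NtMapViewOfSection takes ten parameters, with the last six (including the view size and the PAGE_EXECUTE_READ protection) on the stack next to the shadow space, so the real cleanup skips more than the two slots used here.

```python
# Toy model of the round-three chain. Addresses, the handle value and the
# mapped base are hypothetical; stack "addresses" are list indices.
SECTION_HANDLE = 0x44         # handle created by the injecting process
CURRENT_PROCESS = -1          # pseudo-handle for the current process
STATUS_SUCCESS = 0
MAPPED_BASE = 0x7FF600000000  # where the toy kernel places the view

POP_REGS_RET = 0x1000  # pop rcx; pop rdx; pop r8; pop r9; ret
NT_MAP_VIEW = 0x2000   # stands in for NtMapViewOfSection
ADD_RSP_RET = 0x3000   # add rsp, 8 * SKIP; ret
SKIP = 2               # toy size; the real call needs more cleanup


def run(stack):
    """Follow `ret`s through the chain until one leaves it."""
    regs = {"rax": None, "rcx": 0, "rdx": 0, "r8": 0, "r9": 0}
    trace = []
    sp = 0

    def pop():
        nonlocal sp
        value = stack[sp]
        sp += 1
        return value

    ip = pop()  # first gadget after the stack pivot
    while True:
        trace.append(ip)
        if ip == POP_REGS_RET:
            for reg in ("rcx", "rdx", "r8", "r9"):
                regs[reg] = pop()
        elif ip == NT_MAP_VIEW:
            # Toy semantics: map the section and write the base to *R8.
            assert regs["rcx"] == SECTION_HANDLE
            assert regs["rdx"] == CURRENT_PROCESS
            stack[regs["r8"]] = MAPPED_BASE
            regs["rax"] = STATUS_SUCCESS
        elif ip == ADD_RSP_RET:
            sp += SKIP
        else:
            return ip, regs, trace  # a `ret` left the chain
        ip = pop()


OUT_SLOT = 9  # the stack slot R8 points at
stack = [
    POP_REGS_RET,
    SECTION_HANDLE, CURRENT_PROCESS, OUT_SLOT, 0,  # rcx, rdx, r8, r9
    NT_MAP_VIEW,  # "called" by returning into it
    ADD_RSP_RET,  # NtMapViewOfSection returns here
    0, 0,         # skipped by add rsp
    0,            # OUT_SLOT: becomes the shellcode address
]
landed, regs, trace = run(stack)
print(hex(landed), regs["rax"])  # 0x7ff600000000 0
```

Pointing the output parameter into the chain's own stack is what turns an address nobody knew in advance into the next return address.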

On the other hand, we failed on detection. The reason is that we used NtMapViewOfSection, a function that a lot of ROPs use, so endpoint protection systems place a hook on that function, and that's where they run their heuristics to find out whether there is a ROP. Because we are a ROP, they will find us. So we want to work on this part in the next round.

In round four, I will show you how we can use the Rite of Passage technique, which I presented at the last DEF CON, to bypass all the ROP mitigations. To understand Rite of Passage, you need to understand system call semantics. A system call like the one we've just seen, NtMapViewOfSection, works in the following way: first, the first parameter is copied from RCX to R10; then EAX is loaded with a value we call the system call number; and then the syscall opcode is issued. The syscall opcode goes to the kernel, and NtMapViewOfSection is performed in the kernel. But the kernel doesn't care who issued the syscall or where it was issued from. The only thing it cares about is the value of EAX, which tells the kernel which system call to run.

As I said before, endpoint protection places its hook on NtMapViewOfSection, so we cannot go through NtMapViewOfSection, because that will convict us. So what can we do? Well, endpoint protection doesn't hook every function. For example, NtYieldExecution, which is literally a function that does nothing, is also a system call, but it is not hooked. So how can we use it? We can start with a pop rax; ret gadget, followed by a pop r10 gadget, and then use NtYieldExecution, but instead of going to its beginning, we go directly to its syscall instruction. Looking at it like this, we can mimic any system call we want without going through any hook, so we can bypass any endpoint protection system that's trying to catch our ROP.

So let's see how we can turn that ROP into a Rite of Passage ROP and bypass protection. We begin again by hijacking the return address and going through the stack pivot gadget. Now we start again with a pop gadget, but notice that this one starts with a pop rax. We pop the values we want; also note that we pop R10 as well as RCX. When we return, we return into NtYieldExecution, but directly to the syscall. You can see that RAX holds the value 0x28, which is not the system call NtYieldExecution is meant to run; it's actually NtMapViewOfSection. Also note that RCX holds the section we want to map, R10 holds it too, and R8 points to where we want to save the address. If we issue the syscall and continue with our ROP, you will see that we get to our shellcode again, because we managed to map that memory with the shellcode inside, and we get our message box. So we actually managed to avoid detection. We're back to a green light on avoiding detection, and only two issues are left to tackle.

About requirements: as I said before, in the original stack bombing the instruction pointer has to be in one of five specific locations. We want something like stack bombing that doesn't care where the thread is running. And I'll talk about the other issue, stability, now.

So let's start round five. If you think about it, a ROP is like running a function. If you hijack a thread, you're actually running a function within a function, and that alters the registers that the original function expected to have. The problem with ROPs, and also with stack bombing, is that some registers can change from what the function that was running before the hijack had. That can cause stability issues: if the function expected a specific return value, for example in RAX, and it sees something different, your program can behave differently. So how can we solve that, and how can we hijack a thread anywhere it executes?

If you remember, in stack bombing we called GetThreadContext. So what if we had something like this? Let me just hide the regular ROP; we're only talking about what we're adding to it. We called GetThreadContext, so we can pass that context to the ROP, and at the end of the ROP we call a function called NtContinue. It's like calling SetThreadContext: it replaces all the registers of the thread with the context we got before we started the ROP, which reverts the thread state to what it was before. But if you try this, you will soon see that it's not possible, because between the call to GetThreadContext and resuming the thread, the volatile registers, specifically RAX, RCX, RDX, and R8 to R11, can change. I don't have time to explain why here; follow me on Twitter and I'll add the explanation there. So we can't use that method.

So maybe, if we can control the state, we can write some kind of gadgets that capture the complete state when the ROP starts without modifying any register, and then use that state to call NtContinue. That's a really good option, but you know what? There is a function in ntdll that does exactly that, and it's called RtlCaptureContext. We have a small issue, though: RtlCaptureContext needs to receive a parameter, and if you remember correctly, the first parameter is passed in RCX, so we have to overwrite RCX. So we need to change the ROP to something like this. First, we need a gadget that saves the original value of RCX without modifying anything. Then we can change RCX to pass the context parameter and call RtlCaptureContext to capture the context. Then we need another gadget that lets us overwrite RCX in the captured context with the original value we saved before. After that, we can run our ROP and, at the end, call NtContinue to revert the thread to the same execution state as before. Also note that because we call RtlCaptureContext from within the ROP, both the stack pointer and the instruction pointer will be different, so we also need to revert them; because we called GetThreadContext before, we can rely on those two registers' original values when we build the ROP. So that's something we can do.

So let's see what we need to change in the stack bombing technique to make it work. We changed it a little bit; don't get alarmed, it didn't change a lot. We still call SuspendThread and GetThreadContext just as before, but now, to hijack execution, we change the thread context. We only change the instruction pointer, to point to our first gadget, and the stack pointer, to point to the top of our stack, and then we call SetThreadContext to apply the new context to the thread. Next, we call WriteProcessMemory, the actual WriteProcessMemory: we're not using APCs, so there's no limitation on this write. And we don't need to overwrite the return address; we just write whatever we want on top of the stack, without overwriting anything on the original stack. Lastly, we call ResumeThread.

So let's see a demo of a stable Rite of Passage technique. Okay, so I just hijacked execution, and you can see that we're not going back to the opcode we were at before. We changed the stack pointer and we're at our first gadget, and this gadget copies RCX into a variable inside ntdll. If we look at it, we can see that this variable inside ntdll holds some value, but when we execute this opcode, you can see that it now holds the value of RCX, which is the original value we want to save. So we managed to save RCX. Also note that because this gadget has a pop r14 inside it, we wrote the original value of R14, which we got from the context, onto the stack, so we don't change R14 with this gadget. The ret opcode then takes us to a pop rcx; ret gadget, which loads the address where we want to save the context before calling RtlCaptureContext. If we look at the memory, we can see the context structure that we will fill with our context, and this ret takes us into RtlCaptureContext. Now that function starts filling the context structure with the values of all the registers; let's see them start to fill up. As I said before, RCX, the stack pointer, and the instruction pointer, all three of them, will be a little bit off, and we'll need to fix them. Let me finish running through this function, and I'll show you exactly where those are.

Okay, so we got to the end of that function. You can see that RCX holds the current RCX and not the original one. You can see that the stack pointer is also the current stack pointer and not the original stack pointer from before we hijacked the thread. And, let me enlarge it a little bit, you can see that the instruction pointer is also taken from the current state and not from the original. So now we'll use a few gadgets to fix those.

First, let's fix the RCX value. We go back to the pop gadget; this time we only need the RCX and RDX registers. The ret opcode takes us to a function called RtlCopyLuid. That function takes the value RDX points to, which in this case is where we saved the original RCX value, and copies it to the address RCX points to. And you can see that RCX points exactly to where RCX is stored in the context. So when we run this, you can see that this opcode overwrites that value with the original value of RCX. So we managed to revert RCX to its original value.

Now we need to revert both the stack pointer and the instruction pointer. For this we use the pop rax gadgets, and we go back to RtlCopyLuid, but only to its second opcode, which copies the value of RAX into the address RCX points to. And because we called GetThreadContext before injecting, we could save those values in the ROP in advance. So now you can see that we overwrite the instruction pointer with the original value, and we do the same sequence of gadgets again to overwrite the stack pointer; you can see it changing right now. Okay. So now we have a proper context that we can revert the thread to after we finish the ROP.

But don't forget that we changed a variable inside ntdll, so we need to revert that value. When we injected the ROP, we used ReadProcessMemory to read the original value and placed it in our ROP. We use, again, pop rax, pop rcx, and mov [rcx], rax to overwrite that value. You can see that this variable now holds the original RCX, but when we issue this instruction, it gets back the value that was originally there. So not only do we run the ROP without changing anything in the thread, we also don't change anything in the process itself.

So now, when we hit the ret opcode, we go back to the original ROP. Going back to the pop sequence, we're popping the registers and then returning into NtYieldExecution. But you are smarter than that: you know that we're not executing NtYieldExecution. We're actually executing NtMapViewOfSection, because, again, this is Rite of Passage; nothing needs to change there. When we issue the syscall, you can see that this value gets the address of our shellcode, so we managed to map the shellcode into memory. So now, again, we need to clear the stack.
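Round five's save-and-restore idea, together with the Rite of Passage observation that the kernel only looks at the system call number, can be illustrated with a toy register model. Everything here is an assumption for illustration: `Regs` holds only five of the registers a real CONTEXT contains, the register values and addresses are invented, `kernel_syscall` is a hypothetical stand-in for the kernel's dispatch, and 0x43 is the NtContinue number shown in the demo, which varies across Windows builds. The sketch shows why RCX, RIP and RSP must be patched after the capture, and that restoring the patched context returns the thread to exactly the state GetThreadContext reported.

```python
from dataclasses import dataclass, replace

# NtContinue's number as shown in the talk's demo; system call numbers
# differ between Windows builds, so treat it as illustrative.
SSN_NT_CONTINUE = 0x43


@dataclass
class Regs:
    """A handful of the registers a CONTEXT structure holds."""
    rip: int
    rsp: int
    rax: int
    rcx: int
    r14: int


def kernel_syscall(thread, context):
    """Toy kernel: dispatches on RAX alone, never on which ntdll stub
    contains the syscall instruction (the Rite of Passage observation)."""
    if thread.rax == SSN_NT_CONTINUE:
        return replace(context)  # NtContinue: load the whole saved state
    raise NotImplementedError(f"toy kernel lacks syscall {thread.rax:#x}")


# What GetThreadContext returned before the hijack (hypothetical values).
original = Regs(rip=0x7FFA1000, rsp=0x500000, rax=7, rcx=0xC0FFEE, r14=5)

# Injector: change only RIP and RSP (first gadget, top of the ROP stack);
# the original RIP and RSP are written into the ROP as constants.
thread = replace(original, rip=0x7FFA9000, rsp=0x600000)
saved_rip, saved_rsp = original.rip, original.rsp

# First gadget: stash the untouched RCX in a scratch variable; its pop r14
# reloads the original R14 that the injector placed on the ROP stack.
scratch_rcx = thread.rcx
thread.r14 = original.r14

# pop rcx; ret gives RCX = context buffer address, then RtlCaptureContext
# snapshots the registers as they are *now*, inside the ROP.
thread.rcx, thread.rip, thread.rsp = 0xC7C7, 0x7FFAA000, thread.rsp + 16
ctx = replace(thread)

# Fix-ups (RtlCopyLuid-based copies in the talk): RCX, RIP and RSP.
ctx.rcx, ctx.rip, ctx.rsp = scratch_rcx, saved_rip, saved_rsp

# The payload runs and clobbers registers.
thread.rax, thread.rcx, thread.r14 = 0, 0, 0xBAD

# Return into any stub's syscall instruction with RAX = NtContinue's number.
thread.rax = SSN_NT_CONTINUE
thread = kernel_syscall(thread, ctx)
print(thread == original)  # True
```

Capturing inside the ROP, rather than trusting the earlier GetThreadContext snapshot for every register, is what sidesteps the volatile-register problem: only the three registers the ROP itself disturbed are taken from the old snapshot.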

We return directly to the shellcode, and now we're running it. Let me skip the shellcode, because you know it works and it will pop the message box; I'll put a breakpoint at the end, and you can see that we got the message box and reached the end of the shellcode. When we return from the shellcode back to the ROP, we again need to clear the stack, and now we're going back to the pop gadget we've seen before. This time we return into NtYieldExecution again, but, as I said before, you are smarter than that. You know that this is not NtYieldExecution; this is number 0x43. And if we want to know which system call number 0x43 is, well, if we disassemble NtContinue, you can see that 0x43 is the system call number for NtContinue. So now we actually execute NtContinue, passing it the original context of the thread, and when we run it, we revert to the original thread state without changing anything. Let it run, and you can see that the process resumed running in the same state as before.

To summarize: we managed to run code in a target process, we managed to completely avoid detection, and we have a stable method of injecting ROPs with minimal requirements. And may I say that if there is anything new in this technique, it's this: I've never seen a ROP that manages to hijack a thread and then resume it at the same execution point as before. That's the new part I added here. And we now have a nation-state weapon ready at our disposal.

So, takeaways. ROP can be fun. There are so many ways we could improve this ROP even further; just imagine what we could do with three more minutes. Remember that adversaries have much more time than that, and they can improve their methods as well. It's important to know that Intel Control-flow Enforcement Technology (CET) is right around the corner. It's a technology Intel will embed in its CPUs that will allow them to mitigate ROP exploits in the CPU itself. But because it requires new hardware and recompiled software, it will take about five to seven years to reach the global population. And, as always, we need to break it to make it better. So, thank you, everyone.
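The shadow-stack half of Intel CET, mentioned in the takeaways, is what breaks chains like the ones in this talk. A minimal sketch, assuming a simplified CPU with only call and ret (real CET also includes indirect branch tracking, and a mismatch raises the #CP control-protection exception): the CPU keeps a protected copy of every return address and faults when a ret would go anywhere else.

```python
class ControlProtectionFault(Exception):
    """Stands in for the #CP exception a shadow-stack mismatch raises."""


class ToyCpu:
    def __init__(self):
        self.stack = []         # normal stack: writable by the program (and attacker)
        self.shadow_stack = []  # protected copy, updated only by call/ret

    def call(self, return_address):
        self.stack.append(return_address)
        self.shadow_stack.append(return_address)

    def ret(self):
        target = self.stack.pop()
        expected = self.shadow_stack.pop()
        if target != expected:
            raise ControlProtectionFault(
                f"ret to {target:#x}, shadow stack expected {expected:#x}")
        return target


cpu = ToyCpu()
cpu.call(0x401000)
print(hex(cpu.ret()))  # 0x401000: an ordinary call/ret pair passes

cpu.call(0x401000)
cpu.stack[-1] = 0x7FFA0010  # ROP-style overwrite of the return address
try:
    cpu.ret()
except ControlProtectionFault as fault:
    print("blocked:", fault)
```

Because every gadget in a ROP chain is entered through a ret whose target never came from a matching call, the first hijacked return already trips the check.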
