
An Introduction To Assembly And Reverse Engineering - Tom Blue

BSides Newcastle · 16:02 · Published 2024-01

Right, go on. Hello, I'm Tom, and I'm going to be talking about assembly and reverse engineering. So, who am I, and what are my credentials?

Yeah, so: WannaCry was a large ransomware attack in May 2017. It affected organizations such as the NHS and FedEx, other big important things, and also little important things; it affected a lot of things. This talk is going to be done from the perspective that WannaCry has just happened, and we're going to try to use reverse engineering to figure out how to stop it. Originally there was going to be a live demo, and I did one when I gave this talk in Cambridge, but I've only got 15 minutes here, so there won't be time for that. I don't know how this part's going to work, but we'll see, I guess.

So, we start with assembly. What is assembly? It consists of simple instructions. An instruction could literally just be moving data into a register, between registers, or from a register to memory. Registers are like fixed-size variables that exist within the CPU, and they can be accessed very quickly because they're inside the CPU. However, because they're fixed-size and limited in number, anything that doesn't fit has to go into memory (I'll go into memory more later on). And assembly runs line by line, like anything else. There were supposed to be animations for each point to come up.

Oh well. So, a program's memory is broken up into various segments. There's a text segment, which holds the code and is the only executable segment in most programs. There's a data segment, which holds variables, and part of the data segment is read-only data, which holds literals and constants. There's the heap, which contains dynamically allocated memory, and the stack, which is fairly important for this talk: it contains local variables and consists of things called stack frames. So how does a computer know which line it's on? You didn't ask, but I'm going to answer regardless. There are a number of special-purpose registers which are not used for the program's logic.

Instead, they're used to manage the program's execution. The instruction pointer points at the instruction we're currently on. It moves on to the next instruction each time one is executed (on x86, instructions vary in length, so it advances by the size of the instruction rather than by one), and it can be changed by things like return and jump instructions. Then there's the base pointer and the stack pointer. I'll go into these more on another slide, but the base pointer points to the base of the current stack frame and the stack pointer points to the top of the stack, and that's how the program knows how big the current frame is. The base and stack pointers are updated as we go into different functions and create new stack frames.

There's a flags register, which is fairly important too. It contains various flags for the program's execution, like the zero flag, which tells you whether the result of the previous operation was zero. For example, if you subtract one value from another and the result is zero, the values are equal. So, I touched on stack frames before (these were meant to come up one by one as well). Each time a function is called, a new stack frame is created, and that holds all the local variables for that function.

However, it also holds the return address for the function, and it can contain the parameters that were used when calling the function (on x86-64 the first few are usually passed in registers instead). When the call instruction runs, it pushes the return address (the address of the instruction after the call) onto the stack and jumps to the function. The function then pushes the old base pointer onto the stack and sets the base pointer to the current stack pointer, because the new frame's base is the top of the previous stack frame. When you return, the stack pointer is moved back to the base pointer, the saved base pointer is popped back into the base pointer, and the ret instruction pops the return address back into the instruction pointer. By the end of the function the stack pointer should be back at the bottom of the stack frame, so the base pointer needs to go back to where the previous base was.

This is also when the stack frame gets destroyed: when variables go out of scope, it's because by the end of the function the stack frame is destroyed (the memory isn't wiped; the stack pointer just moves back past it). Again, this will be important later on. So, I don't know how this is going to work, because it didn't work that well last time, but I'm going to do it anyway. I found this really cool tool called Godbolt (Compiler Explorer), which compiles C into assembly and shows you what each line of C looks like in the assembly. So, does anyone have any suggestions for what to write in C, to see what it compiles to? Sorry? Hello world? Hello world.

Okay, I can't actually see what I'm writing, by the way. Did that work?

No. Yeah, the reason it didn't work that well last time was that I couldn't see what I was doing.

Yay, that did something. So you can see that at the start of the function you push the base pointer and set up the stack frame. I didn't add a return, but one was added automatically, because main returns 0 if it reaches the end without one. You can see the arguments are put into registers, and then the call instruction is run. The call instruction pushes the return address onto the stack and changes the instruction pointer to wherever printf is. printf's own prologue then saves the base pointer and sets up its stack frame, printf runs, and when it finishes, it returns.

The instruction pointer is set back to where it was before, so execution continues on the line after the call instruction, and the base pointer is set back to the one that was there before, which was all kept on the stack. So, does anyone else have anything they want me to write, or attempt to write? Sorry?

Oh, again, I can't see what I'm doing. Oh, was that outside of main? Oh, that's annoying. I'm going to move the big screen back to my own display so I can actually see what I'm writing, and then...

Yeah, that seems to have worked. Sorry? Oh, that's a good point, actually.

So, labels have been created, and as you iterate through the for loop, it compares the iterator with the limit, 10 in this case, and once it reaches that value it stops jumping back. The jump instruction changes the instruction pointer to wherever the jump's label is. So jle, "jump if less than or equal", checks whether the value being compared (I can't quite see which operand that is from here) is less than or equal to nine, which is the same check as less than 10, and if so it jumps back to L3, wherever that is. Is that L3? Anyway, you probably get the point. And then I presume that gets incremented by one. Yes, that's the add, so it increments it by one.

So while it's still less than or equal to nine, it jumps back, and once it isn't, it falls through and completes the rest of the program. I think I should probably move on. I didn't know what image to put here, and my friends and I just send this GIF to each other constantly, so I thought it'd be funny to put it here as well. So how does this actually help to stop WannaCry? Compiled code can be converted back to assembly, because machine code and assembly have a one-to-one correspondence. What we could do is try to figure out how the encryption happens, and whether we can undo it.

We could also look at how it spreads and try to prevent it, try to write an antivirus signature for it, or hope that the malware authors put in a convenient kill switch that we could trigger. Normally, in the live demo, I'd actually do that, but I don't have time for a live demo, so you're just going to have to deal with this. There was meant to be a slide here explaining more about what reverse engineering is; I forgot it for the last presentation, and I seemingly forgot to add it for this one as well. So we're going straight into dynamic analysis, I guess.

Dynamic analysis tools let you run code line by line and add breakpoints, like a debugger. In fact, GDB is really useful for dynamic analysis, because you're able to look at registers and memory mid-execution. If there's a bit of assembly you don't want to analyze by hand, or don't really understand, you can run it, watch how the register values change, and infer what the code does from that. You can also look at memory areas such as the stack and the heap. But when you're analyzing malware like WannaCry, you could accidentally encrypt your own files, which is less than ideal.

So, to help with that, we have static analysis. With static analysis we can look at function symbols and so on quite nicely, and a lot of static analysis tools, such as Ghidra, have decompilers, which turn assembly back into a C-like language. Sorry? Okay. Yeah, they've also got cool control flow graphs, which show you the control flow of the program. Static analysis is about figuring out what the code is doing using tools that analyze it without actually running it, which is very useful when you're looking at malware, because you don't want to accidentally run malware on your system. I would have used Ghidra for this.

Looking at WannaCry and finding the kill switch would have been the demo, but again, there's not time for that in this talk. And yeah, with static analysis there's no risk of accidentally running malware on your system. This slide is [inaudible] because I couldn't be bothered anymore. So, why is this useful? We can work out what software is doing, we can find difficult-to-find vulnerabilities, and we can modify an application's behavior, which is known as patching. With patching, you can directly modify code without the source, changing individual instructions. For example, you could change a jne (jump if not equal) into a je (jump if equal) to skip authentication checks.

Yeah, you can literally just change what the program does, which is very powerful, and it's how a lot of piracy happens. Padding is a pain, though, because the patched code has to line up. On x86 there's no alignment requirement as such, but a replacement instruction has to fit into exactly the same bytes as the original, because shifting everything after it would break addresses and jump offsets elsewhere in the program. If your new instruction is shorter, you have to pad it out, typically with NOPs, to make things work. I'm going to go quickly, because there's about a minute left. Buffer overflows are a prerequisite for ROP. A buffer overflow is when, say, you have a buffer of a certain number of bytes on the stack and you try to write more than that into it.

The buffer overflows, and you overwrite other things on the stack. Remember that the stack holds data like the return address, the saved base pointer, and parameters. Return-oriented programming is super cool. What you do is overwrite the return address, so that when you hit the ret instruction, it pops a value off the stack, but that value is one you've written yourself, so you can redirect execution to any executable part of the code. Usually there'd be a couple of instructions you want to run, followed by another return instruction, which then sends you somewhere else in the code, usually another piece you've lined up in advance.

That's how you effectively get arbitrary code execution: the standard library's code is loaded into memory as part of a text segment, and text segments are the only executable parts of memory, so there's plenty of existing code to chain together. In the past you were able to write shellcode on the stack and redirect execution to the stack, but because the stack is now non-executable, you can no longer do that. So return-oriented programming is a way of achieving arbitrary code execution while bypassing the non-executable stack. Yeah, that's the end, I guess. I'm not sure if there's time for questions, but if you have questions...