
Hello, I'm Tom. I'm going to be speaking about assembly and software reverse engineering. So, who am I, what are my credentials? So, WannaCry was a large ransomware attack in May 2017. It affected large organizations such as the NHS and FedEx, plus plenty of other things, big and small. We're going to be talking from the perspective of: WannaCry has just happened, and we want to use reverse engineering to try to figure out how to stop it. So, where do we begin? We start with assembly. Assembly consists of simple instructions, such as moving a value from a register into memory, or between
registers, and stuff like that. A register is just like a fixed-size variable which is built into the CPU, and so can be accessed very quickly. However, due to their size, things that don't fit in registers have to go into memory. Memory takes longer to access, but it's still relatively quick. And it runs line by line, like anything else does. In memory you have different segments, which contain different parts of the program. Most memory segments aren't executable, apart from the text segment, which holds the instructions that are being executed. The data segment holds variables. The read-only data segment (.rodata) holds literals and constants that don't change within
the program. The only exception is if a constant is defined but not given a value, in which case it's stored with the uninitialized data. The heap just contains dynamically allocated memory. And the stack, which is fairly important for this talk, contains local variables, and it consists of these things called stack frames, which will be explained later on. So, I've mentioned registers. However, there are a number of special-purpose registers which are important for the execution of code. The most important one is probably the instruction pointer, which points at the instruction we're currently executing. This advances to the next instruction each time a line is run, and is altered by things like jump and return
and call instructions. There's the base pointer and the stack pointer: the base pointer points at the bottom of the current stack frame, and the stack pointer points at the top of the stack. There's a flags register, which holds a number of important flags relevant to the code that's executing, such as the zero flag, which indicates that the result of the previous operation was zero. There's the parity flag, which indicates whether the low byte of the last result has an even or odd number of set bits. There's the interrupt enable flag, which indicates whether interrupts can happen at that point in the execution. So, going back to the stack: the stack consists of stack frames, as I
previously mentioned. These stack frames are created whenever a function is called. When you call a function, you push the instruction pointer and the base pointer onto the stack, and you set the base pointer to be equal to the stack pointer, which means that effectively you've created a new stack, where the bottom of the current stack is the same as the top of the previous stack. Then anything which is declared within that function expands that separate stack, the stack frame. Then, when you return, after all the variables have been used and you're back to just the saved base pointer and the return address, the return
will pop the saved base pointer back into the base pointer and the return address back into the instruction pointer, so it returns to the state it was in previously. So, if you're familiar with variable scoping, where a variable only exists within a certain function, that's because once the function ends the stack frame is destroyed, and so those variables are no longer accessible. Whereas variables declared in the function that called the function you're currently in still exist, because that stack frame still exists under the stack frame you're currently in. This is relevant for stuff later on in the talk. So, I didn't know what to put
here as an image. So, compiled code can be converted into assembly; there's a one-to-one correlation. How does this relate to stopping WannaCry? We could try to figure out how the encryption happens and see if we can undo it. We could look at how it spreads and try to prevent it. We could try to write an antivirus signature for it. Or we could hope that the malware authors put in a convenient kill switch that we can trigger. So, what is reverse engineering? I was supposed to have a slide here about... never mind. I guess we're going straight into dynamic analysis, then. Dynamic analysis is basically where you
run the assembly code line by line, like you'd run any other code. It's kind of like a debugger in that way: you're able to set breakpoints and see register values mid-execution. So if there's a bit of assembly which you don't fully understand, you could set a breakpoint after it, see what happened to the registers, and infer what went on from that. You can look at memory areas such as the stack and the heap, again to try to figure out what the code is doing without necessarily having to look at every single line, which is very useful. It speeds up the rate at which you can analyze
binaries. However, you could accidentally encrypt your own files, which would be a bit of a pain when working with malware like WannaCry in this example. So there's also the option of static analysis. With static analysis you can view functions and symbols, etc., and some tools have decompilers, which will take the assembly and convert it back into a C-like language which is easier to understand. I realize now I was meant to go on a... oh, this hasn't shown. Oh, there we go. Yeah, so I was going to demonstrate what certain bits of C look like in assembly. So, if anyone has any suggestions for lines of
C to write, to see what they compile to?
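For reference, here is a minimal sketch of the kind of C used in this part of the demo. It assumes an x86-64 target and an unoptimized build; the function name say_hello is made up for this sketch, and the assembly in the comments is only the typical shape of the output, since the exact instructions vary by compiler and flags.

```c
#include <stdio.h>

/* Hypothetical function, named for this sketch only.
 *
 * An unoptimized x86-64 build typically opens a function like this with
 *     push rbp        ; save the caller's base pointer
 *     mov  rbp, rsp   ; start a new stack frame
 * and closes it with
 *     pop  rbp        ; restore the caller's base pointer (or `leave`)
 *     ret             ; pop the saved return address into the instruction pointer
 */
int say_hello(void)
{
    /* The argument goes into a register, then a `call` pushes the return
     * address and jumps to printf (some compilers substitute puts here). */
    int written = printf("Hello, world\n");

    /* printf returns the number of characters it wrote. */
    return written;
}
```

Calling say_hello() from main and compiling with optimizations off should give output with roughly this prologue, call, and epilogue structure.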
No? Print "hello world". I need to import the thingy; printf is in stdio, isn't it? Oh, it's working. The issue is I can't see it from my laptop, which is a pain. But yeah, so the start of the main function is pushing things onto the stack which are used for when you later have to return. It's the base pointer and the instruction pointer being pushed on, so that when you run the return statement they're popped off and the program can continue from where it was before. Then registers are used to hold parameters for functions, and once all the
parameters are stored, you call a function. I can't see the call... oh, there is a call there, on line 11, where it calls a function. The call will then set the instruction pointer to be the location of printf, and execution will continue there. The base pointer will be saved onto the stack, and there'll be a new stack frame within printf. Once printf finishes, the instruction pointer is popped off the stack again, and execution continues in the main function we've written, where it then returns zero. Is there anything else you want to see in assembly? No, no, probably not. I mean, if you can give me
an implementation of it, I could type it out for you. All right, this is awkward because I'm still on presenter view, so I'm looking at the PowerPoint. Wait, so you want me to declare an integer and then add... yeah, add to it, and see the arithmetic function, the
call. Yeah, that did... I can't actually see very... oh, I think this has probably been optimized out, because it's not being used. So if we declare it as volatile, I think that should work. If you declare a variable as volatile, it basically tells the compiler: even though it looks like this variable isn't being used, don't optimize it out, because it might still be used somewhere. Did that change? It's changed.
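The volatile trick described here can be sketched as follows. The function name add_three and the specific values are assumptions for illustration, not what was typed on screen; the point is only that the volatile qualifier forces the compiler to keep the loads, the add, and the stores.

```c
/* Hypothetical function name and values, for illustration only. */
int add_three(void)
{
    /* volatile: every read and write of `value` must really be performed,
     * so the arithmetic below cannot be folded away as unused. */
    volatile int value = 2;

    /* The compiler has to load value, add 3, and store the result back. */
    value = value + 3;

    return value;
}
```

Without the volatile qualifier, an optimizing compiler is free to drop the variable entirely or to reduce the whole function to returning a constant.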
Yeah, so it's saving it to a register and then adding to that register. But yeah, I suppose if there's nothing else, we can continue. This was meant to come after the stack frames bit, but then I just completely forgot. So yes, some tools have decompilers, which will turn the assembly back into something that's not quite C, but it's kind of like pseudo-C, and it's easier to read and understand what's going on. However, the decompilation output isn't always perfect, which we'll see in the live demo, hopefully. And then you get cool control flow graphs, which show you kind
of how the execution flow goes, like where it branches off, where certain lines of assembly are run, and stuff, which is quite useful, because then you don't have to go through and find each jump and see which bits get skipped at certain points, and things like that. And there's no risk of accidentally running malware on your own system, which is ideal. So, this is a really [inaudible] slide, because I put no effort into it. So, what other reasons is this useful for? This is kind of going off the whole "we're trying to reverse WannaCry" storyline: what other cool things can you do with
software reverse engineering? Well, you can obviously work out what software is doing; that's kind of in the name. You can modify application behavior, which is known as patching. And you can find otherwise difficult-to-spot vulnerabilities. So, to go into patching: patching is where you directly modify the code without the source. You go through and you change individual instructions, which is really useful because you can skip authentication checks: you could change a conditional jump to an unconditional jump and cause certain checks to be skipped entirely. You can pretty much change whatever the code does, which is very powerful. However, padding is a pain, because the CPU requires things to be in certain-length
chunks, and when they're not, they have to be padded to be that length. If the padding is off, it will just not execute, so padding is a pain when you're patching. So, I want to talk about return-oriented programming, but to understand return-oriented programming we need to talk about buffer overflows first. Stack data often has a fixed-size buffer: if you have a char array of size five, that's a 5-byte buffer. If there's nothing controlling the data that goes in, the user can still input data that's larger than five bytes, and it will overwrite everything that comes after it on the stack. So if you, for example, had a char
buffer and then an integer, you would be able to change the value of the integer by writing into the char buffer, depending on where it lies on the stack. So yeah, data over the buffer size will overwrite other things on the stack, which is a buffer overflow. So, going on to return-oriented programming: this is super cool, because, as I noted before in the memory segmentation bit, everything other than the text segment is non-executable. That wasn't always the case. In the past, the stack used to be executable, and people would be able to write what's called shellcode on the stack and redirect execution to it, and, basically,
you have arbitrary code execution, where their code is on the stack and they're just redirecting the instruction pointer to point to it. You can't do that anymore, because the CPU will refuse to execute anything that's in a non-executable memory segment, which is everything other than the text segment in most cases. What return-oriented programming does is you pop a value from the stack into the instruction pointer using a buffer overflow, because, remember, the instruction pointer is saved onto the stack. If you're able to overwrite the saved instruction pointer, you can point it to these things called gadgets, which are little bits of assembly within the text segment. Because, remember, the
standard library is a part of the text segment and contains loads of functions, you can find a couple of lines of assembly which you want to run which end with a return statement. And then you could have overwritten the next return address too, so you jump to another part of the text segment which contains a couple of lines of assembly you want to run, and that ends with a return. By stringing these together you effectively get arbitrary code execution whilst the stack is still non-executable, because the code that's being executed is in the text segment, not on the stack. Right, so, the bit where everything goes to, um, live
demo. I hope this actually lets me view what's going on. Oh yes, it does. Cool. Right, so, I've already let Ghidra analyze this WannaCry binary. By the way, obviously there's a kill switch in WannaCry, which was what ended it initially; however, new versions were then created without the kill switch. The whole point of this demo was to try to find the kill switch. I spent about 4 hours analyzing a sample of WannaCry which didn't have the kill switch. I was up till like 3 in the morning, and I thought, I'll run strings, which shows you all the strings within the binary, and I saw there was no kill
switch there. When I downloaded the right binary, I found it within like 3 minutes, and by that point I was done. So, Ghidra has automatically pointed us to the entry function. That's the first function that's run, but it's not created by the programmer; this is default Windows code to set up execution. If we take a look at functions, normally you'll have a WinMain function, but because there's been some obfuscation, symbols have been removed, and therefore you can't identify the WinMain function through the symbols in the binary. However, you'll always be able to find the entry
function, and the entry function will always lead to WinMain. This function over here should be WinMain. Okay, can you see all this, by the way? Is this readable? What about from the back? No? Okay, let me know when stuff is big enough. Is that better? Okay. So, what Ghidra has done is we have the assembly over here, which we can go through, but we also have the decompiler output, and
what we can do is... a lot of the data types in the function, because there's been a bit of obfuscation, Ghidra hasn't automatically been able to identify. So undefined4 is just an undefined 4-byte data type of some sort, but we can look at what's going on in the code, figure out what they are, and redefine them. So we can improve the decompilation output that Ghidra has given us by looking at the code and the assembly and deducing what things are. For example, puVar3 points to what's obviously a string. So we can find puVar3, which is an
undefined 4-byte pointer, and if we right-click on it we can retype the variable to be a character pointer. Then we can go through and do this with all the other data types. So, local_50 is copied into puVar4, which is a pointer to local_50, and then local_50... I remember it being somewhere
here. I mean, it's a 14-byte character array, so we can redefine local_50 to be char[14], and that's cleaned up a lot of the code. All the other variables were basically Ghidra not recognizing, because of the obfuscation, that this was one large array rather than multiple variables. Now that we've identified that it's a 14-byte character array, it's automatically been able to clean up the decompilation output and make it much more readable. So, these five lines are an example of where Ghidra fails to pick up on very basic patterns. These are
created by two lines of assembly, these two over here. If it lets me... this is MOVSD.REP, so it's a repeated move, moving the string to somewhere else. It's a string copy. If we change this to be a decimal value, it's a bit more clear what's going on. We have iVar2, which is a part of this for loop, where it's being constantly decremented, and then what we're doing is we're adding four to puVar3. puVar3 is a pointer to
character, so we're advancing the character pointer by 4 bytes each loop. So we're moving along the character array, and we're copying puVar3 into puVar4. This is just a string copy between those two variables. Then what happens is InternetOpenUrlA will try to open a URL, and obviously the URL is over here. We can rename this to make it more clear what's going on: we know this variable is a URL, and then we can call puVar3, the thing that's being copied into, URL copy, which, yeah, makes it
a lot more clear what's going on with the for loop.
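The loop that the decompiler produces for a repeated MOVSD can be sketched in plain C like this. A rep movsd copies a counted number of 4-byte chunks from a source pointer to a destination pointer, advancing both each time. The helper name copy_dwords and its parameters are assumptions for this sketch, not names from the binary or from the decompiler output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies `count` 4-byte chunks from src to dst,
 * mirroring what a `rep movsd` does with the count held in ECX. */
void copy_dwords(char *dst, const char *src, size_t count)
{
    for (; count != 0; count--) {   /* the counter is decremented each pass */
        memcpy(dst, src, 4);        /* move one 4-byte chunk */
        src += 4;                   /* advance both pointers by 4 bytes */
        dst += 4;
    }
}
```

Read this way, the counter being decremented and the pointers being bumped by four are recognizable as a single block copy rather than five unrelated statements.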
Sorry, I missed a bit I was meant to go over. So the function itself is undefined4; however, we already know that it's the WinMain function, so we can open up Firefox and look up WinMain, and the Windows documentation comes up. Going through the Windows documentation, we can look at the function signature and then copy that into Ghidra, so that Ghidra understands what's being passed in and what the function is actually doing.
It doesn't like the semicolon at the end. Yeah, so now Ghidra has a better idea of what parameters are being passed into it. Whenever you identify a standard Windows function, or a function that exists within documentation, you can go and copy the function signature into Ghidra to improve the decompilation output. And then when we look in functions, we get the WinMain function, which is now identified because we gave it the function signature. All this data would normally be present, but it was removed because there was some obfuscation with WannaCry, and that's the case with most malware reverse engineering. So anyway, this strange URL is the kill
switch domain, and registering it would prevent WannaCry from running, because it checks that the kill switch domain isn't registered before continuing execution, and it will abort execution when the kill switch domain is registered. Originally, when Marcus Hutchins registered the kill switch domain, it was to try to track infections of WannaCry. He didn't realize that it would kill WannaCry, but it did, which was kind of cool. But yeah, I suppose that's a very ungraceful end to the live
demo. And then, yeah, the end. If you have any questions?
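The kill switch check described in the demo can be sketched as the following logic. This is a simplified model, not the malware's actual code: the names should_continue and url_probe_fn, and the two stand-in probes, are hypothetical, and the real network call is replaced by a callback so that the logic can be tried without any network access.

```c
#include <stdbool.h>

/* Hypothetical callback type: returns true if the URL could be opened. */
typedef bool (*url_probe_fn)(const char *url);

/* Returns true if the sample would carry on running,
 * false if it would stop because the kill switch domain answered. */
bool should_continue(const char *kill_switch_url, url_probe_fn probe)
{
    if (probe(kill_switch_url)) {
        return false;   /* the domain answers: abort execution */
    }
    return true;        /* no answer: continue execution */
}

/* Stand-in probes (assumptions for this sketch) that replace the network. */
bool probe_always_fails(const char *url)
{
    (void)url;
    return false;       /* models an unregistered domain */
}

bool probe_always_succeeds(const char *url)
{
    (void)url;
    return true;        /* models a registered, reachable domain */
}
```

With the domain unregistered, the probe fails and execution continues; once someone registers it, the probe succeeds and the check turns into an off switch.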
Yeah?
Yeah.
So, what you do is, when you're overwriting the stack, you'll have a bunch of return addresses. The first return address will be popped off into the instruction pointer, and that code will be executed, and we'll hit another return. If you have another return address after that, that will then be popped off the stack, and that code will be executed. So rather than have a function itself which can facilitate adding another return pointer, you just have lots of return pointers in one kind of string. And yeah, each time a return statement is run, it will pop off the next return address, and it will go off
into another part of the code. And then eventually, if you know what the original instruction pointer was, which you can get through leaking values on the stack, you can put that at the end and cause the function to continue normal execution after having gone off running these random bits of assembly, where you've been returning to random places. Are there any other
questions? Okay, you give another round of