
Marc Messer - Getting Started in Reverse Engineering

BSides Knoxville · 45:29 · 458 views · Published 2023-05
About this talk
Spend an hour learning how to analyze executable files for fun and profit. This helps lead into low-level computing, computer architecture, and operating system concepts in a hands-on fashion. With a bit of knowledge, you can get started on malware analysis, reverse engineering CTF tasks, vuln research, and more. In this talk I will briefly introduce some fundamental RE concepts and how they relate:
- Going from source code to executable to process
- Executable formats and sections
- Linking and loading
- String analysis
- Basics of reading x86
- Basics of stack frames
- Basic instruction pointer overflow exploit
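As a rough illustration of two of the topics listed above (executable formats and string analysis), here is a minimal Python sketch. It checks a buffer for the MZ and PE signatures and pulls out runs of printable ASCII. The helper names `looks_like_pe` and `ascii_strings`, the in-memory `sample` blob, and its strings are all made up for illustration and are not from the talk; a real file would be read with `open(path, "rb").read()` instead.

```python
import re
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True if data starts with 'MZ' and e_lfanew points at 'PE\\0\\0'."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew is a 4-byte little-endian offset to the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [m.decode("ascii") for m in re.findall(pattern, data)]


# Hypothetical stand-in for a tiny executable: a 64-byte DOS header whose
# e_lfanew field points at 0x40, the PE signature, then some string data.
dos_header = b"MZ" + b"\x00" * 0x3A + struct.pack("<I", 0x40)
sample = (
    dos_header
    + b"PE\x00\x00"
    + b"\x00\x01Enter password:\x00\x02\x03Wrong password\x00"
)

print(looks_like_pe(sample))
print(ascii_strings(sample))
```

The signature check only confirms the two magic values; it does not validate the rest of the headers. The string scan is similar in spirit to the Unix `strings` tool, with a minimum length of four so that short coincidental byte runs are skipped.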
Transcript [en]

Everybody, give it up for Marc. Yes sir. Okay, can everybody hear me okay? Sweet. I'll go ahead and get started then. My name is Marc Messer. I'm local to Knoxville, so I'm super excited to speak to everybody here. I used to live right up the road by Fulton, so that's how local I am, actually. And yeah, reverse engineering is a really self-indulgent, tedious process, and I'm thrilled to have a room of people who want to hear me talk about it for a bit. I don't really have that much of an introduction for myself, because the less I talk about work, the more fun I can have talking about this, but I work in the defense space, primarily in incident response. It's fun. We get to see some novel malware, and it puts you in a situation where it's actually useful to have some reverse engineering skills on hand, where a lot of times that's sort of fiscally irresponsible, you might say. So yeah, we'll go ahead and get into it. I made a lot of art and illustrations for this, and I've been working on it for a while, so I'm really excited to present here to you guys.

So at a high level, what's reverse engineering? Forward engineering is what we normally do: we have some kind of need for a thing, we go ahead and write it, we make something, and then we have our output. In reverse engineering, we typically don't have all of that information. We may not have any source code. We may just have a binary, you may just see network traffic, you may see whatever. Within this talk I'm mostly going to be talking about just binaries and Windows executables, but we'll get into all that. So if we have this binary and we don't have any source code, how are we going to try and learn more about it? There's the obvious thing of running it and seeing what happens. If we suspect it's malware or something, that might be in a virtual machine or some sort of environment that's designed for us to, you know,
see if it drops a file or opens up network connections, something like that. You also may do static analysis, where you're really just looking at the object itself and seeing what you can glean about its process or properties without even running it.

So then what's the output? Why do we even do any of this? For a lot of stuff, it's because you want to write YARA. When I first got into it, it was because I was a teenager and I was interested in cheating in games, because I'm terrible at them but I still like winning. I don't do that as much anymore. Extracting intel: you might want to learn about your adversaries. You might have a product that's EOL or something, and I don't know if that's legal or not necessarily, but you might need to learn more about it so you can continue keeping it around if you absolutely have to. Learning to write better malware: if you're interested in writing better malware, going and looking at a lot of malware is a great way to go about that. DRM removal, software cracking, that kind of stuff. I should warn at this point, too, that if you're a really big evangelist of DRM software and terms of service and stuff, then there's a really good talk across the way that will probably bore you or bother you a lot less.

Another thing of note is that, due to the tedium, it's sort of like CTFing. If anyone has played in any capture the flag events, you know CTFs can be very frustrating, because you might just get an image or something. You don't know if you're supposed to use that for OSINT, you don't know if it's a layered PNG or something and you have to extract a flag from it. You just have an image and you're told, "get me a flag." That can be a really overwhelming and frustrating process, and you feel like you fail. So I think there are a few things that you can do to try and make that a little bit easier: taking what you know and
using that to allow yourself to ask questions about what you don't know. A lot of times, if we're working on an engagement or something, we call that pivoting: we can say, okay, we notice this traffic, we're going to utilize that and see what else we can learn about what's going on. I would say too, and this is really helpful for software development as well, that anything that seems initially complicated, you probably just have to break into smaller, less complicated steps, sort of like taking a really complex algorithm or a really difficult math problem. How do you make it digestible and something that you can actually approach? And then failure is something that you just have to be super comfortable with. I would say I'm sort of like the Michael Jordan of failure in a lot of ways, in that you can always learn something out of that failure and just reapproach it and go again and see what you can do. Most of the time, if you're just trying to figure out how something works, you started out not knowing how it works, so it can't get any worse. That's maybe not a good mindset, but that's how I think about it.

Anyhow, to jump into this, we're going to go into a few technical terms. Really what I'm hoping to do here is say, hey, there are all these prerequisite topics where you need to be comfortable saying, "I might not know everything about this, but how do I take enough of what I do know to start approaching the environment in which a process or executable runs, the structure of that binary itself, and then some of the instructions that are running," to the point where we're essentially looking at a mechanical process, right, when we get down to the electrical engineering aspects of it. We're not really going to go too far into the weeds of that, but just know we're going to introduce a lot of
concepts to show where they matter within the reverse engineering context, and it's okay to feel like you still don't understand a lot of stuff, because I feel like that every day, and reverse engineering is my daily job.

So here we're going to talk about forward engineering really quick. We've got source code on the left, hey, look at that, right over here, and we're just doing a really simple C hello world. We know that eventually we have an executable, which, if we looked at it in a hex editor, we would just see as hex gibberish, right? Those of you who've looked at executables before might immediately notice, oh, this is a PE file, like a Windows DLL or a Windows EXE, something like that. In between we have the compiling process and we have linking. This is important for us to note, because a lot of times when people are trying to reverse engineer stuff, you're just going to look at decompiling, which is a bit of a misnomer, right? You can't really decompile something per se; it's been compiled. But you are wanting to learn how you can understand that compiled output, and maybe think of it in a C-like fashion, where we have a sort of pseudocode, perhaps, that looks like this, depending on your goals.

Once something is compiled, say with GCC or something like that, we have this compiled code output that starts to look a little bit more obtuse, is maybe the way to say it. We have some instructions that, if you've never looked at assembly before, probably look pretty foreign, and if you have looked at assembly before, they still probably look pretty foreign, and that's okay. We've got some instructions here that we'll talk a little bit about later, but right now just don't even sweat it. One thing to note, though, is how many different instructions do we really see here? I can't count that high, so I
guess five or six, something like that, and we see them kind of repeated. So really there's not that much to memorize if you think about it.

After we have that compiled output, we have something called linking. Linking is really important because a lot of times our code isn't necessarily just the code we wrote. For example, in this source code right here we have printf hello world, and then we return to whatever. It's main, so I guess it exits the program. That printf, for example, is not code that we wrote in there. We didn't specify how printf works; we didn't have to go write custom code to output something to the console, which is what that's doing. Somewhere in the machine, that is referenced and pulled in, so that when we run this executable, it's executing this code that we did not write, that we symbolically reference with printf. printf is therefore called a symbol; you'll hear me call that term out several times. And the aspect of making that code that we did not write accessible is called linking. In this context it's going to be talking about external DLLs or something like that. We're not reinventing the wheel, we're not writing our own custom thing, we're just printing that out.

So, hex code. Let's talk a little bit about hex and why we're using it. Most of you may know this, some of you may not, but with hex we're just talking about something in base 16 that allows us to shorten things as we represent them in binary. Looking at base 10, for example, the value 1000 takes four characters for us to represent. If we look at the value 1000 in binary, it takes a little bit more to represent, and unless you're just a super genius, you're not going to look at that binary sequence and be like, "oh, that's a thousand, of course." Some people may be able to do that; I definitely am not. And then if you're
really, really sharp, you might look at that hex 3E8 and immediately recognize that that's a thousand. But why do we really have it represented that way? It's because it's just easier for us to shorten our representations of this information. It's also easier for us to read as bytes. If you look down here at these two sections, the 4D5A and the 9000, every two hex digits is a byte, so it's super easy to just represent everything that way and parse through it and understand what we're looking at.

And what does it represent in this context? A lot of times it's opcodes. If you think about the actual circuitry on a chip, calling out specific things like moving something from one register to another is going to be a literal opcode that's a capability of the chip. So B8 05, for example, which I have over here on the right, would be moving the value five into the EAX register. Think of the EAX register as a variable space that's just baked onto the chip; it's physically holding those bits. Data: we'll see some data in there as well, just strings of text and other values that you parse out from hex, that kind of thing.

So, a little code getting us a bunch of output. Before we dive into more of the fun stuff, one of the weird aspects that it took me a bit to understand is why something like print hello world spits out this huge assortment of binary. Even understanding that you have to have stuff linked in to make it run, that's still odd. So just note that a lot of what you see in binaries is actually just meta information that is there so that the binary can be loaded into the operating system and executed, and then you'll be running whatever you want. That said, there's a bunch of headers in binaries. We're going to talk about that; it's an important aspect of looking at them. And then you'll have different sections within the binary that
different parts of your code are going to be in. For example, in the hello world, we had the string of text "hello world". That's not executing code; that goes in a data section. So there's going to be a section that literally just holds read-only data, and that would be going there, and the rest of your code would be in a different section of the binary.

The way I think of this is by analogy to books, which I have on here, of course. If you pick up a book, you don't just flip to the first page and start reading it and wonder who this Copyright character is and who the Library of Congress is. You're going to know, oh, okay, I have to go to a table of contents, I'm going to go to whatever, and that's where I start. The operating system essentially needs all that type of information too. You're also going to have meta information: in a book you would have appendices or something like that. Think of that like our data sections or relocation tables, stuff like that. That's all going to be an aspect of it.

So let's talk about portable executable files specifically. Those are going to be Windows executables. Most of the time when you see something like that, you're going to think, oh, I have an EXE; I want to play a Steam game or whatever, I run the EXE, that runs the program. You'll also see DLLs; they're actually PE files as well. Each of these follows a predictable structure, because the operating system needs to be able to pull that into memory and run it, obviously. We will actually go and look at a binary in a minute, but in that structure we have something called the MZ header, which is there because Windows basically says, hey, if you try to run this on MS-DOS, we don't want it to just kill the system; we want it to spit out a thing saying you can't run this on DOS, and then move on. So there's actually a DOS program baked into the front of every single PE, just in case you
try to run it, because Windows is really, really dedicated to ensuring backwards compatibility for things that shouldn't exist. But I don't know, we're not going to go down that route; that's a whole talk.

So then let's talk a little bit about sections that we're going to see when we look at these. I've already mentioned a little bit about that, but we'll see a .text section, that's where our executable code is. We'll see a .data section, that's where our writable data typically is. Read-only data is .rdata, stuff like that. And really, don't feel like you have to memorize all this stuff. I'm constantly referencing documentation all day. I have books on my desk, I usually have a window open with something, all sorts of stuff. Really it all just comes with familiarity.

So let's look at an actual PE file's headers in here. Is this legible to people out there? Thumbs up, thumbs down? Okay, cool. I wasn't sure what to expect, so that's a happy circumstance. If we look in here, you can see that we have some bytes that I've called out in different colors, and that's because if we're looking at just a raw hex dump of some of this information, then we can tell, oh, this byte over here is indicating that this is a PE file, so that when Windows tries to load it, it's going to know how to treat that file. Let's see, down here we can see the image base that the code is loaded into, and from there we calculate certain things called offsets, in the sense that we know that this is going to be mapped at some location in memory. If we take that location it's mapped into, we can see something like the base of code, 0x1000, right there, and we can know that 0x1000 bytes, or whatever that represents, in from whatever that base location is, we can see the beginning of our code. So some of this is saying,
hey, from wherever this is loaded into memory, you count out this many bytes, or what have you, and therefore you can find certain things. I think we might even be able to see the address of the code entry point right here. And again, that's just a raw thing you can see in the hex dump. So a good way of familiarizing yourself with this kind of thing is, in some ways, just opening something up in a hex editor, looking at it, and seeing what you can figure out, which is coincidentally what we are about to do right now.

For this I'm just going to pull up a hex editor. I'm trying to make all this stuff concept focused and not really tool focused, because if you want to ask the right questions, that matters a lot more than which hex editor you're doing it in. It doesn't really matter. They display hex, you can go look at it, cool, what else do you need? And then just reference whatever you need to. So here we have a binary that's from Ophir Harpaz. I've linked to her Twitter later in here, but she wrote a site called begin.re that I think was pretty helpful, and she had some really nice binaries for beginning to get into analysis. So let's take a look at this, let's say what we think it's maybe doing, and we'll see if we think that we could crack it.

So here is something called 010 Editor. Okay, that doesn't really make much more space, but that's fine. Here we can see that we have certain header sections called out and highlighted. Here, for instance, is that MZ header, in case this is loaded on a DOS machine, and you can even see below that, "This program cannot be run in DOS mode." This is a full program; if you just cut it out and ran it on a DOS machine, it would work. And in fact, what's kind of funny is, because it's a hex editor and we can edit this, we could just go change that text and have it say whatever we
want. It wouldn't really do much for us, but that is a possibility. So let's go ahead and take a look and see what sections we have. We see we have a .text section, we have our .rdata section, so again that's our read-only data section, we have a .data section, we have resources, we have relocations. Going through and looking in here, our .text is just going to be assembly opcodes, but let's go ahead and look at our read-only data and see what we can find. In here we start to see string data, which hopefully you can see from right there. It's just asking us, hey, enter a password, and then it has this very suspicious leetspeak "crack me" in there, who knows what that's for, and then it either says correct or wrong password. So without knowing hardly anything about this binary, I think pretty much everyone in this room could just guess: hey, you enter some kind of value, it's compared to some other value, and then you either get a yes or a no. Right now I'm pretty sure that if we were to just run this executable, which we can't because I'm on a Mac and it's an EXE, we would most likely be able to solve this crackme.

With a lot of things you look at, it's just going and identifying strings in the read-only section. If you pull up your average run-of-the-mill ransomware sample, not to endorse people doing that willy-nilly necessarily, and you just wanted to go and find the ransom note, you'll probably see it in plain text, because whatever method they used to get it onto your system was probably where they worried about hiding. Anything that goes and encrypts everything on your drive is kind of screaming alarm bells at that point. But anyhow, let's go ahead and jump back to the presentation. We're making great time, by the way, y'all, this is fantastic. So please don't run for the doors, but we're going to talk about assembly for a little bit, and hopefully
they've chained t