Dr. Matthew Miller - Reverse Engineering 101

Name: Dr. Matthew Miller - Reverse Engineering 101
Uploaded: 2026-06-09
Duration: 51 min 1 s
Description: An introductory tour of software reverse engineering aimed at developers and security folks intimidated by the field. Covers the gap between source and assembly, hexadecimal and text encodings, static versus dynamic analysis, common tools (GDB, Ghidra, IDA, x64dbg, radare2), and how to recognize com

BSides KC 2026 | 51:01 | 80 views | Published 2026-06

Speakers

Matthew Miller

Tags

Category: Technical

Topic: Reverse Engineering

Difficulty: Intro

Style: Talk

Mentioned in this talk

Tools used

Compiler Explorer, Cutter, GDB, Ghidra, Hex Fiend, IDA Pro, LLDB, ltrace, radare2, readelf, strace, Voltron, WinDbg, x64dbg

About this talk

An introductory tour of software reverse engineering aimed at developers and security folks intimidated by the field. Covers the gap between source and assembly, hexadecimal and text encodings, static versus dynamic analysis, common tools (GDB, Ghidra, IDA, x64dbg, radare2), and how to recognize compiled constructs like functions, branches, and system calls in stripped binaries.

Original YouTube description

Reverse engineering appears to be one of the black arts of software development. We will explore why this type of learning is difficult, why information is lost and where to start learning. This talk will demonstrate the fundamental primitives of reverse engineering, show the current tools and provide a path to learning to reverse engineer software. Software reverse engineering (SRE) is often perceived as a "black art" reserved for elite security researchers and malware analysts. However, as software complexity scales and proprietary systems proliferate, the ability to deconstruct compiled binaries has become a vital skill for developers, security auditors, and hobbyists alike. This presentation provides a structured entry point into the world of SRE, moving beyond the intimidation factor to focus on fundamental methodologies. We will explore the core pillars of the reverse engineering workflow, including: 1) The Translation Gap: Understanding how high-level code transforms into assembly and machine code. 2) Static vs. Dynamic Analysis: Balancing the study of dead code in disassemblers with the real-time observation of program behavior in debuggers. 3) Toolchain Mastery: An introduction to essential industry tools such as Ghidra, IDA Pro, and x64dbg. 4) Pattern Recognition: Learning to identify common programming constructs (such as if, loops, switches, and function calls) within stripped binaries.

Transcript [en]

All right. How are you guys doing? >> Excited. All right. That's what I want to hear. I'll have to see if I can use a clicker. Wrong way. All right. So, I am supposed to stay between these orange lines. So, I will point at this one just because it's a little closer. Um, who am I? Clearly, I'm a dad. I'm also the funcle. So, I'm the fun uncle. Um, and I think these are in order of importance in my life. So, I am a gardener and I make all kinds of jelly. So, does that make me a jammer? I don't know. Maybe, maybe not. Um, if you like elderberry, I make that. Um, I have

been an expert witness. Um, and I work as a CSO for a small company that does family history and health risk assessment, if anybody's interested in that. So, today we're going to talk about reverse engineering. Um, some of the problems that people have when getting into it. Um, the read versus run, as far as when you're analyzing it. We'll go through a bunch of different tools. Um, and then we'll look at some patterns. So, what I like to say is that anybody can get into reverse engineering. So, for example, I went to school classically trained as a computer scientist, and I just knew I wrote stuff in C or whatever language it

was, and there's this magical thing called a compiler, and then I just ran my code and sometimes it broke, right? And I went to teach at Dakota State University and they needed somebody to do reverse engineering. So, I went and took a class with Chris Eagle at Black Hat, learned how to start doing that, and then I got into doing some expert witness work for some of the, uh, techniques that the FBI used. Um, so anybody can do it. I just think it takes a little bit of, um, work to get in there. So, why is it hard? So, if you're a developer, any developers here? All right. Right. You're going to write something like this. You're going to

write some source code and then magically it's going to get turned into this. Anybody in here know how to read this? Two people. All right. Perfect. Right. So the important part is we're going to learn about what some of these things are. I'm not going to make you an expert because that's why this is called reversing 101. All right. So in our source code, we're going to have variable names. We're going to define functions. You'll probably see the functions. We have comments. There are zero. Oh, go back. There are zero comments in that code, right? There are no variable names in that code, right? All I'm going to see is strings, registers. I might see external

functions. Depends on the type of software you're looking at, right? So, you have to decide, am I looking at malware, which is trying to hide itself, or am I just looking at software, right? Reversing sort of encompasses both worlds, right? So, if this is just normal software, then it's fine if I run it. Um, and they probably will not hide stuff, right? Unless they're trying to hide proprietary pieces of information. And then at the very bottom, obfuscation, right? So if you get into obfuscation, right, there's a lot of techniques. It's like what I like to do. I like to make puzzles that require you to deobfuscate them. All right. So one of the first things

you'll need to do is you'll have to learn what hexadecimal is. How many people know or read hexadecimal? A few. A lot of hands actually. All right. So hexadecimal, we've got numbers 0 through 9 and A through F. Um, you can see them here in the first column. Um, and the reason we use hex is you can store four binary digits of information in each one of the hex digits. So I can use two hex numbers to describe one byte, which is the smallest increment that any computer can compute in. And then we also have numbers in computers. And so you'll see them in binary. There is a method to doing it.
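A minimal sketch of the hex-to-binary relationship just described, written in Python for convenience (the talk itself works in C and assembly). The function names and the sample byte `55` are illustrative assumptions, not something from the slides.

```python
# Each hex digit encodes exactly four bits (a nibble),
# so two hex digits describe one 8-bit byte.

def hex_digit_bits(digit: str) -> str:
    """Return the 4-bit binary string for a single hex digit."""
    return format(int(digit, 16), "04b")


def byte_bits(hex_pair: str) -> str:
    """Return the 8-bit binary string for a two-digit hex byte."""
    return "".join(hex_digit_bits(d) for d in hex_pair)


if __name__ == "__main__":
    # Print the 16-row table: hex digit next to its four bits.
    for digit in "0123456789ABCDEF":
        print(digit, hex_digit_bits(digit))
    # An arbitrary sample byte, shown as two nibbles joined together.
    print("55 ->", byte_bits("55"))
```

Reading a hex dump is mostly this lookup done in your head, one nibble at a time.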

I'm not going to go over that today. Um, but you'll also see numbers that are signed, right? And this is using two's complement. This goes back to how computers were first designed, right? Two's complement allows you to use all of the bits. Um, but what it means is the very top bit is your sign bit, and if that is, um, set, then you have a negative number, and if it is not set, then you have a positive number. So, we'll probably see some examples here. The second or the third type of information you need to understand is how to read text, right? So you have a couple different possibilities. One is

you have ASCII. I got a nice ASCII table here, right? And that's going to describe all of your standard ASCII characters using one byte, because it was the American version. Um, when you look at new apps nowadays, right, you're going to see that it's going to use Unicode. UTF-8 is probably the nicest one just because it's variable length. So you can do all of those Chinese characters and other languages, um, within that. Um, and then here to note, like, most of these are null terminated, right? So if you see a C program, right, it's going to be null terminated. If you do something in Rust or one of these other languages, right, you may not see null

termination. You might see a, um, length in there. And so those are going to change how you're going to look at that binary. So the run versus read, right? Static analysis: you go through, you have a binary, you look at it, you figure out what the instructions are. Um, and then you have to make assumptions about how the code is run, right? Sometimes people don't understand how different pieces work, right? And so if you have malicious software, it can hide inside of there, or it could pull additional code that it's going to go ahead and execute. So static analysis is good, especially if you've got malware. Um, again, you have to understand what is the situation in which you're running

the code in, right? Do I need to do it statically or do I want to run it and see what it does? All right. And then if you have a disassembler, right, there's one called the dead listing. It just starts at the beginning and just spits out all the instructions, right? And what we have is we're going to have branches, right? So, how many people know what a branch is in a program? All right. So, a branch is: I'm going to either do A or I'm going to do B, or I just might do A and then I'll continue going on. Right? And so our disassemblers are recursive and iterative so that when they hit a

branch they say I'm going to take both paths. I'm going to look at both sides and I'm going to say is there code over here and is there code over here. Now if I'm writing malicious software can I take advantage of that? Yeah. I might throw in lots and lots and lots of branches that don't do anything. So you have to go and look at them and try and figure out what they do. Um and then you just keep doing this until you are done. Again we can hide information in these if we choose to. That's for a different talk. Right? This is to introduce you to how would you do reverse engineering. All right. The second type is going to

be dynamic analysis. Right? So, we're going to take our program and we're just going to run it, and we might throw a bunch of instrumentation features on it. You might run it in a VM and see, okay, what does it connect to on the internet? What files does it touch? Right? So, this is another method of doing it. Um, and when you do that, right, you're going to have it in an isolated environment, hopefully, unless it's your own program. Um, and then you might put in breakpoints to see if I can find where that code is executing, right? If it's hiding things. All right. So, I started teaching, and the first time I taught, I had all my

students were using Visual Studio. How many people think Visual Studio setup is enjoyable? How many think it's a nightmare? A lot, right? Like a lot of things on Microsoft, it's slow. It's a pain in the butt, right? So I created... so this is a link to an open-source GitHub project. Um, it's a library that uses NASM. So NASM has the same syntax as Microsoft. Microsoft has one called MASM. And if you look at all the disassembly we're going to look at, right, it's going to be x86 MASM-type assembly. There's another one called AT&T. Anybody who likes AT&T is torturing themselves. There's all the dollar signs or percents. It's just a mess. Um, so this project

lets you run on Linux, right? You can run it in Docker, you can run it on a Mac, you can run it in Linux, you can also run it on Windows as well. Again, you have to install a few things. Um, but it lets you do that. And then it has some tools for disassembling the code. All right. So, when you get started, you might say, well, I don't have that, right? How many people have heard of Godbolt? Anybody? A couple. All right. So, this is a super awesome project, right? So in this project you can take your C program and you can say I want to look at... if you look over here, so this is x64

compiled with GCC version 13.2 too, right? You can pick ARM, you can pick, um, ARM Thumb mode, you can pick, um, Windows, you can pick GCC, right? It gives you all of these options to compile it. And then it's going to show you... oops, wrong button. It's going to show you the assembly code that is for that program. And if you guys look, I talked about hex. If you look under it, it also shows the hex bytes of that. Right? So again, when I started out, right, I would see people talk about, oh, there's some shellcode, and I would be like, that's magic. I don't know how any of that

works. Right? That's just a bunch of... these bytes are the actual thing that gets run on your processor, right? So if you're a processor and you see the byte 55 in hex, you know that's going to be a push RBP, right? So, it's going to push that register to the stack, and then as you go, right, you'll want to change it. So, the last one had some stuff in there. This one, it's doing a PXOR and you're like, well, what is that? Right? You guys all can look it up, right? But these are different instructions. So, that's doing an XOR of a register with itself, which, does anybody know what that does?

Sets it to zero, right? Which is what our program did, right? And you can see the color coding, right? This is a terrible color of yellow. And these two instructions here are also a terrible color of yellow. Right? If you hover over it, it'll show you which ones it is. You can make your program as long or as short as you want. I mean, it's a really good tool for, again, getting in with, like, zero barrier, right? You just have to know some C. And if you don't know C, ask Claude. He'll tell you what C looks like. All right. So the thing to know about assembly is that we have these things called registers, right? A register is a little

bit of memory, right? That can be manipulated by the processor. It runs at clock speed. So if your clock is 3 GHz, you can update registers at 3 GHz. What does it do? Nothing special, right? All it's doing is math. So add, subtract, multiply, divide, bitwise math. So OR, XOR, and all of these fun things. It can move data to a register. It can move data from a register, and then we can do comparisons. So a comparison is: is something zero? So I can say compare RAX to zero, and then what it does is it sets a bunch of flags, and when it sets flags I can do conditional jumps. So a conditional jump is how an if is implemented. So if you

want to say if my register is zero then I want to go and do this bit of code over here, or if it's not zero, go and do this bit of code over here. All right. And then with our registers, we have a few special ones, right? So, I put EIP. That should have been RIP. My bad. Um, that's the 32-bit version. Uh, so this is on x86. It's the, um, extended, which is R, um, instruction pointer, or on ARM it's called the program counter. We have stuff related to the stack. We'll talk about that. So we've got RSP and RBP. And then on ARM we've got SP and FP, right? And these are the names of the

registers, the things that actually run on there. And then we'll have some general purpose registers. So I'll show an example here that shows both of them. Right? So these are the same equivalents on each of the architectures. Right? Most modern processors are running one of these two. Right? If you've got a Mac, it actually runs both, which is nice. Um, but we've got our instruction pointer. We've got two registers associated with the stack. We've got a bunch of general purpose registers. ARM has a little bit more. Um, and then ARM also has a thing called the link register. The link register is just a nice way of, if you are a function

that doesn't call any other functions, you don't have to save where you're going to go next on top of the stack, which is super nice. Oh, I forgot to mention there's exactly one AI generated picture in this slide deck. I will let you guys figure out which one it is. All right. So, the stack. So, I call it a stack of plates that is on the ceiling because when I think of stacks, I think of high addresses, right? Addresses going up um in number from the bottom and the stack grows down, right? So, I always call it a stack of plates that grows from the ceiling. What does the stack do? The stack stores local variables, right? So,

if I use a variable inside my function, it's going to be stored on the stack. If I need a parameter to it, it might be stored on the stack. We'll talk about that. Um, or if I'm going somewhere when I'm done, it actually stores that on there, right? You guys have probably heard about either stack smashing or stack overflow. Like stack overflow the thing, not the website. Um, this is where we get it, right? If you overflow the stack, it means you keep calling functions and the stack runs out of space and then your computer crashes. And here's a nice visualization that I came up with. So when we call a function on top of the stack, remember they grow

this way. They're going down. So I have my parameters up here. The return address gets pushed onto the stack. The stack pointer points to it. And as you can tell, my brain was in 32-bit mode. These all have E's on them. Um, in the 64-bit version, the E is replaced with R. I taught assembly for quite a number of years, and I always taught 32-bit. Um, and the reason I taught 32-bit is at that time, most malware was written on 32-bit because you didn't know if you were going to be running on a 64-bit machine or not. And so malware mostly was written in 32-bit. I'm not sure today if that's true, but

that was why I taught it. Um, so once we get in the function, we set this thing called a base pointer. And so the reason we use the base pointer is, this is for programmers, right? Programmers knew that if this always points to the old address, and at plus 8 we have our return address, then the first parameter is at EBP + 10. That's in hex, so that's 16 bytes, right? And the next one is at... And then we also know that our local variables are going to be stored below it. So if you see a number with a plus, you know that that is a parameter coming

into that function. And if you see it with a negative number, that means we have a local variable that's being accessed. Also, feel free to raise your hand if you have questions. All right. So what are parameters? If you're writing a C function, right, a parameter is going to specify how that code is going to operate. Again, on x86, all parameters were stored on the stack, right? So, if you wanted to call a function, I have to push, push, push, push, call my function. And then I'd have to, oh, do I need to pop or do I just restore from there? Now on x64, which I think this was, um, uh, Linux first did this, they had a

version called fastcall. So it would use up to eight registers, because we had more registers, um, on 64-bit. And so the first six integers or pointers were stored in RDI, RSI, RDX, um, and so on. And then for the floating-point numbers we have these additional registers I didn't talk about, but XMM, anybody know what XMM is? No. So these are our SIMD registers. So they are, uh, I think they're 128 bits by default. Um, so these are very large registers. The registers we're talking about here are 64 bits. There's XMM, YMM, ZMM, which goes up to 512 bits for a register that it can hold data in. And so if you have floating-point

numbers, right, you're going to use the XMM registers in order to do that. So if you looked at my previous example, I had that floating-point one. You saw the PXOR of XMM0 with itself, right? That's because it was using a floating-point number. All right. Um, no. Don't worry. We'll skip that one. All right. So, how do programs work? Right? So, there's really only a few different ways that a program can do anything, right? So, the first one is internal processing, right? It just executes instructions. I'm going to do this and this and this and this and this. You're going to get to a point where you can't do it yourself, right? So, if you need to change the memory

addressing of a piece of memory, right? You might have to use either a system call or some library or function in order to do that for you. Right? So if you ever look at mprotect, which is a function to change your permissions on memory, right? That's a library. So if your program is doing anything other than its own computation, it's going to have to use something else to do it, right? And that means that you're going to be able to figure out what it does. So if it needs to talk to the network, it has to do something else. It either has to call a library like ntdll or it has to do a syscall, right? Which

is a program that just sets up a bunch of registers and then calls a syscall, which used to be an interrupt. And when it does that, it transfers from the processing of that code to jumping into the kernel, right? Because the kernel has additional permissions that we as normal code don't have access to. And so you have to enable that. You have to do that either through interrupts, library or function calls, or possibly interprocess communication. Generally the way interprocess communication works is you're going to write to some shared piece of memory, and each process has the ability to see that, and so then they can go ahead and execute that. All right, dynamic link libraries.
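A rough sketch of a program reaching outside its own computation through a shared library, which in turn asks the kernel. It is written in Python with `ctypes` and assumes Linux or macOS, where `ctypes.CDLL(None)` exposes the C library symbols already loaded into the process; the helper name `pid_via_libc` is made up for illustration.

```python
import ctypes
import os


def pid_via_libc() -> int:
    """Ask the C library's getpid() wrapper for our process ID."""
    # CDLL(None) hands back symbols already loaded into this process.
    # This is a POSIX behaviour and is assumed here; it does not work on Windows.
    libc = ctypes.CDLL(None)
    libc.getpid.restype = ctypes.c_int
    # getpid() in libc is a thin wrapper around the getpid system call.
    return libc.getpid()


if __name__ == "__main__":
    # Both paths end up asking the kernel, so the answers should match.
    print("via libc:", pid_via_libc())
    print("via os  :", os.getpid())
```

Because the request has to cross into a library or the kernel, it leaves a visible trace, which is what makes this kind of behavior discoverable when reversing.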

So to do external things a lot of code uses dynamic libraries, right? So these are pieces of code that are shared. Many applications can use them. ntdll, right, on Windows. So on Windows they call them dynamically linked libraries, DLLs. On the Mac they're called dylibs. I don't know where they named those. Um, and then on Linux we've got shared objects, right? And these are all pieces that can be used in order to, um, call code that you don't have, right? Or you don't need. Like, recently we've seen a lot of DLL attacks where you put a DLL in a directory that a program is going to look in, and then it loads that DLL automatically. So when you get started what are the

easiest tools? So these are all command line tools. Anybody on Linux can run any of these, right? A really useful one is strings. Again, if you have just a standard program, strings is going to tell you a lot. It actually will tell you what version of GCC was used a lot of times, right? It'll tell you URLs, right? You'll get format strings. All of that will come out just in strings. And then nm and ldd, um, are going to parse. These are, again, these are Linux specific, right? So they will go through and read an object file. So an object file is: I have a compiler. I can create an object file. Um, and then I can use

the linker to put those together to create an executable file. And then ldd is going to tell you which shared libraries you have. Yes. >> Oh, go ahead.
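A minimal sketch of what a `strings`-style scan does: walk the raw bytes and keep runs of printable ASCII. It is a simplification (the real tool has more options and handles other encodings), and the sample blob, including the compiler banner and URL in it, is invented for illustration.

```python
import re


def ascii_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of at least min_len printable ASCII bytes found in data."""
    # 0x20-0x7e covers space through tilde, the printable ASCII range.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


if __name__ == "__main__":
    # Made-up bytes standing in for a slice of a compiled program.
    blob = (
        b"\x7fELF\x02\x01\x01\x00"
        b"GCC: (GNU) 13.2.0\x00"
        b"\x01\x02"
        b"http://example.com/update\x00"
        b"%d items\n\x00"
    )
    for text in ascii_strings(blob):
        print(text)
```

The short `ELF` magic is skipped because it is under the four-character minimum, which is the same reason the real tool filters out a lot of noise.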

These are reading the static binary. So if you have a binary, it'll tell you which ones it says that it wants to use, not which ones are in the kernel. So if you had a different kernel, it would tell you the right ones that it wants to have. Yep. Thank you for the question. All right. And then a couple others. So I tend to use objdump. Um, so again, the program that I wrote, right, has the ability to, um, compile code. I also include in there a script called dump that sets the defaults so that it picks the function that you choose, um, as well as it picks x86 instead of the

stinking AT&T syntax. Um, readelf will read additional information from ELF. So ELF is the format for Linux-specific binaries. So all Linux binaries use the ELF format in order to store their data. Um, and then there are two sort of open-source tools that people use. I don't really use these myself. Um, but radare2, there are some fans out there who love that one. And then Cutter is another one that is a fork of that. And these are tools that you can use. They are free. They are command line. And so they're pretty easy to get into if you really want to. All right. And then an example of dynamic analysis. So I have used, um, tracing, um,

because if you have a giant application, right, and you want to figure out, okay, what things does this do if I just open it up, and you don't have time, because your pentest is only a week, to go through and read every single function, you can hook on some dynamic tracing and it'll run the program and it'll tell you every single system call that that program did when it loaded, and then as you interacted with it, right? So for example, I found, um, on a Mac, right? I was doing a pentest of a piece of software and I found that it was trying to open up and execute a script that didn't exist. So I made the script and

it ran as root. So I found a privilege escalation just on the fact that it was making a system call to run a binary even though it didn't exist. Right? So that's a super easy way to just go through and find any time a program does exec on anything. Like, if it's doing exec, which is go and run this process and tell me when it's done, that's something you might want to look at. And then I love debugging, right? There's two types of debuggers: there's people who use a debugger and there's people who use print, right? Both of them are valid, right? Sometimes you're in production and you can't hook a debugger

to something. Um, but what a debugger does is it allows you to run a program and you can tell it I want to load this program. I want to stop at this spot and then I want to go and look at each instruction as we're going. Right? The one I like is GDB. Um, there's also, because I run a Mac, LLDB is the default. Um, I don't like it as well. Um, but you know, some people like that. And then X... I typed that wrong. x64dbg, um, is another one that runs on Windows. So if you're a Windows developer, right, you can totally use that one. Um, the other two are generally on Linux. Um, and

then Windows has, I don't know, they call it WinDbg, right? The Windows debugger. Um, you can also use that one, right? Any of those are totally fine to use. Now when I use a debugger, it is impossible to see on this screen. If you were to look, you would see all of the registers here, right? So the default debugger does not include that. And so every time I execute an instruction, it's going to print out all of the registers. It's printing out what the stack looks like. This is showing a listing of the program that I'm executing, where this is the line that's going to get executed next. These are the previous ones, and these are the

ones from the future. And then there's a little command at the bottom that lets you interact with that. Um, there are a couple other ones that are also available. So some people like Voltron, some people really like pwndbg, right? So these are additional libraries that you can use. Again, if you're on Linux and you want to turn into obviously and so I put in the, uh, cheat sheet. I didn't put in my vi cheat sheet. How many people run vi? How many people can quit vi? Um, I use vi because I took an operating system class and that was the only editor that was available. So, you had to learn that. But GDB is really nice,

right? So, you can load a program. You can say, I want to set a breakpoint. It'll run until it hits that breakpoint, right? You can say, I want to look at the registers, if you don't have GEF installed or something like that. You can print out variables or memory. You can tell it to continue. The one I always use is step instruction. So that is going to do exactly one instruction at the assembly level. Now you'll want to use next instruction if you, um, use for example printf. How many people want to debug printf? Probably zero, right? Nobody cares how that's implemented. So the next instruction says go and call this instruction and when you're done come

back to me and show me the next instruction after that. So if you learn these, right, you can use GDB, and if you install the plugins it makes it super nice to look at. Additionally, there are a lot of hex editors, right? So, again, we talked about hex. That is a way we can very easily see something. I don't know how many people can read binary for more than 5 minutes and not get confused, because there's a lot of ones and zeros, right? But I can look in a hex dump and I can say, "Oh, 55. I know that's push RBP. Okay, now I can kind of understand some things." Or C3, right? That's a return.
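A small sketch of spotting those two bytes in a hex dump, in Python. The byte sequence is a hand-assembled x86-64 sample (`push rbp; mov rbp, rsp; xor eax, eax; pop rbp; ret`) chosen for illustration, and the helper names are hypothetical. Matching raw bytes like this is only a heuristic, not disassembly: the same byte values can also appear inside operands or data.

```python
# Hand-assembled x86-64 sample: a tiny function that returns 0.
# 55          push rbp
# 48 89 e5    mov  rbp, rsp
# 31 c0       xor  eax, eax
# 5d          pop  rbp
# c3          ret
CODE = bytes.fromhex("55 48 89 e5 31 c0 5d c3")


def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as offset-prefixed rows of two-digit hex bytes."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        rows.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


def looks_like_framed_function(data: bytes) -> bool:
    """Heuristic: starts with push rbp (0x55) and ends with ret (0xc3)."""
    return len(data) >= 2 and data[0] == 0x55 and data[-1] == 0xC3


if __name__ == "__main__":
    print(hexdump(CODE))
    print("framed function?", looks_like_framed_function(CODE))
```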

I land.

All right. All right. So, the thing I think most of you probably would like to do is not write assembly, but I think, you know, when we looked at the keynote, I think learning how to write assembly is going to make you a better reverse engineer, right? Practicing it, struggling with it for a while is going to make you better, because then when you go into one of these, um, disassemblers, it's going to make a lot more sense. You're going to be like, "Oh yeah, that's why it's doing this and it's from this." Right? And I think the other thing is if you compile your own code, even if you write it in C and then

you look at the executable, you're like, "Oh, why is it doing that? Oh, I never saw that before." Like, this is something that's happened to me, right? Like I used to do a lot of, you know, 32-bit code and then they switched to using 64-bit, right, and there's additional instructions that, like, weren't there with it, right? Or they changed because it's, you know, different compilers. So ones you can use: IDA Pro, we've got Binary Ninja, we've got... these are just three. I'm not saying this is an exhaustive list, but these are three pretty good ones. IDA Pro used to be really expensive, probably still is, but it is also free, right? So, you can go

download the free version. Um, they gave you a temporary license for so many days. Um, that you have an SDK. It used to be the standard, and this is what it looks like. So, I got my functions, and then, like, you guys might see stuff that makes sense. Like, I see that it's calling printf, right? So if you're used to programming in C, that makes sense. Up here, you're like, I don't know, what is this LEA? That actually just loads the address of the data that it wants to print. It loads it in the register that it needs to use. It'll call printf. And so again, IDA, this is for non-commercial use. Um, but it used to

be the standard that everybody used. And then there was, like, an up-and-coming startup. I had a shirt that has it on it. I've used it multiple times. Binary Ninja. Um, they are clean the veins they have a nice high-level, um, view of things, right? They also have a pseudo-C one, they have the assembly, um, and they are free as well, um, if you use their cloud version, right? So that's a free version. It's less expensive than IDA. Um, their price has been going up. It has a much prettier UI, that's really nice. Um, but they're pretty much the same. It also has a Rust decompiler, right? So, if you have malware, which right now malware authors

are using different languages than just compiled C, and so our tools are having to catch up with these new trends that we have. So, here is an example. Again, it's hard to read. Um, this is in Binary Ninja and this is in the disassembly view. So you can see this is, oh, a full page of stuff, right? But if I switch it to using the pseudo-C version, now I get something that looks like... I don't know anybody who is going to be doing a bunch of casting of pointers, to be honest, right? That's, like, not fun. But now I've got six lines, right? So that's, like, a lot more readable. Again, if you have a piece of

code that is not obfuscated, right, to break the disassembly. So again, right, all of these have these abilities, okay, which is really nice. And the only reason that IDA has a free version is that the NSA released Ghidra, >> right? So Ghidra is open source. The NSA publishes it on GitHub. I don't think they're out to get you. Your mileage may vary. Right? It is written in Java. It looks like it's written in Java. I don't know how many of you have used Java, but it looks terrible, to be honest. Um, but it is free and you can write plugins for it. Okay, so this doesn't look that much different, right? If you if you, you

know, stand back and blur your eyes a little bit, it looks a lot like IDA Pro, right? It's not really that much different. They do have a free disassembler, right? So again, this is mostly doing it. Now, I don't know how many of you know, but, like, syscall is not a real thing inside of a C program, but they put in, hey, there's a syscall here. So, you're going to have to go look it up. Um, it's pretty easy to look up, um, and figure out, okay, what is this syscall doing, right? And there are different things those different syscalls do. All right, I still got time. All right, so today, I put in the talk, we talked

about patterns. So here are some very simple patterns, but these are the basic patterns that make anything Turing complete, right? You have to have if (selection), you have to have loops, and then code, right? Those are the three things that make something Turing complete. That's a computer science thing. Computer scientist. All right, so I got a nice if example, right? If x is greater than y. Now I will say that if you turn on optimization on your compiler, guess what it'll do? >> It'll take out the if, and it'll actually take out the printf and turn it into puts. Oh, thank you. >> Which is to just print it straight. So, by default, my project just turns off

optimization, just because that way you can see what it'll do without having to force the value to come through a scanf or be an argument to main. So we go through and we will see some things that maybe make sense. So we see a couple of constants. These are in hex. A in hex is 10. 14 in hex: 16 + 4 is 20. Right? So we can see those. Remember what I talked about: if you see a minus, right, that's a local variable. So RBP minus 8 is going to be x. Then RBP minus 4 is going to be y. Okay. So then it's loading, we got 8. So we got x here, we got y here. It's

loading x into a register called EAX. So that's the 32-bit version of RAX. It's loading it in because these are ints, 32 bits. It's then comparing x and y. And then we got a jump less than or equal to. And so if it's less than or equal to, it's going to do a jump to main plus 0x33. Now, where is that? It's down there somewhere. The nice thing, so I did this. This is a dead listing, right? You can see I have my nice function here called dump.sh. It just dumps main. Whatever main is, it's going to dump it. And so this is just a dead listing. If you were looking at this in a

disassembler, right, like IDA or Binary Ninja, it's going to have arrows showing where those jumps are, right? It's going to do the calculation for you and show you, oh, this is down here. All right. Now, we got a loop. And the big thing with a loop is a couple of things, right? We got our initialization. So for those of you who are not programmers, initialization happens the first time. We're creating a variable called i. We're setting it equal to zero. The second thing it does is check our condition: whether i is less than argc, and argc is the argument coming into our main. And then when each pass through the loop is done, it's going to

increment i by one. So i is equal to i + 1. So i is zero. The next time i is going to be one. Then i is two and three and four. And then we're calling printf with our format specifiers and two values in there. All right. So the things that you will see: here we're coming in and we're loading some values, right? Something gets zero. Well, RBP minus 4 gets set to zero. So that's our i. But then the very first thing you do, there's two types of jumps, right? This is an unconditional jump, which means always jump. So it is always going to jump to main plus 80. Now I did this in the

disassembler and it was nice. It shows you this is main plus 0. So main plus 80 is right there. So the way it optimized it is it doesn't do the check at the top but does the check at the bottom. So I go ahead and it's down here at 80. It's going to load our two variables. So it's going to load argc and it's going to load i. So, we're going to do that in two lines right here. And then it's going to do a conditional jump, which means I'm only going to jump if one of these is less than the other. Right? Specifically, I believe it's this one is less than that one right there. So, it's

going to then jump up to main plus 28, which is right here. And then it's going to do the code. So, we can see the jumping, right? And then our one addition is going to be right here. So we're going to add, and this was our i, we're going to add one to it. Okay. So you'll see the initialization. You'll see it have a body, right? The things that we're going to do every time. And then we'll see our conditional check, right? So the condition's down here. The one line here is to do the add. And then it's just going to keep jumping over and over and over and over again. Okay. You guys are all having so much fun.
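The loop layout described above (initialize, jump straight to the check at the bottom, jump back up into the body while the condition holds) can be sketched in Python rather than assembly. This is only an illustration: the function names and the "body" and "check" labels are made up here and stand in for the addresses in the listing (main plus 28 and main plus 80).

```python
def run_for_loop(argc):
    """Source-level view: the condition is checked before each pass."""
    visited = []
    i = 0
    while i < argc:
        visited.append(i)   # stands in for the printf call
        i += 1
    return visited


def run_compiled_layout(argc):
    """Compiled layout: the check sits at the bottom of the loop."""
    visited = []
    i = 0                   # initialization: the local at rbp-4 gets 0
    pc = "check"            # unconditional jump down to the check
    while pc != "done":
        if pc == "body":
            visited.append(i)   # loop body (the printf call)
            i += 1              # the single add instruction
            pc = "check"        # fall through into the check
        elif pc == "check":
            # conditional jump: go back up only while i < argc
            pc = "body" if i < argc else "done"
    return visited


print(run_for_loop(3))          # [0, 1, 2]
print(run_compiled_layout(3))   # [0, 1, 2]
```

Both functions visit the same values of i, which is why the check can live at the bottom of the compiled loop without changing what the program does.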

Learning assembly. All right. So again, in Godbolt, right, it provides a lot of nice things. Now one thing to notice is that you got a curly bracket here and a curly bracket here. Right? So for those you actually get instructions, even though you didn't tell it to do anything. Right? That's because we're writing a function, and every function has the stuff that sets up the stack so it can have local variables and arguments. All right. And then the third thing that we need to do is be able to call functions, right? So we are right here. We're going to call a function. So again, this is in a debugger. The debugger was nice. It put

in our variable name. Our function is called function test. All right. And then the viewer is showing us what that looks like. So this is actually the body of function test, and you can tell because this says function test plus 0, the start of it, whereas this up here says asm main, right? So we're going through our function and then we're going to jump into this function. Now what happened was, if you look up here, we have this thing called the stack. So on top of the stack we push what's called the return address. So what the program is doing is it actually pushes the address of the next instruction that it

needs to run on top of the stack. And what we've done is, if we have a bunch of parameters, the parameters go at the top, and then we're saying, oh, by the way, once that function is done, I need to come back to this spot right here. So we're going to put in a return address, and then we're going to have our local ones. And the way that smashing the stack works, right, is you accidentally write too far and you end up overwriting a return address on the stack. So now when it goes to return, guess where it goes? Wherever the data that was on the stack says, right? So that is how we call a function, right? And you will, it's

called different things. So in x86, right, the keyword or the opcode is call. If you look on ARM, it's going to be one of a few. Branch is what they call it, and then branch with link. So branch with link means use the link register in order to store where we need to come back to. So their jump and their branch are kind of the same, right? So they have some different opcodes, but they're basically the same, right? All of our different processors, they just do the same thing. That's right. All right. And then I usually don't teach system calls. And the reason why is like

the numbers that you have to use in here are very specific to whatever platform you're on. So when you're first learning, right, you don't want to use system calls because they're very detailed. You have to look up, okay, what do I need to put in RDI for this call? What do you need to put in RAX? You go and look and there will be a table, right? Every OS has a table, and it's like, this number is this syscall, and its argument one is this and its argument two is this. But you will have to do it at some point if you do any interaction with the OS. So you

either use a library to do it, right, or you do it yourself. Now if you are writing malware or shellcode, right, you're going to end up writing this, right? You have to be able to do a syscall. The good thing is that if any of you have any of these AI tools, they are really good at it now. I did a talk probably 3 years ago at Kernelcon and I had an assembly program and I said, ChatGPT, can you call printf? And it called printf like it was a C program, right? That's how bad it was at that time. Now, that was all generated, right, by AI,

right? It knows the syscall number. I went ahead and tested it, right? It has the ability to do this, right? They've gotten so much better at doing code. And so this is how you do syscalls, right? A second way that you can do it is you can do interrupts. Interrupts are really sort of at the processor level in general. There are user interrupts that you can have, but, you know, a long, long time ago, right, interrupts... I don't know if anybody has used DOS, but I remember, like, your interrupt table got corrupted. I mean, I was like 12, but that was bad when you did that. Like it didn't work for

some reason. So, interrupts allow the processor to be doing something, and then, say, a network event comes in or a keyboard event comes in, it can say, "Oh, let me hold on. Let me process this." And then it does it, right? And then it goes back to the other things that it was doing before. They're not used a lot in normal programs, but they are used in the kernel. >> All right. Oh, man. This looks terrible on black. So here's an example of a piece of code, right, that, if you look at it here, right, it's pretty easy to read if you go through and read the comments. So, it's calling mmap,

which is memory map, right? And it's setting the permissions on this to be read and write. So that means I can write data to it, I can read data from it. Okay. So it sets that up using a syscall. And then it's really nice here, right? Copy shellcode into RW page. Right? So we've got some shellcode, or some code, right? The code is actually right here, written in our favorite language, hex. Right? All it's doing is moving 42 into EAX and then it's returning from there. Again, C3. I know C3, right? That's a return. So, it's copying that shellcode into the read-write page. It's then coming in here and

it's changing the permissions. So, it says drop write and add execute. So, now that page can be run right inside of memory. So this syscall sets that up. And then if we look in here, call RBX. So, I'm calling a new function. It's RBX. RBX has the value of the page where we have copied that data in. So let's see if my video works. So what I did here is I just went through and I loaded this up in a debugger and I set a breakpoint at asm main. So that's the function that is being called. You can then see that our registers are printed, our stack is printed. I go ahead and disassemble, and we're looking for

that call to RBX, right? That's a call to where our shellcode is. That's not going to show up as something that's disassembled. It's then going to set a breakpoint at asm main plus 93. So when it gets to that point, I think I then step through. So the nice thing is if you type si for step, if you hit enter again, it will redo that command. So you can just keep hitting enter until you get to the point where you want to be. And so it's going to go ahead and just go through. We also have a fun repeat (rep) instruction. So a repeat does the same instruction over and over again until the ECX counter gets to

zero. So that's a nice feature that we have there. So again, it'll go through. Again, these debuggers are really nice, right? It shows you what the code is doing. You can see what the registers are. If you don't understand something, right, you can go through and look at it. >> Let's see. And then at the very end, I just did a continue to say we're all done. But the result of running our program returned 42, which is, you know, the meaning of life, of everything. So that was just a simple example, right? And I'm sure any of you can do this, right, if you want to. All right. Another tool that you can use. So if you

like JavaScript, I do not, but if you like JavaScript, guess what? You can disassemble things in Frida, right? I gave a talk several years ago about this, right? It allows you to hook any function, right, in a process. So again, I've used this on pentests, right? You go through and list all of the functions and say, "Okay, I want to look at all the interesting functions. I want to look at every time it copies a string, right? Or every time it calls exec, or every time Windows runs a new process, right?" So you can hook into that, and you can do it all via JavaScript. All right. So again, when you start off, you have to really decide this: What is

your risk? Right? Can I run this on my work computer? Yes or no? Maybe, maybe not. Right? If it's just regular software and you're trying to figure out some functionality of it, right, yeah, you can go ahead and run it. You can do dynamic analysis, unless you think they're going to ban you. So then we can do it. And so you would decide, can I debug this, or should I do it all via static analysis, right? If you're doing malware, right, you'd probably start with some static analysis. Figure out what do I think this is doing, or can I run it in a VM, and then do I have the right environment to test this thing, right? I mean, there's

all kinds of virtual machine layers that you can use. You can use QEMU, right, to emulate some other platform if you really feel like it. But I find it's easiest to use whatever your native OS is, and then of course the correct architecture for it, right? So the Mac actually will compile both. I think some Windows laptops will run both. But generally it's going to be either 64-bit Intel or 64-bit ARM. And then there's a whole other world of embedded systems, right? So those have their own firmware and you'd have to be able to read those. All right. And as I mentioned, right,

like malware now is much more following developer trends, right? Especially with AI-written malware, right? They have the ability to say, write me a piece of malware that does ransomware, right, and write it in Rust, and don't obfuscate it, but make sure it's a small binary, right? They could do those types of things. And so if it is written in Rust or Go, which is sort of the new hotness, right, you're going to have to use additional plugins, right? If I write a program in C, strings are null-terminated. If I write it in Rust, they're length-prefixed, right? So the first byte is telling you how long the string is, and then you just

have a sequence of characters that goes on forever, right? So you have to understand what the different patterns are. And the thing is, you can do it yourself, right? So you can write what I heard somebody call once your own system, right? So you can write your own version of hello world in Rust and then say, okay, is my tool loaded? Do I understand what it's doing? Right? That's the whole process of reverse engineering: you build a thing and then you say, how does this work? And you say, okay, now I understand this. And then you build a new thing and say, how does that work, right? And you build a new

thing and a new thing right and that's how you that's how you learn right when I started I didn't know I didn't know any more than you right I took an assembly class once and it was like 16 bit assembly and like you printed out characters and that was it right so you guys can learn this I know you can And the great thing is that there are plugins, right, models that can run inside of your reverse engineering tools. So for example, there's one in D term, right? I was teaching at kernel con a couple weeks ago and there was a function and the students spent 20 minutes looking at this function saying, "Okay, this is like it's modifying a um

registry entry, right, on Windows." I went in with my MCP, prompted Claude, and I said, what does this function do? And it said, oh, this sets a bunch of registry keys, right? It instantly did it, versus 20 minutes, right, in order to look at it. So Binary Ninja has a cloud version, IDA Pro has some tools as well. So these can help you on your journey, but I think again, as the first speaker said, right, sometimes you want to learn the hard way. Okay. And that'll make you better when you get into the things that are harder to do.
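The string-layout point from earlier in the talk (C strings end at a null byte, while Rust strings come with a length) can be sketched in Python. This is an illustration under assumptions: the byte blobs and helper names are hypothetical, and the length is passed in separately, the way a Rust string slice carries a pointer plus a length, rather than being read from a first byte.

```python
def read_c_string(blob: bytes, offset: int) -> str:
    """Read from offset up to (not including) the next null byte."""
    end = blob.index(b"\x00", offset)   # raises ValueError if no terminator
    return blob[offset:end].decode("ascii")


def read_counted_string(blob: bytes, offset: int, length: int) -> str:
    """Read exactly `length` bytes from offset; no terminator needed."""
    return blob[offset:offset + length].decode("utf-8")


# Hypothetical string data as it might sit in a binary's read-only section.
c_style = b"hello\x00world\x00"     # each string ends with a null byte
counted_style = b"helloworld"       # packed back to back, no terminators

print(read_c_string(c_style, 6))                  # world
print(read_counted_string(counted_style, 5, 5))   # world
```

Scanning `counted_style` for a terminator finds nothing, which is one reason a tool that assumes C strings can show run-together or missing strings for a Rust or Go binary until a language-aware plugin supplies the lengths.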

>> I do like dad jokes. My favorite one is: What kind of pants is Mario wearing? >> Denim. Denim. Denim.
