0xFEEDFACE: Mach-O File Format and Binary Obfuscation

Name: 0xFEEDFACE: Mach-O File Format and Binary Obfuscation
Uploaded: 2018-10-15
Duration: 34 min 50 s

BSides Warsaw · 2018 · 34:50 · 1.7K views · Published 2018-10

Speakers

Kamil Borzym

Tags

Category: Technical

Topic: Mobile Security, Reverse Engineering

Style: Talk

Mentioned in this talk

Tools used

Hopper, LLDB

About this talk

Kamil Borzym explores the Mach-O file format used on Apple platforms and demonstrates techniques for binary obfuscation. The talk covers how executable files are structured, the role of load commands and symbol tables, and methods to obfuscate code at the binary level while preserving functionality, comparing source-code and binary obfuscation approaches with practical examples from iOS development.

Transcript

Yes, Allegro is hiring. I've heard about it too. Hi, my name is Kamil, and I'm an iOS developer at Allegro. Today I wanted to show you a bit of hex. Some of you have already noticed that what's written down here isn't actually valid hex. Let's say it's base 36, and then everything works out. Z is letter number 26, right? So plus 10.

In the first part I will tell you about the Mach-O file format. It is the format Apple platforms use to store executables and libraries. In the second part I will tell you about obfuscation. What do these two topics have in common? You will find out later, or maybe you can guess as I go along. Marcin talked about how JWT is pronounced ("jot"), so I'll start with that too: how is Mach-O pronounced? "Mako", just like the main source of energy in Final Fantasy VII, but apart from the name it has nothing to do with it. It has a lot to do with The Matrix, though. Here we see a random-looking stream of bytes. If Neo were standing here, he would say he sees people, cars, buildings.

Now I would like to give you a super short express course in hex for a young, beginner Neo. Our matrix will be the Apple ecosystem. I will focus on Apple, but if you have only ever compiled and run things on other platforms, all the concepts are very similar, whether on Android or Windows. And now, the classic line from American movies: "Stop, zoom in, enhance."

Most binary formats like this start with a fixed, predefined value: the so-called magic number. Some of you may have noticed that the first four bytes look very similar to the title of this presentation, but written in reverse. Why? Because of little-endian byte order: the least significant byte is written first. So to read a four-byte number, we read from the end. In the upper stream we see FEEDFACF, with an F rather than an E at the end.

I said this is a predefined value, but how do we know which values are predefined and what they mean? It's all in the system headers. If you have a Mac, it's enough to open one file that contains all the information. Here is an excerpt from that file with the definitions of all the magic numbers. FEEDFACE, the title of this presentation, is the magic number that marks Mach-O files for 32-bit platforms. FEEDFACF, with an F at the end, marks Mach-O files for 64-bit platforms. Apart from the defines, these header files contain a lot of English comments, and they're really nice to read. The format probably comes from NeXT, but that was before my time, so I can't tell you more.

Reading on about Mach-O files, we learn that every Mach-O file starts with the so-called Mach header. It isn't described in prose; the structure is simply defined in C. It starts with the magic number we've just seen. Then come the CPU type, the CPU subtype, the file type, the number of so-called load commands (more on them in a moment), the total size in bytes of those load commands, some flags, and a reserved field that apparently isn't used for anything yet.

To see what is where, let's color it. The colors aren't visible on the projector, but we remember what was there. The numeric values don't tell us much by themselves: the CPU type is 0x01000007, which isn't exactly self-explanatory, and the file type is 2, written in little-endian hex. All of these values are listed in the header files; if you read them, you'll find the mapping, and everything can be replaced with human-readable values. So what does it say? We have a 64-bit Mach-O file compiled for 64-bit Intel, for all CPU subtypes. It is an executable file, an application. It could also be a library, or one of several other file types. We have 20 load commands here, the total number of bytes they take up, and the rest of the information.

Let's continue parsing. What does a load command look like? Reading the system headers (it's always that same one file), we see that every load command starts with the same basic form: four bytes holding the command type, then four bytes holding the size of the load command. If we color it again, we see command type 0x19 in the first four bytes and a command length of 0x48 bytes, so we mark 0x48 bytes in white. Then we repeat: command type 0x19 again, this time 0x228 bytes long. And we keep going until we have read all 20 commands. As you can guess, the type field (the red one) tells us what to expect inside. And here we see an excerpt from the

list of load commands available to us. The first and most important one is the segment load command. It describes a part of the file that can be mapped directly to some place in memory. Another load command, UUID, doesn't map anything; it just contains an identifier for the whole file. The rpath load command tells us where to look for dependencies: when we put all our libraries in the Frameworks directory, which is the usual layout on iOS, rpath will contain the path to that directory. The dyld info load command I will cover later. I just wanted to show you that there is a whole bunch of different load commands.

If we wanted to go deeper and parse everything, we would have to read that one header file. We won't, because as you've probably already felt, it's quite laborious: you have to read a lot and keep a lot of mappings in mind. Just as a hint that there is real data here, I'll decode these non-zero values as ASCII. Here you can see __PAGEZERO and __TEXT. You'll see them again in a moment when I parse further.

Here are the parsed load commands. This is only a selection, because not everything interests us; I'd like to discuss a few typical things you will find. Here is the one at index 1 (I numbered from zero), so the second load command, a segment. It's a simple load command that maps read-only memory; that's usually where constants and our code live. It covers this byte range, so the file from the very beginning up to byte 0x1000 is mapped into some area of memory. The segment is divided into sections: logical parts, each containing something slightly different. The __text section, for example, contains all of our code, all the assembly. Then there is LC_MAIN, which, as you can guess, is just a pointer: the 0xE30 you see here falls neatly inside this section, the area where executable code lives. So the main load command simply points to the main function: when the loader has read the whole application and all its dependencies and starts it, it knows where the program should begin executing. Other load commands load dynamic libraries: "I will use these dependencies, these libraries; please load them for me, I'll need them later."

We did all of this manually, and I shortened it a bit, but you got a feel for what parsing involves. We don't have to do it by hand, which is quite tedious: on macOS we have a tool called otool. With the -l switch it shows exactly what you saw on the previous slides, the load commands, and -h shows the header. If we add -v, it nicely replaces all those magic values with human-readable names. You can treat the output as a table of contents of our file: the header, the load commands, and the green part at the bottom, which is all the meat that's left, where our actual program lives.

How can we use this table of contents? For example, let's look at the __cstring section. As you can guess, it contains all constant strings defined at compile time. If we jump to that range of bytes in the file (I don't know if you can see that it's brighter here), this is the __cstring section. It's quite short, because this is just a demo program for this presentation. And if we decode these bytes as ASCII, we see the woman in red, just like Neo looking at the Matrix.

Jumping around by bytes is quite uncomfortable, so we can use the same command as before, but with the -s parameter, where we pass the segment name and the section name. For example, let's display the Objective-C method name section of the system Calculator; as you can guess, it contains the names of Objective-C methods. Let's see what's inside. Adding -v turns it into a human-readable form. We see a bunch of zero-terminated strings: initWithFrame:, setTransparent:. The colon indicates that the method takes a parameter. Let's look at another section, also in the system Calculator: the Objective-C class name section. As you can guess, it contains the names of all classes used in the calculator. Here we see names built from words like bit, field, box, calc, button, cell, all things related to the calculator.

What I've shown you here are just zero-terminated strings; nothing complicated at all. The binary contains many other sections describing which classes there are, what subclasses they have, which methods and properties. That's all rather complicated. It's interesting, but I won't talk about it now, because it doesn't matter

for the next part of the presentation. Instead I will tell you about an equally interesting structure that will matter later: the dyld info load command. As I mentioned, dyld is the loader that reads the program into memory. It uses this section to find our application's dependencies and then the dependencies of those dependencies. If you want to use printf, for example, the library has to export printf so that you can use it. In Mach-O files this is done through an export trie. It's not a tree; it was deliberately named differently to distinguish it from one. In an ordinary search tree the keys might be numbers, for example, and each node always holds a whole key: you compare, walk along the edges of the tree, and find what you're looking for. A trie is designed specifically for looking up values by very long keys. It would be wasteful to compare a very long key at every node, so the nodes hold only fragments of the string. If you have an Objective-C program and call a method, the function objc_msgSend is used under the hood. So the dynamic linker walks the nodes down to this place, assembling the full function name along the way, and here it finds the address in the code where that function's assembly is located. You could explain it in other ways, but the point is that, thanks to the trie, lookup is ultra-fast when you launch the program.

So this section describes exports. On the other side, someone wants to import: to declare that they'd like to use such and such symbols. So, greatly simplified, there is also a list of bindings, saying which library a symbol comes from and which symbol is wanted. If you want to read about tries, there's a nice article on Wikipedia. That's where I'll stop on this topic. As you can guess, what I've told you is just the tip of the iceberg; there is much more information in there, and you could probably write books about it. The good news is that, when it comes to parsing Mach-O files, all the information is gathered in this single header file, which is on every Mac. I've been showing you fragments of this file all along. There's a lot of documentation in it, and it's really nice to read. If you can't fall asleep at night, print it out and put it next to your pillow; you'll definitely fall asleep.

OK. Congratulations, you've passed the express course in seeing hex, for Neo. I'm sure that now, if you look closely at this screen, each of you should see a woman in red at some point. Can anyone see her? Anyone? You have to read a bit more of these headers and get going.

OK. I told you about the Mach-O format; that was the first part of the presentation. Now I would like to tell you about obfuscation. Obfuscation is a process in which we take something and scramble it so that it still works, but anyone analyzing it statically won't know what is what or understand what's going on. For example, here is a function with one type of obfuscation applied. Can anyone tell me what this function does? I scrambled it on purpose, so I hope no one can. Now I'll show you the version of the function without the obfuscation. And now? Yes, it computes a power. I just wanted to show you what this type of obfuscation is about. It's control-flow obfuscation: the flow changes, the program jumps around and gives the same result, but everything is tangled and you can't really figure out what's going on. This type of obfuscation is problematic when applied to source code: we usually compile with maximum optimization, and it may turn out that the code ends up as an identical binary anyway. I think I heard about that in a malware talk at BSides a year ago, or somewhere else. In business applications written in corporations, with many objects and object-oriented programming, this isn't used. A different kind of obfuscation is used there.
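The obfuscated function from the slides isn't reproduced in this transcript. The hypothetical Python sketch below illustrates the same idea by applying one common control-flow obfuscation, flattening, to a simple integer power function. In the flattened version, the loop is replaced by a dispatcher over numbered blocks, and a state variable chooses the next block. The block numbers are arbitrary. Both versions return the same results.

```python
def power_plain(base: int, exp: int) -> int:
    """Readable version: multiply `base` into the result `exp` times."""
    result = 1
    for _ in range(exp):
        result *= base
    return result


def power_flattened(base: int, exp: int) -> int:
    """Same computation after control-flow flattening.

    The loop is split into numbered blocks driven by a dispatcher; the
    next block is chosen through `state`, so the loop shape is hidden.
    """
    state, result, i = 7, 0, 0
    while True:
        if state == 7:        # entry block: initialise
            result, i = 1, 0
            state = 3
        elif state == 3:      # loop condition
            state = 12 if i < exp else 5
        elif state == 12:     # loop body
            result *= base
            state = 9
        elif state == 9:      # loop increment
            i += 1
            state = 3
        elif state == 5:      # exit block
            return result
        else:
            raise AssertionError(f"unreachable state {state}")
```

As the talk notes, an optimizing compiler may undo this kind of source-level scrambling. This sketch shows only the shape of the transformation and makes no claim about what survives compilation.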

Here the control flow is clear: some property accesses, some assignments and conditions, method calls, but it isn't very clear what's underneath. If we obfuscate the names, someone trying to crack the program would need a long time to find the place where the license-check condition has to be changed. This is symbol obfuscation.

There is also a third type of obfuscation, so advanced that it is indistinguishable from magic. I won't go into it today, but you can find it on YouTube: the obfuscation used by Hopper, the most popular decompiler on macOS. In short, if you try to disassemble Hopper (not with Hopper itself, because Hopper is protected against that), then in the __text section, where the code should be, you'll see nonsense. Well, maybe it makes some sense, but it will lead you in the wrong direction. It turns out there is a load command at the end of the file with 12 MB of data. While Hopper runs, this data is XORed with the text section, and the real Hopper magically appears.

As you've probably guessed, I first told you about Mach-O files and now I'm talking about obfuscation. I want to show you an idea I had; I didn't find it anywhere on the internet, so I'm calling it my idea.
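The talk describes Hopper's protection only as XOR-ing stored data with the text section at run time. Below is a toy Python sketch of that general idea. It is not Hopper's actual scheme. The key blob, its length, and the sample x86-64 bytes are all assumptions chosen for illustration.

```python
def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with `key`, repeating the key if it is shorter."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


# Real code (x86-64): push rbp; mov rbp, rsp; mov eax, 42; pop rbp; ret
real_text = bytes.fromhex("554889e5b82a0000005dc3")

# Hypothetical key blob; in the scheme described, it would live in an extra load command.
key_blob = bytes(range(0x10, 0x10 + len(real_text)))

# What sits in __text on disk: bytes that no longer match the real instructions.
stored_text = xor_bytes(real_text, key_blob)

# At run time, XOR-ing with the same key restores the original instructions.
restored_text = xor_bytes(stored_text, key_blob)
```

XOR is its own inverse, so one function both hides and restores the bytes. A tool that only reads the file sees `stored_text`. A real implementation would also need to make the code pages writable at run time, which this sketch ignores.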

It's an obfuscator that works not on source code but on binary files. And to show that it works, I signed up for BSides even though I had nothing yet. I managed to put a project on GitHub that already works. So, as Mateusz said yesterday, please like, share, and subscribe to the GitHub page. There is also a README that tells you more about the project, so I encourage you to check it out.

Now I will tell you how it works. MachObfuscator operates on the sections I mentioned earlier. At the top you can see the two I showed you in the Calculator: the class name list and the method name list. As you can guess, that part is quite easy to understand. What does it give us? Tools such as class-dump, a Mac tool that dumps the structure of an entire application (class interfaces, methods), produce output a programmer can read. Someone doing static analysis can dump it and see how our app is built inside. Hopper also makes good use of these sections.

The fire icon next to two sections means that the obfuscator fills them with zeros. Is that safe? These sections are used by Swift, the newer language in the Apple ecosystem (not so new anymore, it's been five years), specifically by Mirror. We can create a mirror of any object and list its properties. It's essentially a substitute for dynamism, because there isn't much dynamism in Swift. It turns out, also from analyzing Mirror's source code, that if we fill these sections with zeros, everything keeps working: the mirror simply won't return as much detail as we'd like. I haven't met anyone who uses Mirror in their apps, and even if someone does, the application doesn't crash. Maybe future versions of MachObfuscator will handle these sections more thoroughly; I'm only describing the current state.

The dyld info section is the one with the trie and the list of bindings. We have to be careful here, otherwise the program will not

run, so we have to do it precisely. It can be done. It turns out that few static analysis tools recognize this section and use its contents, but since it can be done, let's do it. As I wrote here, its only user is dyld, the dynamic loader, the program that loads applications into RAM.

There's also the symbol table (symtab). On other platforms it probably matters much more. On macOS, the dyld info section appeared a few years ago, and it is what's responsible for launching our program. The symtab is mainly used by LLDB, the debugger. If we want to set a breakpoint, we type "breakpoint" and press Tab to autocomplete the method name, and LLDB pulls the names of all methods from this section. If we zero it out, LLDB won't suggest anything. Hopper has nothing to work with either, so it just falls apart.

Then there are storyboards. On Apple platforms the interface can be designed graphically and is stored as XML, which is compiled into a binary form. The obfuscator also takes this binary form apart and puts it back together in obfuscated form.

What happens when you run the obfuscator on an application bundle? An application is not just the executable; it also contains resources and interface files. The obfuscator first finds all the executable files. Here there is only one, mysuperapp.

Then it finds all the dependency libraries, then the dependencies of those dependencies, and so on, until it has found everything dyld could load. When it runs out of things to load, it marks... this was supposed to be red, but I didn't know the color would disappear. I don't know if you can see anything. Anyway, these entries are the things that can be obfuscated, because they are in the application bundle and we shipped them, so we can modify them freely. Oh, great, now it's a little better. Next time I'll use brighter colors. The red ones, which are in system directories, are read-only; we cannot modify them. Then it extracts symbol names from all of these files.
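The dependency walk described above can be sketched in Python on top of the load-command layout from the first part. Every load command starts with a 4-byte `cmd` and a 4-byte `cmdsize`, so the commands can be walked without understanding all of them. The constants are the ones defined in `<mach-o/loader.h>`. Resolving `@rpath` and similar prefixes to actual files is left to a caller-supplied `deps_of` function. `is_modifiable` is a hypothetical simplification of "inside the bundle versus in a system directory", not MachObfuscator's actual rule.

```python
from __future__ import annotations

import struct
from typing import Callable, Iterable, Iterator

LC_LOAD_DYLIB = 0xC
LC_LOAD_WEAK_DYLIB = 0x80000018
LC_REEXPORT_DYLIB = 0x8000001F
DYLIB_COMMANDS = {LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB}


def iter_load_commands(data: bytes, header_size: int, ncmds: int) -> Iterator[tuple[int, int, int]]:
    """Yield (offset, cmd, cmdsize) for each load command that follows the Mach header."""
    offset = header_size
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmdsize < 8:
            raise ValueError(f"corrupt load command at offset {offset:#x}")
        yield offset, cmd, cmdsize
        offset += cmdsize  # each command records its own size, so unknown ones can be skipped


def dylib_dependencies(data: bytes, header_size: int, ncmds: int) -> list[str]:
    """Install names of the libraries this image asks dyld to load."""
    names = []
    for offset, cmd, cmdsize in iter_load_commands(data, header_size, ncmds):
        if cmd in DYLIB_COMMANDS:
            # dylib_command: cmd, cmdsize, then dylib.name.offset (relative to the command start)
            (name_offset,) = struct.unpack_from("<I", data, offset + 8)
            raw = data[offset + name_offset : offset + cmdsize]
            names.append(raw.split(b"\0", 1)[0].decode("utf-8"))
    return names


def dependency_closure(roots: Iterable[str], deps_of: Callable[[str], list[str]]) -> set[str]:
    """Collect dependencies, then their dependencies, until nothing new turns up."""
    seen = set(roots)
    worklist = list(seen)
    while worklist:
        for dep in deps_of(worklist.pop()):
            if dep not in seen:
                seen.add(dep)
                worklist.append(dep)
    return seen


def is_modifiable(install_name: str) -> bool:
    """Hypothetical rule: loader-relative paths point into the bundle; absolute ones are system files."""
    return install_name.startswith(("@rpath/", "@executable_path/", "@loader_path/"))
```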

Here, for example, I showed method names: some typical init, and here some initWithUnicorn. These names were extracted from both obfuscatable and non-obfuscatable files. Now, unfortunately, we have to remove the ones that came from the red files, because Objective-C interns method names: if a method name appears even once, it is shared everywhere. For example, awakeFromNib is a UI-related method. Suppose you create a class and define an awakeFromNib method in it, even one that has nothing to do with the UI method. We can't obfuscate that name anymore, because we would break the entire interface. So we remove it from the list. This one was supposed to be grey, but it turned white. We have only a few names left to obfuscate, but if we look at them, these are exactly the things that interest us most in applications with business logic: retrieveOrders, cleanCart, ...
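The filtering step amounts to a set difference. The English-word trick mentioned later in the talk can be layered on top of it. The Python sketch below is hypothetical and is not MachObfuscator's implementation. It assumes that names are patched in place inside the string sections, so each replacement keeps the original length and colon positions (and therefore the selector's argument count). It also assumes the caller supplies a non-empty list of non-empty words.

```python
from __future__ import annotations

import random


def obfuscatable_names(bundle_names: set[str], system_names: set[str]) -> set[str]:
    """Names we may rename: found in files we ship, never in read-only system files.

    Objective-C selectors are shared by name, so a name that also appears in a
    system image (e.g. awakeFromNib) must keep its original spelling.
    """
    return bundle_names - system_names


def same_length_name(original: str, words: list[str], rng: random.Random) -> str:
    """camelCase-looking replacement with exactly len(original) characters.

    Colons stay in place, so the selector keeps its number of arguments.
    """
    parts = []
    for part in original.split(":"):
        piece = ""
        while len(piece) < len(part):
            word = rng.choice(words)
            piece += word.lower() if not piece else word.capitalize()
        parts.append(piece[: len(part)])
    return ":".join(parts)


def build_renaming(bundle_names: set[str], system_names: set[str],
                   words: list[str], seed: int = 0) -> dict[str, str]:
    """Map each obfuscatable name to a unique, same-length, word-based name."""
    rng = random.Random(seed)
    taken = bundle_names | system_names  # new set; never reuse an existing name
    mapping = {}
    for name in sorted(obfuscatable_names(bundle_names, system_names)):
        for _ in range(1000):
            candidate = same_length_name(name, words, rng)
            if candidate not in taken:
                break
        else:
            raise RuntimeError(f"no unique replacement found for {name!r}")
        taken.add(candidate)
        mapping[name] = candidate
    return mapping
```

For example, with bundle names `{"awakeFromNib", "retrieveOrders", "initWithUnicorn:"}` and system names containing `awakeFromNib`, only `retrieveOrders` and `initWithUnicorn:` are renamed.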

Now I would like to compare two types of obfuscation: source-code obfuscation, like the tools you can find on GitHub, and the binary obfuscation I showed you here. I'm not saying either is better; each has its own risks, advantages and disadvantages. The binary obfuscator is language-agnostic. If Apple comes up with a new programming language in a few years, this obfuscator will still work. Maybe it will obfuscate a bit less, but after a pull request or two everything will work again. A source-code obfuscator would probably have to be rewritten if the language changes.

The binary obfuscator also obfuscates external dependencies: if we add libraries like Google Analytics to the app, they can all be obfuscated too. The binary obfuscator does not affect symbolication. What does that mean? Suppose an app on your phone crashes because someone dereferences a null pointer, divides by zero or something like that. A stack trace is sent to the backend, but not as a nice readable string; it's just a list of integers, which are addresses of functions. Without the dSYM file, the dictionary we use to translate those numbers into a human-readable stack trace, no one can read it. A binary obfuscator doesn't affect symbolication at all. With source-code obfuscators, by contrast, you have to do some gymnastics and build translators that deobfuscate stack traces.

The binary obfuscator does have a few flaws. If the structure of the Mach-O format changes a bit, you have to sit down and figure out what to change to make it work again. Someone may also feel uneasy about zeroing bytes in a file without knowing exactly what's going on. With source-code obfuscation we don't do anything like that, and everything still works.

I'll tell you one more anecdote. I don't know if you know what it looks like to release an app to the

App Store. We send the application to Apple, and somewhere, in some country, a reviewer takes the app, launches it, and clicks around to see whether there are any naked photos or anything like that. They also run a simple static analysis to make sure we aren't trying to hack the iPhone or something similar. This process can take a few days. It can end with Apple rejecting the update, citing some paragraph of a very long document and saying "we won't let you into the store." Apple once rejected an update of the Allegro app, saying that it contained hidden, undocumented features. They wrote to us that it would be nice if we removed all code obfuscation and submitted the app again. I can't tell you how this story ended, but you can probably guess.

I want to show you something MachObfuscator does that would let us get around this problem a bit. It contains a list of the 10,000 most commonly used English words, and it glues these words together so that they look like ordinary method names. From a distance it looks like a pretty good list of methods. But if you read it, it's just doggerel. It probably doesn't make much sense in a banking application, for example. "DTO", for example. "Bravo", "I'm going", right?

OK, and some conclusions, because every presentation needs them. Maybe it's worth taking an interest in parsing these files. As you've seen, I focused only on obfuscation, but surely each of you would have lots of other ideas for using this kind of parsing: code complexity analysis (say, some class is too long), or security research on an application. Everyone can have their own great idea. This is the file I encouraged you to read; if you want to look at it, please do. Here is a link to GitHub, where I'm collecting stars. Thanks, and that's it. If anyone wants to talk about obfuscators and obfuscation, you can ask questions now or reach me on Twitter later. Thank you very much. Any questions?

Wait, wait, wait! We won't repeat it. Oh, it's you! Yes. Some younger guy. A few years. A silly question: if someone made an app where every method or class name was just "fuck" followed by a number, would that make sense? Would Apple hold it against us? I have a friend who works at a bank in Wrocław. He told me that there they name getters and setters like setCost, getCost. I can imagine that someone working in Mandarin, for example, also names methods in Mandarin. I can't imagine Apple checking for curse words in Mandarin. I think they see something that looks like Base64 and think: "Oh, this is random, it smells, it's hidden." I haven't checked it yet, since I haven't released an app with obfuscated names, but I'll probably do it this year. So it's probably worth avoiding English. Well, if the attacker doesn't speak the same language as you, then yes. I don't think we have any more questions. Okay, thanks.

