
ELFie brudne sztuczki (Dirty ELF tricks)

BSides Warsaw · 2016 · 28:43 · Published 2016-10
Author: Maciej Kotowicz

I won't drag this out, because beer is probably more important than some security or pseudo-intelligent research. Okay, let's start the last presentation of today. From Lufka, sorry. What am I, the Polish government? I'm not saying no, but it's cold next time. Okay, let's start. My name is Maciej Kotowicz, I'm a researcher at CERT Polska, I play on the Dragon Sector team, I work in reverse engineering and exploit development, especially automation at a higher level, and I used to work on formal methods. I have a Twitter account, where you can talk to me. A few questions. Let's start by finding some information about what's in this title. I don't know why this is the first picture I got.

I wanted something better, some girls or something, but unfortunately the most popular dirty elf is Dirty Elf, who does weird things, other things like, I don't know, moving toothpaste and so on. Google it, really cool pictures, damn, people are inventive. I would never have come up with it. Okay, let's get back to the real topic. Who has analyzed an ELF? Let's go back to the previous question. Who knows what an ELF is? Oh, damn. Okay, so who has analyzed an ELF? Not bad. Who has analyzed an ELF in malware? Not good. Okay, let's go back to the previous question. What is an ELF? One answer, just to make sure we are not talking about Tolkien, for example, or Sapkowski. Oh,

good, good. Can I hear something more? No? It's not working. It's not working, right? Well, good. We're on the right track. Okay, who has written their own ELF parser? Maybe the next question doesn't make sense anymore. But that's good, no one has to leave. This presentation should be useful for someone. Okay, let's talk about ELF. It's short for Executable and Linkable Format. As you know, every binary format is best described by a C structure, right? Why not? So let's define some types first, so it's easier to read. Can you see it all? Or do I have to read it? You can't see it. These are just names for some typical things like uint16, 32, 64, signed or unsigned.
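The type names on the slide are not captured in the transcript. As a rough sketch, assuming the standard ELF-64 base type names from the System V ABI (an assumption, not necessarily what the slide showed), the "names for typical things" can be mapped to Python `struct` format codes like this:

```python
import struct

# Standard ELF-64 base types mapped to struct format codes.
# Assumption: these are the usual gABI names, not the slide's own typedefs.
ELF64_TYPES = {
    "Elf64_Half":   "H",  # unsigned 16-bit
    "Elf64_Word":   "I",  # unsigned 32-bit
    "Elf64_Sword":  "i",  # signed 32-bit
    "Elf64_Xword":  "Q",  # unsigned 64-bit
    "Elf64_Sxword": "q",  # signed 64-bit
    "Elf64_Addr":   "Q",  # virtual address
    "Elf64_Off":    "Q",  # file offset
}


def sizeof(name, endian="<"):
    """Size in bytes of one ELF-64 base type, without padding."""
    return struct.calcsize(endian + ELF64_TYPES[name])
```

With an explicit byte-order prefix, `struct` uses fixed standard sizes, so the result does not depend on the machine running the script.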

I can explain, no problem. But after 20 minutes I need another shot, because I'm switching off. uint16 is a data type representing an unsigned 16-bit number. I encourage you to read more detailed explanations. Some basic school? Some types for addresses, offsets, whatever. Let's start with the whole ELF. The first field we'll encounter is the header. Like in any file, right? Every file starts with a header, unless it's fucked up, then the header is at the end, like in ZIP. Whatever. The header looks like this, in a very simplified way. Can I do it? Oh, I can, great. Here are these few magic bytes that say it's an ELF. This is... Oh, we'll do the first contest. Who will give me the first four

bytes of an ELF? Three? No, I want four. You're not counting. I'll find something, I don't know, some beer or something. I forgot the work gadgets. There were some people who knew ELF. I've heard 7F, 45, I don't know. It could be in ASCII characters. Okay, let's assume we know what it's about. In '39? Damn. What was going on in '39, man? They didn't organize such conferences. Anyway, the first four bytes that say it's an ELF are 0x7F and the characters "ELF". I don't remember the mask. Then some more information about how to parse it further, like bitness, endianness and a few other things. There are 16 bytes of this shit. Then some things that describe the ELF further, like the entry point, where we will start

executing code. Version, machine (which processor architecture it can run on), offsets to the next tables, flags, whatever, things that are useful for parsing. And of course, because we live in the modern world, we have two architectures, usually 32- and 64-bit, so we have ELF32 and ELF64, which probably don't differ in anything, as far as I remember, modulo field sizes, right? Probably nothing. Okay, the next structure. This structure is the program header, and there is a table of such structures, whose offset is here, and here is its size. It says that a segment has a type, an offset in the file, that it is mapped at some virtual address, and a physical address, which does not

matter, because it is not true. It has some size in the file, it has some size in memory, it has some flags, alignment, such bullshit. The same goes for 64-bit systems, of course. I think it's the same; no, there is one change, which often fucks things up. Let's go back. It's a bit of a mess because the field order is changed. And since it has a type, these types exist. Of the important types, oh, it should work, right? There's the LOAD type, first of all. One more time. The LOAD type is important, because it says that a given segment should be loaded into memory; PHDR says that a given segment is the program header table; and DYNAMIC, which is interesting because of

other things that will come up later. The second table is the section header table. We have segments, we have sections. It's always better to have more than less. There are similar things here. Name, type, flags, address, offset. Basically the same, but a bit different. A 64-bit version as usual. And there are types. As you can see, there are more types here. So what do we have here? The GNU hash table, which lets us resolve symbols; DYNSYM, which are the symbols that will be imported into the binary; DYNAMIC, which corresponds to PT_DYNAMIC. The symbol table. The string table. And the program data. Plus, as you can see, there are many more, there are some three dots,

I don't know if you can see them, but there are three dots, saying that there is a lot more. Next, because I've already mentioned DYNAMIC a couple of times, let's talk about what it is. It's a table of yet another structure, surprise. Each entry has a type and a value; depending on the type, it's either a value or a pointer. The 32- and 64-bit versions don't differ, modulo size. And DT, of course, another table of types. The important ones to remember: NEEDED, GOT, HASH. What would we explain here, for example? This is a table of... how to say... strings, wow... Symbols, hashes... You have to do it someday, right? Things that are... addresses

of strings, libraries that will be loaded into memory with our program, for example libc. PLT, GOT, whatever... Symbols, we'll skip them, we don't want to talk about them. OK, so let's start with some dirty tricks. First of all, why? Because malware analysis, and we will all be dealing with it one day, is easy when it comes to Linux. Because the malware that appears on Linux is malware for IoT devices. IoT is shit, so the malware is simple. How do you get a big botnet? Has anyone tried to build a big botnet, or, I don't know, tried to... set up a honeypot on iOS, Linux, SSH servers, I don't know, whatever? No. It's sad. Generally, the whole Internet is full of Chinese people and kids who brute-force their way into

routers, SSH servers, Telnet, whatever; Bash Shellshock is one of the most popular ways of getting into servers. So that is the main way the malware gets in. It's mainly used for DDoS. For example, recently there was a situation where a new malware, Mirai, attacked Brian Krebs and they were firing something like 500 terabytes at him. I don't remember exactly. Maybe my friend is nodding his head, he doesn't remember either. But generally a lot. They said it was one of the biggest attacks, then there was a war because everyone claimed a bigger attack. Someone was talking about comparing "my DDoS". Why not? How much can you compare the same thing that is on the bottom? So this malware is trivial, but probably not only this kind of malware exists, because

Linux servers are like a ball of a ball, because we have a lot of communication going on, there are a lot of services. There was a theory that almost every server is bugged in some way. There's nothing like a proper paranoia. But this malware is simple. It doesn't have to be, though. It can be complicated; there are some tricks that will make your life difficult. Unfortunately, they are based on exploiting certain problems in how ELF is built. And now we'll talk about these tricks, how to make analysis difficult. They mainly occur in CTFs, so it doesn't make any sense. The first thing is the difference, I

don't know if you noticed, I said that there are two structures. There are segments and sections. There are many sections. I think it's easy to notice that these are almost exactly the same things, it's quite obvious. And they describe exactly the same thing, but from different points of view. More about that in a moment. Such a simple... fuck... simple layout of an ELF. We have the header, the PHDR table, the SHDR table. And they point at the same thing: a segment of type PT_LOAD that will be loaded, and the .text section, which contains the code that will be executed. This is the definition. This is how it should look in memory. We also have a segment,

it is described first by the offset in the file, and here is also an offset, of course. In memory it is an address, an address and a size. Size is not important, but this is true. The moment the file is loaded into memory, the situation changes a bit. Because the SHDR table is not used in any way by the kernel; it is only information about certain properties, more for debugging than necessary for execution. It is only something that makes analysis easier for us, because its granularity is much finer: there are many more section types than segment types, so we have much more information. We don't have to go through one table to get to

the other, we know that this table is here. So it's useful for debugging purposes, if we want to do an offline analysis of an ELF, to find out what libraries this program will need, what symbols it will resolve, whatever else can be interesting, such bullshit. But it's not really needed for execution, because for that we only need to know where the data is supposed to be located, what the addresses are, what the sizes are, what the permissions are, which I didn't mention, but there are of course permissions for each section: read, execute, write, and appropriate combinations, as you know. So in memory there is no SHDR table. And this causes some problems, I don't know if there is another one, no, there isn't. The problems are

as follows: almost all tools use what for analysis, of course? SHDRs. Which have nothing to do with the truth. Let's try to use that. Let's consider a program, I don't know, some kind of... Let's see what it looks like. It has a .text section, it has some address, some size, some offset, some permissions. Let's compare. It's the same address. Let's see the difference in size. There are two or three other sections with similar properties called .init and .fini, but it doesn't matter. But as you can see, it is contained in this loaded segment. Let's look at a simple shellcode that does basically nothing, it only writes the string "bad", and a piece of code. And what is it about? It

is about using... And now let's see. There is some size here. Every ELF segment is loaded, to simplify, with the mmap syscall. Who knows what property sizes have with that syscall? If you can't hear anything, you can hear the words "multiple of the page size". Second question: how big is a page? 1024 bytes, right? No. More. 0x1000 bytes. This is less, evidently. Can you see it or not? 0x0... fuck, I don't want to make a mistake. 0x6EC, generally, skipping the leading zeros. So definitely less than a page. So what will happen? More than what is described here will be mapped. It will be rounded up to a page, right? It should even be drawn somewhere here. Here is

such a nice alignment. So we have a considerable amount of space between 0x6EC and 0x1000. It's a bit crazy. Let's put shellcode in this place and set it as the entry point of our program, that is, the place where code execution will start. We overwrite it with our shellcode's address. This is the entry point of the original program. There should be a demo of the execution, but I think it didn't work. Well, whatever. And it executes very nicely. The problem is that it will not... Just a second. Whatever. It will be executed, it will print first "bad", then "good". The problem is that no program will show it. I'll tell you why in a moment. Maybe it has changed now, because before

I created this, an update to IDA came out. objdump doesn't look at it, it doesn't give a damn, it only looks at what's in the sections and follows that. radare2 will show that our entry point is actually in this place, and that there is code there, because it can do that. IDA refuses to show data from an address that is not in the sections. It was fixed in the latest release, probably, at last. This bug had been there for years and years and years. A few more. Okay, so let's go back a bit. Who has been coming here for more than two or three years? To these events. Okay, maybe so. Who remembers this CTF? Aha, well, that's bad. Okay, back then I organized the CTF

and there was a task that was based on this, among other things. I thought there would be something else here. Of course, this was just a simple example. We don't have much space there, we're just adding shellcode. But let's think a bit further. With this setup, nothing prevents us from making this PT_LOAD completely independent of this .text, so that it's somewhere completely different. Why not? We don't care what the analyst will see. And that was the task. It wasn't exactly independent, because it was more clever. They overlapped each other, one smaller, the other bigger, and a piece of code was invisible unless you debugged it. Of course, there were some difficulties. But it was a

nice task, I've already burned it twice, I won't do it a third time. On to different techniques. It's a nice one. It requires a lot of work, because it's not that easy to do from, for example, C. You have to do it purely in assembler, build the ELF from scratch, which is very annoying. I don't know if anyone does it, if not, then... You can do it, but I don't recommend it. Another technique. The way symbols are resolved is that there is... Let's go back a bit, because jumping around the slides is such nice fun. There is a symbol. The symbol has a name. What is the name? The name is an int. The int is an offset

in this table. And this table is located at the address that is under this DT entry. But it is also here, of course. There is some string table. Let's go. Let's look at a simple program. The program doesn't do anything interesting, it takes its first argument and prints it. We compile it, we run the magic script, we run objdump, everything is fine, there is a printf. After the execution, does anyone know printf to do such things? What should printf do? Print "id", not some values out of thin air. There should be... Here is the script that magically does it. Again, the idea is similar. We have two... The values that should point to the same thing

are not validated. In the PT_DYNAMIC table we have the DT_STRTAB entry, and we also have the SHT_STRTAB section. What we do is replace the value in DT, which is the important one and the real source for the program's execution, because that's what ld.so uses. We swap in the string "system". It's nice because it has the same length as "printf", so it's not difficult. We add a new PHDR and so on, we save everything nicely. There should be a picture, but I don't have internet. It turns out that two or three weeks ago the only program that could recognize it was Binary Ninja, written by Vector35. All the others were very nicely showing printf. Fortunately, a while later there was a terrible storm from the radare2 side, they fixed

it quite quickly. IDA released, literally a few days earlier, a new version that... After six years they fixed this bug? Maybe even more, maybe 20. IDA is almost 20 years old. After 20 years they fixed this bug and it doesn't work anymore. But it's still a nice technique. What else do we have? And of course this is not the only option we can use. If we go back a little bit, there are quite a lot of these types. We can replace this one, this one, this one, this one; almost every one of them is a pointer that we can replace, and that has its own equivalent in the sections. The problem arises when we change the whole dynamic section,

because then, for example, readelf itself starts to complain, because people are clearly more aware that this is possible and they start implementing such checks. And this is my favorite trick, a trick that has its roots in some Phrack issue, I don't remember which one, but from really long, long ago. I'm not its author, unfortunately. One of the things in the dynamic table is the names of libraries that will be loaded additionally. Here is an example program written in Haskell, so you get a lot of libraries. These are just names, the system will look for them in the right places, everyone knows how it works. And there is an entry that always appears. It has the DEBUG type

and a value of 0, and it isn't used for anything. Let's change it. We can put some value there. It is enough to change this one byte, going from the DEBUG type to the NEEDED type, and set some value pointing into this string table, at an offset, at a piece of a string more precisely. Now we can load any library when the program starts. Simple code. I went too fast. Most of what I showed here is described in much more detail, in much better form, in this little book, which should be published soon by these authors. You can buy it at this address. And that's all from me. I won't take up any more of your time. Let's

go for a beer. Any questions? And I like it. I would need to hear you, but wait for the microphone, we have time. I can go back if you want to take pictures of these short books. Have you ever come across malware that uses one of these tricks? No, there is no such malware so far. I see that there is no one here who wants to take pictures. Why not? So, no, there are no such things. Maybe they exist at a really low level. If someone writes some serious, decent rootkits, which, if they exist, will not be found for a long time. Maybe. No, fortunately, there are none anywhere. These are purely, at least, I hope,

purely recreational pieces that the ELF format enables. That's it. No more questions. Let's go downstairs.
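The first trick from the talk (hiding code in the gap between the end of a loaded segment and the next page boundary, then pointing the entry point at it) can be sketched roughly as follows. This is a minimal illustration, not the speaker's script: it assumes a little-endian ELF64 image, a 0x1000-byte page, and a loader that maps whole pages. The function names and the synthetic test image are hypothetical; the 0x6EC size is the one quoted in the talk.

```python
import struct

PT_LOAD = 1
PAGE_SIZE = 0x1000  # assumed page size, as quoted in the talk


def parse_load_segments(image):
    """Return (e_entry, list of PT_LOAD dicts) for a little-endian ELF64 image."""
    if image[:4] != b"\x7fELF":
        raise ValueError("missing 0x7F 'ELF' magic")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("this sketch only handles little-endian ELF64")
    e_entry, e_phoff = struct.unpack_from("<QQ", image, 24)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            segments.append({"flags": p_flags, "offset": p_offset,
                             "vaddr": p_vaddr, "filesz": p_filesz,
                             "memsz": p_memsz})
    return e_entry, segments


def page_slack(seg, page=PAGE_SIZE):
    """Bytes between the end of the segment's file data and the next page boundary."""
    return -(seg["vaddr"] + seg["filesz"]) % page


def entry_in_slack(image):
    """True if the entry point lies past a segment's declared end but inside its last page."""
    entry, segments = parse_load_segments(image)
    for seg in segments:
        end = seg["vaddr"] + seg["filesz"]
        if end <= entry < end + page_slack(seg):
            return True
    return False


def build_demo_image():
    """Synthetic header plus one PT_LOAD of size 0x6EC; not a runnable binary."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # ELF64, little-endian, version 1
    header = ident + struct.pack(
        "<HHIQQQIHHHHHH",
        2, 62, 1,        # e_type, e_machine, e_version
        0x4006EC,        # e_entry: first byte after the declared segment
        64, 0, 0,        # e_phoff, e_shoff, e_flags
        64, 56, 1,       # e_ehsize, e_phentsize, e_phnum
        64, 0, 0)        # e_shentsize, e_shnum, e_shstrndx
    phdr = struct.pack("<IIQQQQQQ", PT_LOAD, 5, 0,
                       0x400000, 0x400000, 0x6EC, 0x6EC, PAGE_SIZE)
    return header + phdr
```

Calling `entry_in_slack(open(path, "rb").read())` on a real binary is one way to flag an entry point that no section-based tool would disassemble. Note that the check looks only at program headers, since those are what the kernel actually uses.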