
How to fuzz?

BSides Warsaw · 2016 · 43:04 · 670 views · Published 2016-10
About this talk
Kamil Frankowicz walks through fuzzing techniques and workflows for finding bugs in open-source projects. The talk covers fuzzing history, intelligent instrumentation approaches like AFL, practical optimization (corpus minimization, dictionaries, sanitizers), real-world case studies from PCRE and other projects, and how to conduct effective fuzzing campaigns on limited infrastructure.
Author: Kamil Frankowicz

Transcript (English)

Sure. Hi everyone, thanks for coming. I'll give you two minutes so that everyone can get in. I'm surprised by the number of people here at 10 in the morning, after the after-party, and half the room is full, so it's quite OK. I expected a smaller turnout. OK, 10:02, let's start. Today I would like to tell you a little bit about fuzzing. Who fuzzes in their daily work? I don't know, maybe even once every three months. One, two... two people. OK, at that frequency it's not bad. And who is a developer and fuzzes? One person. Very good. I'm impressed, because developers often forget about it. It's not a full-time job, it's part of writing your code: you write something, you check it, then you test it.

My name is Kamil Frankowicz; a year ago I talked about home router bugs, maybe someone remembers me. I work at CERT Poland, so malware and bugs generally pass through my hands. I look for bugs in open-source projects: I fuzz them, I report things, and I'll tell you about a few in a moment. I also write a blog, and you can find me on Twitter under the nickname "Fumfel". The blog is rarely updated; sometimes I just repost entries from the CERT blog, so I have to admit that writing doesn't come easily to me.

What should I talk about? I should remind you what fuzzing is and what it looks like, because I think not everyone knows how it used to look, how it looks now, what problems we have and where they appear, and something about the solutions we can use in our projects for free. What does the workflow look like? It's not just that we start one program, throw in test cases, run the fuzzer and go; it's not that easy. And a few results from the last three weeks that I managed to get in open-source projects. Who doesn't know what fuzzing is? Zero, zero, not bad. Let's say it's community for community here: I'm learning from you, you're learning from me.

So, of course, the concept is quite old. More than 23 years... actually 33 years, sorry. 33 years ago, on the old Mac OS, there was the Monkey application: dumb fuzzing, which means throwing arbitrary test data into the application, mixing it in any way, generally rand() on rand() on rand(). But it worked somehow. It seems simple.

Yes, there are examples with Peach fuzz. Has someone used Peach? It is a commercial fuzzer; I haven't had the pleasure, I don't know what it looks like. Later, something like template-based fuzzing appeared: fuzzing of specific formats. You can fuzz HTTP through a Metasploit module. There are phases: HTTP GET with something too long, HTTP GET with something else. It works quite well; I also managed to find one bug with its help. It's an interesting option. This is a very simple bug, I mentioned it a year ago: in Poland there are around 350,000 devices with a simple stack buffer overflow, shown here on the return address. A tip: if someone at the back can't see what's at the bottom, and there will be a lot of code at the bottom, then turn on the stream and listen to me. It works, really; I was watching a presentation that way on Friday and it's OK. There is a Wi-Fi network here. These slides are also available on the Internet; I'll tell you where later, so no surprises. Yes, that's a good tip: one bigger tablet and three people watching.

So far, one of these routers is still vulnerable. UPC refused to update it, so you have been able to shoot down those routers for over two years. It's a simple CSRF and you can wipe the router's memory, turn off Wi-Fi and that kind of thing. After 2013, the era of fuzzing changed a bit. It is a conventional date, because that's when a fuzzer was created which I want to talk about quite a lot; you probably know it. But something changed in the industry: fuzzing became more intelligent. We started to study how the data propagates through the code. If someone has seen the graph view in IDA, they know what I mean. Generally, we knew what was happening in the program and we chose the best test cases. We are not interested in cases that fail on the first validation, that it is not a string or an integer; we want to go further and crash the program.

Sanitizers are also often used, for example ASan, MSan, UBSan. Does someone use them in everyday work? It is a very cool tool. Do we know what ASan is? OK, I understand, Sunday morning. Well, ASan is a very cool tool that monitors access to memory. If an access through an index goes outside an array, we are informed that we have a stack-buffer-overflow or a heap-buffer-overflow. MSan is used for memory leaks, generally. And memory leaks are a very big problem. If you think that there are no memory leaks any more, that everyone has learned to release memory: no. Especially in parsers. ImageMagick is a tragedy; recently I reported memory leaks in four different parsers in ImageMagick. PDF parsers, MuPDF, Poppler, the same: basically every malformed PDF which does not meet the specification ends with a memory leak. So this is a big problem.

UBSan is UndefinedBehaviorSanitizer. I haven't used it yet, but generally it catches some weird inconsistencies. In general, we try to focus on the problems of the C language: managing memory, accessing memory, freeing that memory. OK, so it's cool: we know how the code works, we know where we get to in this code, so we have a lot of information about how the program behaves. But we also need to know where to get the test data. For me, when I started doing this, it was a real obstacle. OK, but where do I get good test cases? They don't generate themselves, and putting a text file into a PNG parser, for example, is a stupid idea, because how many thousands, millions, of iterations does my fuzzer have to go through before it reaches some part of the PNG file structure? And throwing in good test cases, holiday photos or some internet graphics, is generally a pretty cool idea, but it will not let us squeeze the most out of the fuzzer. We don't want to fuzz for three months to find a memory leak; we want to find that memory leak in three days. I know that is not much, but let's say something is pressing us, we have to test the code quickly, so we have 3 days to catch as many errors as possible.

I found a way around this: there is a lot of test data on the Internet. Crashing or not, it is enough to trawl the bug trackers of the largest projects: Wireshark, ImageMagick. In PCRE2 there is a heap of ready-made test cases; all you have to do is download them and put them into a folder. Curating these test cases matters, it is quite important; later I will tell you how to work with them and how the data gets mixed, those are fuzzer internals. It is also worth matching the corpus to the target: if we are fuzzing some HTML parser, keep HTML files for it. It is not good to throw random bytes at an HTML parser. Unfortunately, that's how it is. So it's worth remembering.

First of all, we need to cover the code. We fight all the time for our test cases to reach as far as possible, to exercise as much as possible. That's the problem. And the second problem, or the third, I don't know: how to do it effectively in finite time? I've already talked about it: we don't want to fuzz for a month, we want to fuzz for three days, for a week. I have already mentioned sanitizers and how I have found errors that do not cause program failure. Some of them, such as a heap-based overflow found by ASan, do not crash the program. It gets detected, but without the sanitizer the program keeps flying: if nothing sensitive is hit, the program continues to execute, the error stays in, and it never gives us a crash.

And so we reach AFL. Who hasn't heard of AFL, american fuzzy lop, by lcamtuf, Michał Zalewski from Google? One person. The rest have heard of AFL? Really? Damn, the turnout has improved a lot. OK. The status screen looks like this; it's not very visible here, maybe it's the lighting, I don't know what the problem is. Runtime is how long the fuzzer has been working... no, I'm kidding. Generally, it looks like this: a lot of interesting data. The most important thing is what's in red, total crashes. This is fuzzing Yara; I will tell you more about it later. 5,500 crashes per hour of fuzzing, 17 of them unique. This is Yara version 3.5.0. So it's pretty cool. Does anyone use Yara? No one will admit it. Well, now you know.

OK, since we know what AFL is, I won't dwell on it. If we want to work with AFL, it is worth compiling with afl-clang-fast, because it has a lot of compiler optimizations which let us extract more execs per second. Additionally, we can build the program with ASan or MSan; then we just run it, but of course also outside the fuzzer, and if something fails we will know whether it is, say, a heap-based overflow. AFL is the most popular fuzzer and many errors have been found with it. I don't know if you've heard of it, but AFL managed to rediscover Heartbleed within 6 hours of fuzzing. 6 hours of fuzzing and Heartbleed was found again. So it has a lot of power and a lot of errors are found thanks to it. If you go to lcamtuf's website for this fuzzer, he hasn't even managed to keep complete the table listing the programs along with links to where these errors were reported.

But the whole world does not revolve around AFL. A very cool project is libFuzzer; it is part of LLVM/Clang. The catch is that you have to write a little bit of code. If you can't see it there, I recommend turning on the stream. Even though this code is not that important, libFuzzer is a very good thing, because it is used by Google to fuzz Chrome; it is the basic fuzzer of Chrome. About three years ago there was a post on Chrome's blog about how they fuzz: a few pages, four pages, on how they do it. When they test, they have a whole cluster for fuzzing; you can submit your test case and your code to test different components of Chrome. So there is power. And just like AFL, it supports ASan, MSan, UBSan. You just need to write some code; and because people in security are lazy, I don't like writing it, I like it when people look for my mistakes for me.

I haven't played with it yet, but you can also fuzz Chrome with AFL: the pdfium component, which is responsible for displaying PDFs in Chrome, works very well with AFL. It is also worth remembering that fuzzers are not able to find every error. Marcin Noga talked about this very nicely on Friday. I don't know how many people were at his presentation; it was very technical, a lot of deep analysis, I saw some people drifting off. That error was found using code review. Someone told Marcin on Reddit that it would be enough to start some fuzzer and it would be found. Well, no. When he showed the path, everything leading up to the moment where the error was, you could see that a fuzzer really would have to run a long time to find something like that. So it's worth knowing that fuzzing won't solve all our problems. It is one part of the development process, but it won't solve everything. It will find very strange errors, very interesting errors, ones we wouldn't expect; I'll show you at the end of the presentation how little time is enough, and the times are good. But in fact it's not a remedy for everything. It's worth knowing.

OK, we're starting to fuzz. Good test cases: a few practical tips, small ones, really. Forget about 500 MB test cases. Forget it, no, really: AFL is smart enough that it won't even let us in above 1 MB. So keep it below 1 kB. Really, it can be done. All the errors I'll show at the end are no more than 50 bytes. 50 bytes; in fact most are under 20 bytes, and one is 13 bytes, I counted yesterday. 13, so it really works. Make the corpus functionally diverse. What do I mean by that? If we test some parser, we throw in JPEGs, PNGs, GIFs and some other weird ones; I remember ImageMagick handles some very weird formats, and I use them. Let's throw it all in and let it mix. It really does that, and very interesting errors come out of such mixing. We also reuse the previous output: the whole fuzzing cycle. I didn't put it on the slide, but it is a cycle. We start with some test cases, the fuzzer generates something, maybe it finds some errors; we take the queue that was generated and use it in the next iteration of fuzzing. We really get test data for free, generated, and often good-quality data, not just something out of /dev/urandom. So it's worth it. It is also worth sharing test cases between different solutions: for example, Marcin just talked about libarchive; it is worth taking test cases from libarchive and then going somewhere else with them. It works well, because these programs often differ in how they implement things, and you can do interesting things that way. And small files are important; the previous iteration is what the fuzzer generates from.

We have good test cases; we scraped the entire Wireshark bug tracker. We compile the target, with the ASan variable set; I will say more about ASan in a moment. It is important to use clang, afl-clang-fast. Next, minimizing test cases. As I said, large test cases are pointless: they contain a lot of data, but not much structure that we can dig into. So let's minimize the test cases and check how much they affect the number of paths reached in the code. We have several tools for this shipped with AFL, and I recommend this flow. First we minimize the corpus, i.e. the whole set of test cases we throw in, no matter whether they are good or not. afl-cmin will throw out all the test cases that are of no use to us, redundant paths, etc. We minimize the corpus, and then we minimize file by file using afl-tmin. Thanks to that we remove bytes which do not matter for the program's output.

This is how corpus minimization looks. We can see that the binary has 13,284 tuples; we can understand tuples as code blocks, just like the code blocks you see in IDA. We have 311 files. That is a lot; up to 100 is plenty, although AFL can handle more. I sometimes throw in 13,000 files, or 150,000; I like to stress things and test critical cases, so why not. It measured for a while and found 82,000 different code blocks. The files get processed, and 239 of them turn out to be valuable; these are just tshark test cases.

OK, we have a minimized corpus, so let's say we want to fuzz something like an HTML parser, or JS. Let's use a dictionary; random bytes will get us nothing, let's not kid ourselves. Let's use a dictionary, which we can also extend ourselves. It is very useful, because we can run test cases with and without the language dictionary. We can see how the program behaves when we feed it random bytes, but also bytes that are in some way consistent with the specification. That way we don't get stuck on the first checks related to the format, that the file has to start with something.

We finally reach the point where we have a fuzzer running. Here is what the whole fuzzer invocation looks like. Generally, it is sometimes worth raising the memory limit. The default is 50 MB, and that is too little in the case of some parsers; for images it is definitely too little. Especially now that memory is cheap and you can buy a cheap VPS with a large amount of memory, the 50 MB limit is decidedly small. We set a timeout and start fuzzing. I wanted to get on to the actual testing, but something went wrong: we start fuzzing and we see 3.4 executions of the program per second. We could have run it faster by hand. Unfortunately, it is often like that. Everyone who starts fuzzing has to hit this wall; nobody will warn you beforehand. Too-large test cases; too-large binaries, which have a lot of checks in them. In the case of libraries, it is worth writing minimal wrappers for the library, i.e. something that will just read a file for us and exercise some basic operation; let's not go into anything more, basic functionality is enough.

Other things bite when feeding binaries to AFL. I often catch myself on this, I'm ashamed of it, but I just have to run sudo ldconfig before starting, so that the library paths get updated. Otherwise I get a binary error and don't know what's going on, and it turns out that, started normally, it can't find a shared library. Massacre. I don't know, I'll never learn this. And the cluttered /tmp, especially when fuzzing ImageMagick: massacre, what goes on in /tmp is a tragedy. So it is worth writing simple scripts in Python, in Bash, whatever, which will just clean /tmp every 5 seconds. It is even better to move /tmp to a RAM disk and clean it every 5 seconds; then we have a lot of power. OK, now we have 2,000 program starts per second. That's OK, we can live with that; we couldn't do that by hand. But we are still using one CPU, because AFL takes only one logical core.

So we can fuzz on many cores, or use something called persistent mode, which I'll get to in a moment. One example of going further: disfuzz-afl. It is a cool project, developed independently of AFL, which allows us to fuzz on many machines and simply synchronize the results. For example, we have 10 VPSes in different parts of the world and we want to fuzz OpenSSL on them. We can synchronize the fuzzing results with the help of this project: one fuzzer deals with these test cases, the second one deals with others, and the results are synchronized so they are all available on one machine.

__AFL_LOOP is the bit of AFL magic that lets us skip the overhead related to process startup. This is quite important, because all the initialization that runs before a given process really starts takes time that we could trade for a greater number of execs per second. So we can use this macro in the code; of course, we have to modify the source, most often some wrapper. I must admit, to my shame, that I always forget the data read: I put the loop in the wrong place, and the program fails because it keeps looping over the same data. I don't know why I keep doing that. Three basic operations are enough in this loop: reading the data, parsing, and cleaning up. Thanks to this we can go even two or three times faster; some people have managed 17,000 executions per second on some Perl target. So that is a lot of power; even with weak test cases, we simply make up for it with processing power. When you read about it on the Internet, you will usually see thousands of iterations per loop. I, poor guy, do 2-3 thousand, sometimes 5. You also need to find out how much the program can take: set it as high as possible, but sometimes 5 thousand is too much and you get fewer execs per second, so it is worth finding the golden mean. A thousand is definitely not enough, though; I always give about 2-3 thousand.

I've already mentioned it, but ASan's slowdown is on us: it slows things down about twofold, but if we win that back with persistent mode, we can live with it. We keep our speed, because we don't have to restart the process; one input goes after another. What is important, though: it is worth remembering not to try this on x86-64 without limiting the memory. Why? Well, ASan can, by default, reserve even 20 TB of memory for its own operation. So if something goes wrong, it can end badly very quickly. It is very well described in the AFL documentation; there is even a ready script to limit this memory. Starting an ordinary program with ASan, outside the fuzzer, is another matter; but not with 20 TB.

This is what ASan output looks like. It's hard to see, but ASan is very cool, because it shows us where the overflow happened and what kind of access it was, with function names and line numbers in a given file. This is just PCRE2: which lines in the file are responsible, what the path to this error was. Very cool thing.

OK, we've found a crash. What's next? This is where the game begins. First, checking whether the application also crashes in the default build. In theory, it should. In practice, with ImageMagick I had a case where, compiled with GCC, it didn't crash, and with Clang it crashed; or the other way round, I don't remember. It depended on the compiler whether the application crashed or not. I didn't have time to check why it differed, because I was busy with something else, but it's a very strange thing. Maybe I'll write it up on my blog, because such details are worth knowing. After we find the crash, it is worth minimizing it, because a 400-kilobyte test case is pointless; usually it turns out that 20 bytes of it are responsible for the entire failure. Then: reversing, code review, where, what, how, if it is an open-source project. And "peruvian were-rabbit", AFL's crash exploration mode. This is an interesting thing, because we go the other way around: as input we take not our corpus but the crashes. The point is that AFL normally works so that when a path already crashes, it simply avoids it afterwards; here the approach is the opposite, we consciously push into the paths that had failures. Just yesterday I found some heap overflow in PCRE this way, one step further: there was simply one more failure behind the first one. Why not? It's a very nice approach. It looks like this, not much different. This is the unfortunate PCRE2, started the day before yesterday. There were only 260 crashes as input, but those were the crashes we had generated earlier, and from them we are able to generate almost 70,000 crashes very quickly. Yesterday evening I had about 8 million crashes, so a lot.

Yes, these numbers are only for orientation. I use cheap VPSes. It's strange, but there is power in it. Why? I am a student, I have the GitHub Student Pack, and with it a free $50 credit for DigitalOcean. Generally, I recommend the GitHub Student Pack to students. There are a lot of developer tools in it: a free SSL certificate, a free .me domain, and among others DigitalOcean, a very nice host; it has an API, you can automate it. I run the fuzzer on 512 MB of RAM and 1 CPU.

All the errors I will talk about here were found with the help of those VPSes. The approach is what matters. I don't know if anyone follows password cracking, the Hashcat contests where passwords get broken. It's not about raw power there, not about having the latest i7 with 10 cores and 20 threads; it's about having the right approach to the topic. The people with a high level of knowledge win those competitions. They know better how to approach the topic, where to find the thing that is interesting, what can change the course of hunting a crash. I recommend that approach. Especially if we are students, we can pick things up as a freebie. If you are not a student, I can recommend, and this sounds like an ad but it's free advice from my experience: Aruba Cloud. For about 4 PLN you can have a VPS with 1 CPU, 1 GB of RAM and 20 GB of disk. It's enough, but the problem is the dated panel. It is very, very weak, and it cannot restore the state of a virtual machine, so a machine cannot be rebuilt; you have to reinstall the operating system completely. With DigitalOcean there is no such problem, and it matters, because if you run a few iterations of fuzzing various programs over a month, the system gets very dirty, to put it gently.

OK, let's go further. What can we use to triage a crash? You know: IDA, Hex-Rays, Valgrind. Does anyone know what Valgrind is? One person. OK, you can read up on it. I forgot to mention that besides the exploitable plugin, there is also a nice GDB plugin called PEDA. It is open source and it shows a lot of nice register state during a crash; I like it. You know: ASan, MSan, UBSan. And there is a script that lets us collect and triage all the crashes that AFL has found. Don't write this down, because it is all on the net; I'll tell you where in a moment.

And a few bugs; good times. Generally, from the last 3 weeks: a stack buffer overflow in PCRE. A payload like this is enough; that's 1, 2, 3... 15 bytes. The length was miscalculated: during compilation of the regular expression, the length of a branch was miscalculated. There is a fixed table, probably 1024 bytes, for the regular expression, and if the expression is larger, memory is allocated on the heap. Unfortunately, due to the bad calculation that memory was not allocated, because the code thought the buffer would be enough. Well, it wasn't. The effect: the screen fills up from the stack-parse-pattern table, and we have a stack buffer overflow. This one is reported; we haven't got an answer from the developers yet, so I don't know how they will approach it, but it is in an RC version, so the developers can still be forgiven.

But this one is much weaker; this cannot be forgiven, though it is already fixed. A buffer overflow caused by wrong handling. It looks like this: in the payload we turn off UTF-8 validation, because we can, and we throw in bytes that are not valid UTF-8. Effect: buffer overflow. The length of the buffer to be copied was wrongly calculated, and we get the overflow. In the payload, the third byte was invalid UTF-8. This one is patched; it doesn't work any more. But it's also an interesting error. It shows that once I turn off the validation, the whole program is exposed; especially since the validation can be turned off right in the payload, so we shoot ourselves in the knee.

Yara. I asked earlier who uses Yara. Quite often in security, Yara is used for classification, for malware tests, for finding patterns, anything. A simple mistake: an import, another 13 bytes that bring Yara down. The length of the string is not checked against the value of the import. One case with len = 0, and the next with len - 1 = 0. This one is patched... no, this one is not patched; the second one is patched, the double one is patched. This first one is not patched yet, it is still hanging. I don't know why they don't want to patch it; it's been hanging for over a week.

Why did I say not to write things down? Because it is all on my website. There are a lot of materials there: the presentation, and there will also be a recording. All the projects I talked about, tutorials, posts, links to recordings from other conferences; I collected everything there, so that instead of asking me later, you have everything in one place. That's all from me. If you have any questions, I'm listening. "Have you used, or do you have any opinion about, Radamsa?" No, I haven't used Radamsa. As far as I remember it is used for mutating inputs, but no, I haven't used it. Anyone else?

"I have a question about your work. Do you choose the tools and the programs you analyze yourself?" Yes, it's my own initiative. I choose what I do, how I do it, and what I do it for. I have absolutely no restrictions; well, if I borrowed a cluster of 300,000 cores, I think someone would shout. I try to get the best possible effect cheaply, and I think those 13 bytes show that it is enough to take down a parser. Generally I try to work on open-source projects, to make some kind of contribution, so that they become more sustainable and safe, since the source code is right there. So there is absolutely no... maybe tomorrow, when I come to work, I will say: "Ah, today I'm doing Chrome." And Chrome it is. Of course, I can't neglect my other duties.

Any other questions? If someone has not written it down, the presentation is also here, so you can look at it; it is already posted on the Internet, there is a link. So if you didn't manage, go ahead. Any more questions? If something occurs to you after watching the presentation, on YouTube or on my website, you can write to me. I have a 24-hour response time, so I will gladly answer all questions about hitting a wall at the start of fuzzing or about other fuzzer problems. Just don't ask me to fuzz something live that I haven't started yet.

"Anything else? What is the business model of this? I understand that you do it in working hours, so your employer probably earns money on it somehow." We are part of NASK, as CERT Poland. We are a research institute, and in general our team deals with researching various current threats. We analyze malware that targets Polish users; for example, I saw the latest campaign with fake Play invoices. If I reverse the cryptolocker, maybe I can get something going. We also have paid services, such as continuous threat intelligence. Companies can order it from us, and then we handle security incidents on their behalf as a CERT. Of course, we are also a unit that can take phishing off your hands.

"For me, a very interesting element of this presentation was dropping zero-days." But no, these are not zero-days. All of this was reported after contact with the developers: OK, written into their systems, into the bug trackers. I didn't show anything that isn't on the Internet right now. A zero-day is not something unreported; it is something that has no fix yet. Yes, we know. I have run into vendors who don't care about security, and generally it goes: you report, and they say, "OK, you reported it, so what? What do you want from us? What are we supposed to do? We won't release this update, because we don't want to," most often because the router or device is end-of-life. It is very common that a router is still sold for two years while the vendor says it is already an end-of-life product. And that indifference of developers doesn't even irritate me so much any more; in the end it's their problem.

"A risk worth noting here: in the case of some legal claim, the claim would go to your employer, if you do it on the employer's orders, not to you." No, no, no. We are not afraid of claims. Any more questions? That's all from me. I will be here until 4 or 5 pm; if anyone wants to have a beer or something, I am here to talk. Thank you very much.