
Hello Kitty - Binyamin Sharet

BSides Knoxville, 48:05, 231 views, published 2016-06
About this talk
Kitty is a new open source fuzzing framework. It's modular, extensible and flexible. Kitty allows stateful fuzzing of targets over unconventional communication channels, such as USB, Bluetooth, SPI, UART and CAN. It features a rich context-aware data modeling syntax, along with a mutation fuzzing engine. In this session I will talk about our reasons for developing Kitty, its main features and concepts, and hopefully get to show a short demo. Familiarity with fuzzers is recommended for this talk. https://bsidesknoxville2016.sched.org/event/6tCg/hello-kitty
Transcript

So my name is Binyamin Sharet, but you can also call me Ben if Binyamin is too weird. I work for Cisco in a team called STARE, which performs security threat analysis and reverse engineering, mainly for embedded devices and mostly in a black-box format. We've come up with a fuzzing framework to fuzz all these weird embedded devices. They have different kinds of interfaces, like USB, Bluetooth and SPI, and don't always have the common interfaces such as TCP/IP-based protocols. Kitty is our fuzzing framework, and I'll try to introduce it today.

So, I'm going to talk about a few things. First, I'm going to talk about fuzzing in general. Are you all familiar with fuzzing? Who is familiar with fuzzing? Yeah? Okay. So, this will be first. After that, I'll talk about the purpose of Kitty and about the structure of Kitty, and then I'm going to go into the details of some specific features of this framework. I'll try to show you a demo at the end.

So I'm not going to go deep into fuzzing. Just to mention, it's a testing technique where you test software by providing malformed or malicious input to its different interfaces. Specifically, generation-based fuzzing: are you familiar with this term? No? Okay. Generation-based fuzzing is the type of fuzzing where you provide some description of the data or the protocol that the software (or the firmware) accepts. This model hints to the fuzzer how to generate the payloads. As a result, the payloads are more similar to valid inputs, get processed deeper in the software, and are not thrown away at the first parsing stage.

So in general, the fuzzing process looks something like this, at least in most cases. We first start monitoring our target. Then we generate the data that we want to send to the target and feed it to the target. Then we collect the results. In the most basic cases of fuzzing, that just means checking whether the program crashed.
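
To make the four steps concrete, here is a minimal Python sketch of that loop. It is not Kitty code. `run_fuzz_session`, `ToyTarget` and the rule that inputs longer than 8 bytes cause a crash are hypothetical stand-ins. As in the most basic case, "crashed" is reduced to a liveness check.

```python
from typing import Callable, Iterable, List, Tuple


def run_fuzz_session(
    payloads: Iterable[bytes],
    send: Callable[[bytes], None],
    is_alive: Callable[[], bool],
    restart: Callable[[], None],
) -> List[Tuple[int, bytes]]:
    """Run one test per payload; return (test_index, payload) for each crash."""
    failures: List[Tuple[int, bytes]] = []
    for index, payload in enumerate(payloads):
        # 1. Monitor/prepare: make sure the target is up before the test.
        if not is_alive():
            restart()
        # 2-3. Generate (here: take the next payload) and feed it to the target.
        send(payload)
        # 4. Collect results: in the most basic case, "did it crash?"
        if not is_alive():
            failures.append((index, payload))
    return failures


class ToyTarget:
    """Hypothetical stand-in target that 'crashes' on any input longer than 8 bytes."""

    def __init__(self) -> None:
        self.alive = True

    def handle(self, data: bytes) -> None:
        if len(data) > 8:
            self.alive = False

    def restart(self) -> None:
        self.alive = True


target = ToyTarget()
crashes = run_fuzz_session(
    [b"GET", b"A" * 100, b"GET /", b"%s" * 10],
    send=target.handle,
    is_alive=lambda: target.alive,
    restart=target.restart,
)
print([index for index, _ in crashes])  # [1, 3]
```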

Now, on to Kitty. Here is the one-line description: Kitty is an open source, generation-based, modular and extensible fuzzing framework for real targets, written in Python. I don't like this line; it's too long for me to take in at once. So I tried to divide it into separate parts.

First, it's open source. You can find it on GitHub. I have the links here, and you can get them later if you want. We very much appreciate collaboration and contributions from the community. We already got some, which is awesome.

It's generation-based, so everything I said about generation-based fuzzing applies to Kitty. It can also perform what's called mutation-based fuzzing, also referred to as dumb fuzzing. Here you provide a default valid input that you recorded somehow. It might be a JPEG file, a TCP packet or whatever it is. Then, without knowing its structure, the fuzzer mutates it by replacing bytes and bits inside, and tries to break the software without any knowledge of its internal structure. So when you don't have enough knowledge about the structure, you can perform mutation-based fuzzing.

It's modular, so pretty much every part of it can be replaced with something else. If you start by fuzzing a target with JPEG files, it's very easy to replace that part and fuzz it with PNG files instead, or something like that. Most of the examples that I give during this talk are very familiar and common. But keep in mind that the real feature of Kitty, in my opinion, is that you can actually fuzz things that are very uncommon. In most cases we don't work on PCs, and we don't test PC applications at all. We don't test many mobile phone applications either. Our targets are really different; the examples here are just things that you can easily connect to.
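
Mutation-based (dumb) fuzzing, as described here, needs nothing but a recorded sample. The sketch below is an illustration, not Kitty's mutation engine. `dumb_mutations` is a hypothetical helper that either flips one bit or overwrites one byte at a random position. It uses a seeded generator, so any crashing test can be replayed.

```python
import random
from typing import Iterator


def dumb_mutations(sample: bytes, count: int, seed: int = 0) -> Iterator[bytes]:
    """Yield `count` blind mutations of a recorded, non-empty sample.

    Each mutation either flips one bit or overwrites one byte at a random
    position; nothing about the sample's structure is used.
    """
    if not sample:
        raise ValueError("sample must be non-empty")
    rng = random.Random(seed)  # seeded, so a crashing test can be replayed
    for _ in range(count):
        data = bytearray(sample)
        pos = rng.randrange(len(data))
        if rng.random() < 0.5:
            data[pos] ^= 1 << rng.randrange(8)  # flip a single bit
        else:
            data[pos] = rng.randrange(256)      # replace a whole byte
        yield bytes(data)


recorded = b"GET /index.html HTTP/1.1\r\n\r\n"
for mutated in dumb_mutations(recorded, count=3, seed=1):
    print(mutated)
```

The weakness discussed later in the talk is visible here: a mutation that touches a length or checksum field is never repaired.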

So you can replace pretty much every part of it. Later, when we get to the structure, you'll see that each part can be replaced according to a specific target. It's also extensible, so the parts already implemented in Kitty can be extended to get better functionality. It's a fuzzing framework, not a fuzzer by itself: you don't just download Kitty and run it. It is used to build your fuzzer, or, once someone has built one, you can of course use it. But you don't get everything for free; there's some work you need to put into it to get a working fuzzer. What it does is take care of the tasks common to all fuzzers: the general fuzzing cycle, test reporting, the UI, and the data model syntax for generation-based fuzzing. It also provides many tools and utilities that make creating a new fuzzer much easier. And the last thing... no, it's not the last one.

It is intended to fuzz real targets, not just standard targets, although you can fuzz normal targets as well. So you can fuzz HTTP or a web server with it, but you can also fuzz a TPM module that speaks over an SPI channel or a UART. The other thing is that it's written in Python. That was very important to us, so that you can run it on Windows, Linux or OS X. Python also has so many libraries that you can build your fuzzer and implement all the other modules really easily. If some of you are familiar with other fuzzing frameworks, you might recognize some of these parts. Kitty was inspired by two tools: Peach and Sulley. Peach was developed by Michael Eddington, and Sulley was developed by Pedram Amini as part of OpenRCE. We took ideas from both fuzzing frameworks and tried to make something that fits our day-to-day work.

So I'm going to talk a little bit about the structure of Kitty, so you'll get a better idea of how it works and what's included inside it. Here we have a common, very abstract setup. On the left side, you can see a Kitty-based fuzzer. I used to write just "Kitty" there, but it's actually a Kitty-based fuzzer. On the right side, you can see the target we want to fuzz. This target might be some server software, or it might be a phone, a TV or whatever it is that you're dealing with. I'll try to go over each of the fuzzer's parts.

The first one is the fuzzer class. We have two types of this class: one is used to fuzz servers and one is used to fuzz clients. In most slides I will only refer to the server one, and we'll get to clients later. The fuzzer is responsible for managing the entire fuzzing session from start to end. It has a data model that it gets its payloads from. It also has a target proxy that is used to control the target and send data to it. Once a test is done, it asks the target proxy for a report about that specific test and moves on to the next one.

The next part is the data model. As I mentioned, there are two types of mutations that Kitty can perform. One is generation-based fuzzing, which generates smart data. The other is mutation-based fuzzing, which just mutates a valid payload.
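
The split of responsibilities between the fuzzer, the data model and the target proxy can be sketched as follows. All class and method names here (`ListDataModel`, `ToyTargetProxy`, `ServerFuzzer`, `mutate`, `render`, `transmit`, `get_report`) are hypothetical. They only mirror the roles described in the talk and are not Kitty's actual API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Report:
    test_number: int
    failed: bool
    reason: str = ""


class ListDataModel:
    """Hypothetical data model that hands out pre-built payloads, one per mutation."""

    def __init__(self, payloads: List[bytes]) -> None:
        self._payloads = payloads
        self._index = -1

    def mutate(self) -> bool:
        """Move to the next mutation; return False once the model is exhausted."""
        self._index += 1
        return self._index < len(self._payloads)

    def render(self) -> bytes:
        return self._payloads[self._index]


class ToyTargetProxy:
    """Hypothetical target proxy: delivers data and reports a failure on NUL bytes."""

    def __init__(self) -> None:
        self._error: Optional[str] = None

    def pre_test(self, test_number: int) -> None:
        self._error = None  # clear per-test state

    def transmit(self, data: bytes) -> None:
        if b"\x00" in data:  # pretend the target choked on this input
            self._error = "target stopped responding"

    def get_report(self, test_number: int) -> Report:
        return Report(test_number, self._error is not None, self._error or "")


class ServerFuzzer:
    """Manages the session: payloads come from the model, reports from the target."""

    def __init__(self, model: ListDataModel, target: ToyTargetProxy) -> None:
        self.model = model
        self.target = target

    def run(self) -> List[Report]:
        reports = []
        test_number = 0
        while self.model.mutate():
            self.target.pre_test(test_number)
            self.target.transmit(self.model.render())
            reports.append(self.target.get_report(test_number))
            test_number += 1
        return reports


reports = ServerFuzzer(ListDataModel([b"GET", b"G\x00T", b"GETGET"]), ToyTargetProxy()).run()
print([r.test_number for r in reports if r.failed])  # [1]
```

Because the fuzzer only talks to these small interfaces, swapping the data model (say, JPEG for PNG) or the target proxy (TCP for UART) leaves the session logic untouched.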

The next part is the target proxy, which is used to actually communicate with the target. If your target is a web server, again, the target proxy will need to implement a TCP connection to the server and be able to send messages to and receive messages from it. If your target is actually a device that speaks over UART, the target proxy might use PySerial to create a serial connection to the device.

The target also contains two other parts: the controller and the monitors. Those parts are used to control and monitor the environment of our target. The controller's main task is to make sure that our target is ready for the test, and to check after the test whether it's still up. So it will check if the web server is running, maybe by sending a message or by checking that its PID exists. If it isn't running, the controller starts it. At the end of the test, it checks again, just to make sure the target didn't crash. If any of the fuzzer's components detects a failure, it reports it, and at the end of the test you'll know that a specific payload crashed your system.

The monitors are similar to the controller, but they have less responsibility. Their main task is to monitor some aspect of the environment. You might want to capture a pcap file of all the network traffic at the same time. In other cases, you might want a power consumption log of your device, if it's an embedded device. Then you can say: okay, when I sent this payload, the device started to consume a lot of power. That might mean it got stuck in some loop. It really depends on your target.

So this is the basic structure of Kitty. Now I'll go over some of the features that I think are special to Kitty, features we hadn't seen in other frameworks, at least not at this level. The first one is its data model. It's a data model with a very rich syntax that allows you to describe complex structures, and I'll show some examples of how it works. I think this is going to take a big chunk of the presentation.

The basic strategy of payload generation in Kitty is as follows. A field is the smallest component of the data model; it describes a single field in your final message. It has a default value, and Kitty generates a library of other, possibly invalid, values. Each time the field is mutated, it provides the next value from the library, until the library is exhausted. At that point it says: I can't mutate anymore. For example, suppose we have a string field with the default value "GET". Kitty will generate a fuzzing library similar to this one. It will have duplications of the string, format strings and long strings. All these kinds of values target specific known issues related to string processing. Another example here is an integer field, in this case with the default value 55. Kitty generates a fuzz library similar to this one, though of course bigger than four values: very small values, zero, negative values, off-by-one values and so on.

When you take multiple fields and combine them together, you get a container. The container can be mutated as well, and the basic strategy is to start from the default values of its fields. In this case we have a string and an integer, the same as before, so by default the container renders "GET 55". Then it starts mutating one field at a time, until that field is exhausted. It takes values from the fuzz library of the string field while the integer keeps its default value. When that's done, it takes the default value of the string field together with values from the fuzz library of the integer. In most cases this is enough to find the common bugs, which are usually related to a single field. If you change too much of the data, the application will stop processing the packet at a very early stage.

Some of the fields that we have in Kitty are atomic fields, like the strings we just saw. In white, you can see the declaration of such a field, or something very close to the actual declaration. After that, in black, you can see some of the outputs of the field's mutations. A comment next to each one gives the type of mutation that Kitty performs. We can see duplication, null injection into the middle of the string, and a format string. All this output is in Python syntax, of course, so the \x00 is actually a null byte.
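
The field-and-container strategy can be modeled in a few lines. This is a simplified, hypothetical illustration, not Kitty's implementation. Each `Field` has a default value and a small fuzz library, and `container_mutations` walks one field's library at a time while every other field keeps its default. Kitty's real libraries are much larger than the handful of values shown here.

```python
from typing import Iterator, List, Sequence


class Field:
    """Toy field: a default value plus a fuzz library of (mostly invalid) values."""

    def __init__(self, default, library: Sequence) -> None:
        self.default = default
        self.library = list(library)


def string_field(default: str) -> Field:
    return Field(default, [
        default * 2,                         # duplication
        default[:1] + "\x00" + default[1:],  # null byte injected mid-string
        "%s%s%s%n",                          # format string
        "A" * 1024,                          # long string
    ])


def int32_field(default: int) -> Field:
    return Field(default, [
        0, -1,                               # zero and a negative value
        default - 1, default + 1,            # off-by-one around the default
        2**31 - 1, -2**31,                   # signed 32-bit max and min
    ])


def container_mutations(fields: List[Field]) -> Iterator[list]:
    """Mutate one field at a time; all other fields keep their default values."""
    for i, current in enumerate(fields):
        for value in current.library:
            values = [f.default for f in fields]
            values[i] = value
            yield values


request = [string_field("GET"), int32_field(55)]
print([f.default for f in request])  # ['GET', 55]
for values in container_mutations(request):
    print(values[0][:16], values[1])
```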

Another field is the integer field, and we have several types of integer fields. In this case, we use a little-endian 32-bit field. The default value is 0xCAFEBABE, and all of the values are rendered with their byte order reversed because the field is little-endian. We can see some of the mutations. Some of them are really generic because they're not related to the specific value, like the minimum and maximum values of the field. But you can also see bit flips of this field's initial value. So it tries to target both generic integer issues and issues related to the specific value that we expect in this field.

After that, we have the containers that we just mentioned. They are used for two main reasons. One is to combine multiple fields so they act as a single functional unit. In this case, we want to encode three fields, three strings, together as one string to get the HTTP authentication token. If we just encode each of them with Base64 separately, we get a different string than what we need, because Base64 adds padding to each encoded string. Instead, we say that all of those strings are related to each other. We treat the entire container as a single field and encode the whole container as Base64.
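
Python's standard `base64` module makes it easy to check why the container must be encoded as a whole. The sketch assumes the authentication token is HTTP Basic-style `username:password`. `b64` and `basic_auth_header` are hypothetical helper names.

```python
import base64
from typing import List


def b64(text: str) -> str:
    return base64.b64encode(text.encode()).decode()


def basic_auth_header(username: str, password: str) -> str:
    """Encode the whole username:password container as ONE Base64 unit."""
    return "Basic " + b64(username + ":" + password)


parts: List[str] = ["admin", ":", "secret"]
per_field = "".join(b64(p) for p in parts)  # each field encoded on its own
whole = b64("".join(parts))                 # container encoded as a single field
print(per_field)  # YWRtaW4=Og==c2VjcmV0
print(whole)      # YWRtaW46c2VjcmV0

# A mutated username is still wrapped in one valid encoding of the container.
print(basic_auth_header("%s%s%n", "secret"))
```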

The other usage of containers, in this case the list container, is to perform mutations that are not related to the content of its fields. Instead, it treats each field as a closed unit and mutates the order of the fields inside it, or sometimes duplicates fields, swaps them, omits them and so on. There are many other fields, of course; I'm just showing a few examples here.
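
The list-container mutations (omitting, duplicating and swapping whole elements) can be sketched as a generator. `list_mutations` is a hypothetical helper that treats each element as an opaque unit, which is the point of this kind of container. The exact set of mutations Kitty applies may differ.

```python
from typing import Iterator, List


def list_mutations(items: List[str]) -> Iterator[List[str]]:
    """Structure-only mutations: elements are treated as opaque units."""
    n = len(items)
    for i in range(n):       # omit one element
        yield items[:i] + items[i + 1:]
    for i in range(n):       # duplicate one element in place
        yield items[:i + 1] + [items[i]] + items[i + 1:]
    for i in range(n - 1):   # swap a pair of neighbours
        swapped = list(items)
        swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
        yield swapped


for mutated in list_mutations(["Host", "Cookie", "Accept"]):
    print(mutated)
```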

The next type is, I think, the strongest type of field in the data model. This is what really separates it from mutation-based fuzzing. The biggest problem with mutation-based fuzzing comes when you have, for example, a checksum or a length field. Each little change you make will no longer match the checksum or the expected length, and again the application will stop processing your input at a very early stage. If you want to get deeper into the application, you need a way to provide data that is not the same as the original packet but is still correct.

So here, in the first example, we have a size field. This container describes a Pascal string: first a byte with the length of the string, then the string itself, unlike C strings, which end with a zero byte. You can see what happens when the string field is mutated. First it gives us "hello", which is the default value, and after that format-string values full of percent signs, and so on. Even though the mutation is performed on the second field, the first field is recalculated each time to contain the correct value according to its purpose. In this case, the purpose is the size in bytes. In the second example, we have an even more complicated field, which contains the hash of another field.
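
A calculated field is just a function of other fields, evaluated at render time, after mutation. The helpers below are hypothetical sketches. `pascal_string` recomputes its one-byte length prefix on every render, and `hashed_block` places a SHA-256 digest before or after the data it covers. The handling of lengths above 255 is an assumption of this sketch, not something stated in the talk.

```python
import hashlib


def pascal_string(value: bytes) -> bytes:
    """Length byte + data; the length is recomputed on every render.

    Assumption for this sketch: lengths above 255 wrap modulo 256, which is
    itself an interesting case for the parser under test.
    """
    return bytes([len(value) & 0xFF]) + value


def hashed_block(data: bytes, hash_first: bool = False) -> bytes:
    """Attach the SHA-256 of `data`, either before or after it."""
    digest = hashlib.sha256(data).digest()
    return digest + data if hash_first else data + digest


# The default value, then mutated values: the length byte always follows them.
for value in [b"hello", b"%s%s%s%s", b"A" * 300]:
    rendered = pascal_string(value)
    print(rendered[0], rendered[1:9])
```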

You can also see that there is no requirement for the calculated field to come before or after the field it depends on. Here we have a hash field. Because we specify the name of the field it depends on, it knows to refer to the random-bytes field in this case. These kinds of fields are very, very useful.

The last type of field is the dynamic field. This type of field is used when you don't know some of the data at the time you create the data model. For example, you connect to a server and get a login token from it. From then on, you have to use this token in each request that you send to the server; otherwise it won't process your request. When you describe your packet, you still don't know the value of the token, and it will change each time you connect to the server. So dynamic fields are used to get data from outside sources at runtime and inject it into your payload. The rest of the structure stays correct, and specific fields in the payload can still be fuzzed.

In this example, we have a dynamic field with the key "token", which is the important part: this key will be used to extract the session-specific data later. It also has a default value of four zeros. If we render this field, we get the default value. Then we call set session data with a dictionary that has a "token" entry with the value 1111. This matches the key "token" that we have in the first line. When we render it now, we get the value from the session data, which is four ones. After that, we set session data without this key. Since there is no token key, it doesn't affect the field, so we still have four ones. Then we set the data with both a token and something else, and the field takes the value from the token entry, so now we have twos. If we reset the field, we go back to our default value.

This comes in very handy when you perform a session and don't just send a single message to your target. This issue doesn't exist when you fuzz file formats. They are static and not related to communication; the target just parses one chunk of data. Once you get to session-based fuzzing, you have to have a way to handle it. Some other fuzzing frameworks handled it by taking the payload and fixing it in post-processing. This has many unexpected effects on the data structure and renders it invalid in many cases.

So after we combine different containers, atomic fields and so on, we get a template. A template is the topmost container, and it represents the entire message, file or request that we want to send. In this example, we have a template with some fields. The first field is an absolute offset field. It holds the absolute offset of the hash field, which is the last one.
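
The dynamic-field behavior walked through above can be reproduced with a small class. `DynamicField`, `set_session_data` and `reset` are names chosen to follow the talk's description. This is a toy re-implementation under that assumption, not Kitty's source.

```python
from typing import Dict


class DynamicField:
    """Toy dynamic field: rendered from session data when its key is present."""

    def __init__(self, key: str, default_value: bytes) -> None:
        self.key = key
        self.default_value = default_value
        self._value = default_value

    def set_session_data(self, session_data: Dict[str, bytes]) -> None:
        # Only an entry under our own key changes the field; others are ignored.
        if self.key in session_data:
            self._value = session_data[self.key]

    def render(self) -> bytes:
        return self._value

    def reset(self) -> None:
        self._value = self.default_value


token = DynamicField(key="token", default_value=b"0000")
print(token.render())                                 # b'0000'
token.set_session_data({"token": b"1111"})
print(token.render())                                 # b'1111'
token.set_session_data({"user": b"bob"})              # no "token" key
print(token.render())                                 # b'1111'
token.set_session_data({"token": b"2222", "user": b"bob"})
print(token.render())                                 # b'2222'
token.reset()
print(token.render())                                 # b'0000'
```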

In this case, it contains the offset of the hash field in bytes from the beginning of the packet. There are also relative offset fields, which hold the offset of a field from another field. After that, we have a size-in-bytes field. It refers to the entire container called "content" and holds the size of both strings in that container. The last part is the SHA-256 field, which holds the SHA-256 hash of, again, the entire content. So there's no problem referring to the same field from multiple fields.
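
Here is one way to render such a template with Python's `struct` and `hashlib` modules. The field order (offset, size, content, hash) and the widths (a 32-bit little-endian offset and a one-byte size) are assumptions for illustration, and `render_template` is hypothetical. Every calculated field is recomputed from the current content, so mutating either string never breaks the offset, size or hash.

```python
import hashlib
import struct

OFFSET_FMT = "<I"  # assumption: 32-bit little-endian absolute offset
SIZE_FMT = "<B"    # assumption: one-byte size field


def render_template(first: bytes, second: bytes) -> bytes:
    """[offset of hash][size of content][content = first + second][sha256(content)]"""
    content = first + second
    size = struct.pack(SIZE_FMT, len(content) & 0xFF)
    digest = hashlib.sha256(content).digest()
    # The offset field comes first but depends on everything before the hash.
    hash_offset = struct.calcsize(OFFSET_FMT) + len(size) + len(content)
    return struct.pack(OFFSET_FMT, hash_offset) + size + content + digest


message = render_template(b"hello", b"world")
(offset,) = struct.unpack_from(OFFSET_FMT, message)
print(offset, message[4])  # 15 10
```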

Once we describe a few messages like this, we can, in most cases, describe a full session between a server and a client. We can describe how the communication flows and what the structure of the session is. To do that, we use what we call a session graph, and it looks something like this. First, we define a graph model, which is a Kitty object, and then we connect an init template. Each thing that we pass to the connect method is a template composed of multiple fields and containers, as we've seen before. Since we pass only one argument to connect, init becomes a root node. Then we can connect another template to it: we say root login is connected to init. Now Kitty knows that before it can fuzz a root login message, it has to send a valid init message, because that stage has to be passed first. After that, we have a root action template connected to the root login message, and the same rules apply to it, and so on.

We have a few more. I put them all here to show that we can create a directed acyclic graph. It can't have any cycles, but each node can have edges to multiple nodes and can receive edges from multiple nodes. For example, init is connected to two nodes, and two nodes point to the action.

Once we have this model, Kitty will fuzz it. It starts by fuzzing the init message, which has no prerequisite. It generates a mutation of the init template and sends it, and that is one test. Then it generates a new mutation and sends it, and so on. Once it has exhausted the mutations of the init template, it moves on to the root login template. Now it sends a valid init message without mutation. After it gets a response, it sends the root login template, but this one is mutated.
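
The session graph and the order in which it is fuzzed can be sketched as below. `ToyGraphModel` and `fuzz_plan` are hypothetical, the node names are made up, and the depth-first order is a choice of this sketch. The behavior it models is the one described: a single-argument `connect` creates a root node, and every template on a path is sent unmutated before the mutated one at the end of that path.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Optional, Tuple


class ToyGraphModel:
    """Hypothetical session graph: templates are nodes, edges mean 'must come first'."""

    def __init__(self) -> None:
        self.roots: List[str] = []
        self.edges: Dict[str, List[str]] = defaultdict(list)

    def connect(self, src: str, dst: Optional[str] = None) -> None:
        if dst is None:
            self.roots.append(src)       # a single argument makes src a root node
        else:
            self.edges[src].append(dst)  # dst may only be sent after src

    def paths(self) -> Iterator[List[str]]:
        """Every path that starts at a root, depth-first (the graph must be acyclic)."""
        stack = [[root] for root in reversed(self.roots)]
        while stack:
            path = stack.pop()
            yield path
            for nxt in reversed(self.edges[path[-1]]):
                stack.append(path + [nxt])


def fuzz_plan(model: ToyGraphModel, mutations: Dict[str, int]) -> Iterator[Tuple[List[str], str, int]]:
    """For each path: send the prefix unmutated, then each mutation of the last node."""
    for path in model.paths():
        *prefix, fuzzed = path
        for m in range(mutations[fuzzed]):
            yield prefix, fuzzed, m


model = ToyGraphModel()
model.connect("init")
model.connect("init", "root_login")
model.connect("init", "user_login")
model.connect("root_login", "action")
model.connect("user_login", "action")
for path in model.paths():
    print(" -> ".join(path))
```

A cycle would make `paths` run forever, which is one reason the graph has to be acyclic.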

And so on: it traverses the entire graph and fuzzes each path. Combine the dynamic fields we've seen, which let you set session data at runtime, with the ability to describe a communication session between the fuzzer and the device it talks to. Together they give the fuzzer context awareness. It knows where it is and what data should be sent at each stage of the fuzzing session, so your fuzzing gets deeper and deeper.

So let's take an example protocol. It's a simple protocol with just two requests. First, the client sends a getToken request to the server, and the server responds with a token. Then, when the client asks the server to perform some action, it has to send the token it just received along with the action it wants to perform. So it looks like this: the client sends the token and receives some response. I added the token to the response too; I don't know why.
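
To have something concrete to fuzz, here is a toy server for this two-request protocol. `ToyTokenServer` is hypothetical. The request string "please give me a token" comes from the talk, but the other message formats (`token:<value>` and `do:<token>:<action>`) are invented for the example. The server shows why the token matters: a doAction request with a stale or default token is rejected before the action is even looked at.

```python
import secrets


class ToyTokenServer:
    """Hypothetical server for the two-request protocol (message formats are made up)."""

    GET_TOKEN = b"please give me a token"

    def __init__(self) -> None:
        self._issued = set()

    def handle(self, request: bytes) -> bytes:
        if request == self.GET_TOKEN:
            token = secrets.token_hex(4).encode()  # fresh token per session
            self._issued.add(token)
            return b"token:" + token
        if request.startswith(b"do:"):
            token, _, action = request[len(b"do:"):].partition(b":")
            if token not in self._issued:
                return b"error: bad token"  # rejected before the action is parsed
            return b"ok:" + action
        return b"error: unknown request"


server = ToyTokenServer()
token = server.handle(ToyTokenServer.GET_TOKEN).split(b":", 1)[1]
print(server.handle(b"do:" + token + b":reboot"))  # b'ok:reboot'
print(server.handle(b"do:0000:reboot"))            # b'error: bad token'
```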

To describe such a communication sequence while fuzzing the server rather than the client, we need to describe two templates. The first one is the getToken template, which is pretty easy in this case: it's just the string "please give me a token". We don't need to describe the token response, because we are never going to fuzz it; it's a response from the server, not an input. We also need to describe the doAction template. You can see that the first field in the doAction template is a dynamic field with the key "token", which is important in this case, as we'll see in a minute. Now we have our templates, and Kitty knows how to build both a valid request and a malformed one. Next, we need to connect these messages to create the graph, as we've seen before.

So here we have a graph model. The first call to connect is simple: it's just getToken, which becomes a root node, as before. After that, we have a new type of call that takes not only getToken and doAction but also a callback. The first two arguments behave as before, so doAction now depends on sending the getToken message first. The additional argument, the token callback, is a function that is called once you get a response to the first message, meaning the first node passed to connect, and before you send the next request. So you get a response to getToken, and then the token callback is called. You can process the response and update your model, and only then is the doAction request rendered and sent.

We can see it here. The token callback is called with a few arguments. The response argument is the actual buffer that we got from the server. We have some obscure function here, extract token from response, which extracts the token, and then we set it. This is just for simplicity; in most cases it's even simpler than that. We set the session data that we care about in a way that lets the fuzzer take it and put it inside the next action. In this case, the edge destination refers to doAction, because doAction is the destination of the edge between getToken and doAction.
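
The callback flow can be sketched end to end as follows. Everything here is hypothetical: `DoActionTemplate`, `extract_token_from_response` (the talk only shows it as a placeholder), `token_callback`, `run_session`, and the `token:<value>` response format. In particular, the callback signature `(edge_dst, response)` is a simplification for this sketch and should not be taken as Kitty's actual callback signature.

```python
from typing import Callable, Dict


class DynamicField:
    """Toy dynamic field: its value comes from session data under `key`."""

    def __init__(self, key: str, default_value: bytes) -> None:
        self.key, self.default_value, self.value = key, default_value, default_value

    def set_session_data(self, data: Dict[str, bytes]) -> None:
        if self.key in data:
            self.value = data[self.key]


class DoActionTemplate:
    """Hypothetical doAction template: a dynamic token field, then the action."""

    def __init__(self, action: bytes) -> None:
        self.token = DynamicField(key="token", default_value=b"0000")
        self.action = action  # in a real session, this is the field being fuzzed

    def set_session_data(self, data: Dict[str, bytes]) -> None:
        self.token.set_session_data(data)

    def render(self) -> bytes:
        return b"do:" + self.token.value + b":" + self.action


GET_TOKEN = b"please give me a token"


def extract_token_from_response(response: bytes) -> bytes:
    # Hypothetical helper: assumes the response looks like b"token:<value>".
    return response.split(b":", 1)[1]


def token_callback(edge_dst: DoActionTemplate, response: bytes) -> None:
    """Runs after getToken's response arrives and before doAction is rendered."""
    edge_dst.set_session_data({"token": extract_token_from_response(response)})


def run_session(send: Callable[[bytes], bytes], do_action: DoActionTemplate) -> bytes:
    response = send(GET_TOKEN)           # 1. send the (unmutated) getToken request
    token_callback(do_action, response)  # 2. the callback updates the data model
    return send(do_action.render())      # 3. render and send doAction with the token


def fake_server(request: bytes) -> bytes:
    """Stand-in server that always issues the token b'1234'."""
    if request == GET_TOKEN:
        return b"token:1234"
    return b"ok" if request.startswith(b"do:1234:") else b"error: bad token"


print(run_session(fake_server, DoActionTemplate(b"reboot")))  # b'ok'
```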

We can see it again here. The fuzzer renders the getToken template and sends it to the server, and the server sends back the token response. Then the fuzzer calls the token callback and updates the data model. It renders the next request, doAction, with the updated token, sends it, and gets some response that we don't really care about, because at this stage the fuzzing session is over.

The next feature, which I think is really neat, is that Kitty can fuzz both servers and clients. We've talked a lot about servers until now, and all of the examples were server fuzzing. But client fuzzing is a little different, and I'll be focusing on it until the end of the talk.

Fuzzing clients sounds rather trivial. I mean, what's the big difference between a server application and a client application? They both talk to another side. But there are some major differences that make it really hard to fuzz clients. In many cases, the solutions that other frameworks came up with were partial and rendered many of the tests irrelevant. There are three main differences.

The first is who initiates the communication for the session. When you fuzz a server, all you need to do is connect to a server that is already running and send a request. You choose which request to send and what it looks like. The server will either respond to it or not, and it might handle it correctly or not; you have no other limitations. When you fuzz a client, you don't control when the client will initiate a request and send it. You can kind of control it, but not fully. You don't start the session yourself, and you also don't control exactly what the client will send. In almost any protocol where clients have even a little room to vary, you will see different clients do different things, even in the very early messages. This is how many tools can detect operating systems or clients based on the first message they send.

The second difference is related to the fact that you don't know what the client will send: you have to send meaningful responses at each point. Suppose you haven't yet reached the message you want to fuzz. If you don't provide a meaningful response at that point, the client will stop talking to you. So you have to keep giving correct answers to each request until you reach the specific message processing you want to target.

The last difference, and this one is arguable, is that the server logic, at least the part that speaks to the client, is often much more complicated than the client logic. So it's hard to implement all of it for every protocol you want to use.

When we faced this problem, we decided to take a different approach: we don't want to handle the communication side and the server logic at all when we fuzz clients. So how do we do it? We start by taking a real server stack. Say you want to fuzz a web browser, to stay in the web environment. You find some open source web server that you can modify and run yourself. Then you add a small agent, the Kitty agent, to it. This agent communicates with the actual fuzzer and provides only the mutations, whenever they are needed.

The basic flow of client fuzzing goes like this. Again, we have a target, but this time the target is a client, not a server. We have the Kitty-based fuzzer as before, or almost as before. But this time we also have a server stack with the original server logic. Nothing was changed there, except that we installed a small Kitty agent. The Kitty agent is a very thin RPC layer that uses JSON-RPC over HTTP. We have implementations for Python and C, which is all we needed so far, but it's very easy to implement it in other languages. If you have a server written in a different programming language, you can get an agent running rather quickly. We chose JSON-RPC specifically because this RPC protocol has implementations in many languages, so we don't need to worry about that. Okay, we don't have time for the demo, but anyway.

Once this environment is ready, we can start, and we start by monitoring the target we want to fuzz. Again, that might mean different things. Once we are ready to start testing, the controller is responsible for sending some trigger to the client application or client target we want to test. For a web browser, that might just mean starting it with a specific URL. Or it might mean sending some D-Bus message, if the browser supports it. If it's a hardware device, it might just mean booting it up somehow. At this point, we expect the client, the target, to send a request to the server, and we don't know which request it will send. We hope that during this session it will send a request whose response we want to fuzz.

Before processing this request, the server passes it to the Kitty agent. You don't need to handle most of this yourself, of course; this is just how it works. The server passes the request and also tells the agent that this request is of type X. You need to know what X is, but again, that depends on the specific protocol. The Kitty agent then asks the fuzzer: are we at stage X? For example, the stage might be the login or the specific action that you want to perform. If so, the fuzzer says: that's our stage, here's a mutation for this specific response. If not, it returns None, which means: I'm not at this stage in the fuzzing, so just return the original response, whatever it is. The fuzzer returns one of these two options to the agent, which passes it on to the server logic. If there's a mutation, the server sends the mutation back to the client. If not, it performs the original processing of the request and builds a valid answer. In this way, you can fuzz a client incrementally: implement only a single response first, and add more and more as you go.

Yeah, okay, so I'm kind of out of time. I'll skip the code walkthrough and the demo, but if anyone has questions, I'll be happy to take them.
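
The client-fuzzing agent flow described in the talk can be sketched in-process. The JSON-RPC 2.0 envelope (`jsonrpc`, `method`, `params`, `id`) follows the JSON-RPC 2.0 specification. The method name `get_mutation`, its parameters, the stage names, and the `ToyClientFuzzer`/`handle_request` helpers are all hypothetical. A real agent would send the call over HTTP instead of calling the fuzzer object directly.

```python
import json
from typing import List, Optional


def build_rpc_call(stage: str, request: bytes, call_id: int) -> str:
    """JSON-RPC 2.0 request the agent could POST to the fuzzer over HTTP.

    The method name and parameter names are hypothetical.
    """
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "get_mutation",
        "params": {"stage": stage, "data": request.hex()},
        "id": call_id,
    })


class ToyClientFuzzer:
    """Fuzzer side: holds mutated responses for exactly one stage."""

    def __init__(self, stage: str, mutations: List[bytes]) -> None:
        self.stage = stage
        self._pending = list(mutations)

    def get_mutation(self, stage: str, data_hex: str) -> Optional[str]:
        # data_hex would let a smarter fuzzer inspect the request; unused here.
        if stage != self.stage or not self._pending:
            return None  # "not my stage": fall back to the original server logic
        return self._pending.pop(0).hex()


def handle_request(request: bytes, fuzzer: ToyClientFuzzer) -> bytes:
    """Unmodified server logic plus the agent hook in front of it."""
    stage = "login" if request.startswith(b"LOGIN") else "other"  # protocol-specific
    call = json.loads(build_rpc_call(stage, request, call_id=1))  # what would go over the wire
    reply = fuzzer.get_mutation(call["params"]["stage"], call["params"]["data"])
    if reply is not None:
        return bytes.fromhex(reply)  # the mutated response goes to the client
    return b"OK " + stage.encode()   # original, valid server behaviour


fuzzer = ToyClientFuzzer("login", [b"\xff" * 4, b"%n%n"])
for req in [b"HELLO", b"LOGIN bob", b"LOGIN bob", b"LOGIN bob"]:
    print(req, "->", handle_request(req, fuzzer))
```

Only the stage being fuzzed needs a mutation source; every other request is still answered by the real server logic.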

Oh yeah, I skipped that. Yeah, if there are no other questions, I can cover it. Catnip: Kitty doesn't contain any specific implementation for any protocol or any communication channel. If you want to send a message over TCP, Kitty doesn't know how to do that. If you want to fuzz PNG files, Kitty doesn't know what a PNG file looks like. These are things you need to implement yourself, depending on the specific target you have. So Catnip is a collection of such implementations. Lately, we've started to receive contributions from people of different monitors and templates; a template is a protocol implementation.

[Audience question about the USB work, partly inaudible]

Yeah, it's nice. It's based on the Facedancer by Travis, who I thought wasn't here... oh yeah, you're there. It's also based on umap, which is a USB host security assessment tool; I think that's the formal name. We are hoping to release a new version of umap using Kitty sometime soon. We already have implementations for more USB descriptors and USB protocols than were originally in umap, and we were able to find many, many bugs in USB stacks using this environment. Yeah, I'm glad you asked that. [Audience member: Well, hey, the abstract mentioned USB, so I...]

Okay, so if there are no more questions, I'll just put these slides up here again. My name is Binyamin, again. You can contact me directly if you are interested in the tool, and you can go to our GitHub. We also have documentation and other material, and you can get it all from GitHub. And that's it, thank you.
