
Good afternoon, everyone. Welcome to my talk on custom protocol reverse engineering and fuzzing. My name is Sanders Diaz, and thank you for coming out today. It's a Sunday, I know, but hopefully I can make your time worthwhile by giving you some tidbits that might be useful in our careers.
A little bit about me: I am a penetration tester based in the central Florida region, and I work for a government contractor. Some of my interests: I'm a packet monkey. I used to be a SOC analyst for years, and since then I've always been very interested in analyzing and dissecting packets. I have a full packet capture box at home, and I like to dabble when I can. I'm a tinkerer, so I have a few servers at home and I have a lab. I mentioned the full packet capture box, but I also like to take things apart; I have a bit of a graveyard of old phones and things that I play around with. And I'm a bit of a paranoid security geek. I think many of us in the room can relate: we're the person that our friends constantly hear from about the things they shouldn't post online or the things they shouldn't do with their passwords, things like that. Some of the certs I have: I'm a GCIA, a GIAC intrusion analyst, I'm a GPEN, and I have an advanced penetration tester certification, and I'm a senior at the University of Illinois.
The motivation behind this talk: when I was thinking about what to talk about, I thought, what a great subject fuzzing is. Isn't fuzzing a great way to find vulnerabilities in custom applications? But this can be met with a roadblock when you have a custom protocol. If your application speaks a closed protocol that you do not have a specification for, then you may not be able to set up your fuzzer, and fuzzing is all about code coverage.
If you cannot have your fuzzer speak something that the application can parse and understand, then you will not be able to achieve the kind of code coverage necessary to find new vulnerabilities. There's no way around that: you have to figure out what the protocol is that you are fuzzing and how it is put together. Tools like Wireshark and tcpdump can help, but it's not always clear how, and we'll go over why that is.
Looking online at resources for reverse engineering protocols, it can be a bit of a black art to beginners, and that's because there's not a lot of documentation available. In the documentation I found through my research, I noticed that there's a lot of high-level material and not a lot of very detailed information about how to go through, analyze, and reverse engineer a protocol. A lot of the documentation that was available focused on reverse engineering the binary associated with the particular target application that you were working with. What I want to focus on in this talk is looking at the network and using that as
a starting point for your reverse engineering.
With that said, these are the things that we will cover in this talk. We'll talk about basic protocol design: the attributes, the structure, the syntax, based on my research of common protocols such as TCP/IP, SNMP, HTTP, and others. We'll talk about some of the network reverse engineering tools that can help us in this endeavor: tcpdump, which is a great set-and-forget tool; Wireshark, which has a very extensible dissector framework; and Netzob, which is a protocol reverse engineering tool that allows us to find low-hanging fruit very quickly. Can everybody hear me? Cool, all right. Then we'll talk about the protocol reverse engineering process, and once we've talked about all these things, we'll talk about how to get started.
Some of the things we won't talk about: this is not a fuzzing talk. I know a lot of people were expecting a lot of detail about fuzzing, but we won't be going into a whole lot of it. There are a lot of good resources online, in books, and at conferences that you will be able to use. What I aim to do with this talk is introduce protocol reverse engineering as a step toward fuzzing. We won't be talking about reverse engineering the actual binaries associated with client or server software; again, there are plenty of good guides out there for that. And we won't be talking about reverse
engineering encrypted protocols. We're going to stick above the transport layer of the OSI model, meaning that we're going to focus on TCP- or UDP-based protocols.
When we first confront a protocol that's new to us, this is kind of what we find: an undefined bit of data. Sometimes there'll be some text that we might be able to discern, but for the most part our tools won't be able to do anything with it. The aim of protocol reverse engineering is to find patterns, and sometimes we're lucky and we're able to find some text that we can interpret and put something together. This is actually a printer at home that had some traffic going back and forth between my laptop and the printer; it seems to be managing the printer software. As you can see, there's some interesting information that could be identified, but what does it mean? What I'd like to talk about is the piece parts of the protocol, the structure, the flow, things like that, that might help us answer those questions.
So to begin, let's talk about protocol structure. That is the control signaling, the metadata, and the payload that our messages contain. Most protocols are organized in the
following way: they have a header, they have a body, and sometimes they have a trailer. IP is like this... I'm sorry, it's Ethernet that is like this. Ethernet does have a trailer, which is used for a cyclic redundancy check. Protocols come in two basic flavors: they can be either text based or binary based.
And they are framed. Framing is protocol-speak for how you organize and structure the protocol and what delimiting is associated with it. There are two types, fixed and character delimited, and these can come in three different varieties. You have octet stuffing, which means that when a specific delimiter appears inside the message, an extra one is stuffed in; the client receiving this communication is then able to remove that extra delimiter and find out which is the actual end of the field or the specific part of the protocol you're talking about. You have octet counting, which is essentially what it sounds like: you count the octets and you put the count in there. HTTP does this. And you have connection blasting, which is applied by FTP: it sets up a new connection, transmits the particular portion of the communication that needs to happen, and tears it down. That's your delimiting.
Another thing that's important to protocol structure is endianness, the order in which the bytes are interpreted. You can have big endian, which is known as network byte order. It is how the internet processes most of its... I'm just going to set this. Thank you. All right, so it's the byte order the internet uses for most of the common protocols.
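As a rough illustration of the framing and byte-order ideas above, here is a minimal Python sketch of octet counting with a length prefix. The two-byte length field, the function names, and the sample payloads are assumptions made up for this example; they are not taken from any protocol discussed in the talk.

```python
import struct

def frame(payload: bytes) -> bytes:
    """Octet counting: prefix the payload with its length as a
    2-byte unsigned integer in network (big-endian) byte order."""
    return struct.pack("!H", len(payload)) + payload

def deframe(stream: bytes) -> list:
    """Split a byte stream back into payloads using the length prefix."""
    messages = []
    offset = 0
    while offset + 2 <= len(stream):
        (length,) = struct.unpack_from("!H", stream, offset)
        offset += 2
        messages.append(stream[offset:offset + length])
        offset += length
    return messages

stream = frame(b"hello") + frame(b"printer status")
print(deframe(stream))

# Reading the same two length bytes with each byte order: a wrong
# guess turns a small length into an implausibly large one.
raw_length = frame(b"hello")[:2]
big = struct.unpack(">H", raw_length)[0]
little = struct.unpack("<H", raw_length)[0]
print(big, little)
```

When reversing an unknown length field, comparing both interpretations against the real size of the captured message is a quick way to guess the byte order.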
Yeah, so we're talking about endianness. There's big endian, there are a few protocols out there that apply little endian, and some apply mixed endianness, which can be applied to different fields. An example of a little-endian protocol is SMB.
Then we have different field types applied to headers. You can have fixed-field protocols like TCP. You can have variable, optional fields: IP actually seems like it's fixed field, but it has a counter, the IP header length, which allows us to determine the current size of our header, and when that counter goes beyond a certain level we know that IP options are present. The last one we have here is delimiter-separated value pairs. This is how text-based protocols delimit their fields: you have the name of a field, associated with a delimiter and a value, followed by some kind of terminator; in the case of HTTP, it's a CRLF. And then we have the protocol trailer. I mentioned Ethernet: Ethernet uses the protocol trailer as a cyclic redundancy check, something that allows it to check whether there is an error in the transmission.
The next thing we'll talk about is protocol flow. That's the timing, the order, and the directionality of the communication. The most common models are client-server, which means
that a client connects to a server and expects services, or peer-to-peer, where both ends of the conversation depend on each other. The modes of communication are connection-oriented and connectionless. TCP, as a connection-oriented protocol, has the following states: listen, connect, accept, receive, and send. UDP is typically a connectionless protocol. Now, there are protocols built on top of these, and they have their own ability to track connections and track state. HTTP is actually connectionless even though it is built on top of TCP.
We have flow control, which can be feedback based, where the client says, "Yeah, I didn't get that," and then there's rate based, which applies formulas to how the communication is coming through and how the timing is happening, to determine whether there is congestion. For error correction, you have automatic repeat request: when we're having a TCP conversation, the client will acknowledge every single packet, and everything that does not get acknowledged gets retransmitted. Then we have forward error correction; that's where the server expects the client to correct its own errors. Session tracking can be stateful or stateless. HTTP, again, is a stateless protocol unless you add some other technologies to it. Typically there is no state tracking. And lastly we have... excuse me, let me fix this.
Now, yeah, I'm just going to make sure that it doesn't sleep. Excuse me. All right, back to our talk. I hope that doesn't hurt us on time.
While researching for this talk, I was able to go through and collect a list of the most common field types. Version was actually very common, and for the protocol reverse engineer that's a very useful thing, because once you have the version listed in your protocol header, it really doesn't change. For example, with IP, when you're transmitting your protocol header, what comes first? It's the version number: IPv4 has a four in front, IPv6 has a six in front. So that's a very useful way to be able to track how your
protocol is delimited. Header length is another very useful thing. The IP header length allows us to determine how big the header is, so as reverse engineers we can go through and pre-compute header lengths and try to match them against what we find in the header. We have sequence values, and these are unique values that track every single message. Timestamps record time values; this is useful for encryption and also flow control. You have timers, such as the TTL in the IP header and the TTL in DNS, which allow us to track how long a message has been out there as it's moving through the internet. We have error-correcting values, CRCs, hashes, et cetera, to determine whether an error is present. And then we have flag fields such as the TCP header flags, which allow us to share a wealth of information about the state, or things such as the handshaking between endpoints to set up a connection. A lot of useful things can happen through flags. Then we have the standard data types: strings, integers, bytes, et cetera.
Still going on? It's not sleeping anymore. Okay, I'm just going to keep my finger on it. All right, moving on, we
have reporting and encoding. Reporting is the way that we transmit how our protocol is actually behaving. HTTP reports status codes: 200, 400, 500 for server errors. Status indicators are present in protocols like POP3, which actually uses a plus or minus. We have bit-field flags like in TCP, where we can use the flag combinations to determine what's going on in the communication. And then we have error-correcting schemes such as parity bits and checksums; we already mentioned those, but they are used to detect errors. Encoding: if encoding is present, it'll be ASCII, Unicode, EBCDIC, or a few others out there, but those are the most common. And then we can have compression, like on the HTTP protocol, which can be gzip, deflate, or compress.
With that, that concludes the theory portion, and we can talk about the tools and how to use them. tcpdump is a great tool; it's a set-and-forget tool. I actually use it on my full packet capture box. Some of the most important things for the reverse engineer to consider: you want to be able to write to a file, you want to be able to read from a file, and you want to be able to display hex,
toggle verbosity levels, and specify how many packets to capture and what the snap length is. Berkeley Packet Filters are a great way to control how you're capturing data. There are operators that allow us to specify expressions that we can use to find the kind of packets that we like. You have relational and logical operators. There's also an index reference, which allows us to specify a protocol, an offset, and the number of bytes that we're looking for.
Some useful examples: everybody knows that the TCP header is 20 bytes long when it carries no options, so anything after the TCP header should be our payload. For the reverse engineer, being able to specify that we're looking for the end of the TCP header or the UDP header allows us to go into the protocol of our choosing and look at things there. Another is specifying byte two of the IP header, which specifies the actual size of the packet; in this case I'm just specifying a random value of 576. The next one there is TCP, again looking in the header, looking at four bytes there.
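The index references just described can be written out as filter strings. The Python sketch below only builds the expressions; the helper names are hypothetical, and taking the payload offset from the TCP data-offset field is an addition to the fixed 20-byte assumption so that headers carrying options are still handled.

```python
def ip_total_length_filter(size: int) -> str:
    """BPF: bytes 2-3 of the IP header hold the total packet length."""
    return f"ip[2:2] = {size}"

def tcp_payload_prefix_filter(prefix: bytes) -> str:
    """BPF: match the first bytes of the TCP payload against `prefix`.

    A BPF index reference reads 1, 2, or 4 bytes. The payload offset is
    computed from the TCP data-offset nibble instead of hard-coding 20.
    """
    if len(prefix) not in (1, 2, 4):
        raise ValueError("BPF can only compare 1, 2, or 4 bytes at a time")
    payload_offset = "((tcp[12:1] & 0xf0) >> 2)"
    return f"tcp[{payload_offset}:{len(prefix)}] = 0x{prefix.hex()}"

print(ip_total_length_filter(576))
print(tcp_payload_prefix_filter(b"POST"))
```

Either string could be passed to tcpdump as the filter expression, for example while reading an existing capture with `-r` and writing the matching packets out with `-w`.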
Actually, it's just a hex sequence; that hex sequence is actually a POST value, looking at HTTP packets.
Working with pcap files: these are very useful configurations for tcpdump to collect information. You want to be able to collect unique instances of your protocol. I mentioned how you can look above byte 20 of the TCP header, or above the UDP header, and you can specify byte patterns, as I mentioned before with looking at the instance of POST in the header. We can also divide packets into multiple files so that we can diff them; we can chop them up individually. You do that by specifying a file for tcpdump to read and one to write, and then you can also specify either a packet count or an actual byte count. You can manipulate pcaps using xxd, which is a command-line hex dump tool. You can specify an offset, just like you could with the Berkeley Packet Filters: you specify the offset, you specify the length of how many bytes you want to go into, and of course you specify your pcap. We can also use xxd to reconstruct the actual hex representation of the protocol and print out an ASCII version of it. We can use it to reinterpret the hex and then pass it to a tool such as cksum or base64 to figure out whether this is a checksum, or whether there is base64 encoding or some other interesting manipulation going on.
Next up is Wireshark, and I'm pretty sure everybody's had a chance to look at Wireshark at some point. The capture filters are BPF based, so everything that we talked about with tcpdump applies. It has display filters, which are very useful if you're using tshark; you can actually extract individual headers from your packets if you have a dissector built for that
protocol. We'll talk about dissectors next. It has built-in analysis tools. It allows you to export PDUs, meaning that you can take different parts of the protocol, such as the segment for something that's TCP based, or the whole packet, and export those using Wireshark. It has a Lua interpreter, which can be used for dissectors. I don't personally like it; I actually like Lua on Nmap for developing NSE scripts.
So what I would like to share with you is the Wireshark Generic Dissector, and that is a plugin for Wireshark. It's a DLL or a library file that you can load that allows you to specify dissectors in text files. The author is Olivier Aveline, and you can find this tool at wsgd.free.fr. Some of the features: it allows you to specify all the things that you typically would in a dissector, such as floats, strings, and others. You can also define bit fields. But not only can you do these things in a static way, you can actually specify a bit of a script to manipulate the data: it has loops, switches, functions, enumerations, and arrays. Some of the limitations: it doesn't do text; it can be a memory hog if you have more than [number unclear] fields in your protocol, though I don't think we'll encounter that; and it is limited to 20 protocols, again something that I don't think we'll encounter very often.
This is how you put together a dissector. The first thing you have is a file called the protocol .wsgd file, and this describes the actual dissector. It starts with a name. You can associate a specific field that you want tied to this dissector, so that every time this field comes up, the field of your choice or something else that you've chosen, Wireshark knows to apply this dissector. You also have the actual structure of the protocol:
you can define your ID and the size, which is very important, and specify the switch that governs piecing apart the different messages that we would be looking at. Then lastly there is a link here to the actual field definitions, which are found in the protocol .fdesc file. The field descriptions include our header structure, the different messages that we would be looking for, and, like I mentioned, the switch that the dissector uses to identify the different messages as it encounters them. In this case we have an enumeration, which makes things easy, so that when we see our message come up, instead of seeing that numeric identifier we actually see a name.
The most basic of headers has to have, at the very least, a byte order: we talked about how protocols can be big endian, little endian, or mixed endian. We need to have a message ID, and we need to have an actual size associated with the header. Moving on, this is where we specify our messages. As we take apart our protocols, we can have different types of messages that may appear as we encounter new packets. And lastly we have this main switch, which can be tied to each of
these messages so that our dissector knows what to do with the presence of that message and display it in the dissection.
Let's talk about Scapy. Scapy is a great tool for interacting with a protocol. The basic interactions that we can have with our protocols include being able to list what's inside our packets, which is very important and which we're also going over. We can also set up basic protocol interactions. This is very useful when you have already exhausted all your options in terms of what you've captured: you can interact with the protocol and get it to do something that you didn't expect. You can mess with the different fields, you can trigger an error, you can manipulate the data and hopefully find a bug, so it's a very useful tool for that. Below you can see how you can set that up, specifying the message above, which includes an IP header, a TCP header, and then your application-layer message. In this case I just put "hello world," but you can specify a byte stream; if you're trying to interact with HTTP, you would put a GET request here, et cetera.
We can also define custom protocols with Scapy, and we can fuzz them. You can define a custom protocol header by creating a new class that derives from Packet. You can specify the name for that protocol, and then you can describe your fields, similar
to how we described them in the dissector. You can describe the specific fields by their types. In this case we have a version field, which simply tracks a version; I mentioned that comes up very often. We have a length, and we have a checksum. And this is how you would interact with that custom protocol you've just defined: you set up your IP header, your TCP header or your UDP header, and you specify arguments for that protocol.
The next thing we do is fuzz it. Once we've defined the custom protocol, we can tell Scapy to fuzz the protocol header of our message, and Scapy would then go through the fields that we've defined and apply different combinations of characters and sizes and try to cause a crash. But this is also useful for triggering new messages: we may have a message type field whose values we have not seen before, but once Scapy gets a hold of it, it's able to produce new messages, because it's now sending those packets to the server and the server is reacting.
Some other useful tidbits for Scapy: you can actually repack packets, and you can reuse parts of the packets. Did anybody do the SANS Holiday Hack Challenge? If you remember, in that challenge Josh Wright kept mentioning that you could use Scapy on the header. This is a little bit copied from his influence. In that case we used Scapy to take apart a manipulated DNS transaction. You can do the same here to reverse engineer a protocol. Tools that are
useful: you can take part of the header and send it out to different tools. These are tools that you can use to analyze the data that's coming out. There's cksum, which computes a CRC value. We have chardet, which allows us to identify character encodings. There's base64: sometimes you have base64 encoding of protocol fields. I actually saw a video once of someone who mentioned double base64 encoding of a particular field, so it's a useful tool. And ngrep: I didn't talk about ngrep, but it's a tool that's like tcpdump for text-based protocols. That's one I won't go into.
I actually won't go into Netzob at all; I'll just mention it. Netzob is a tool that allows you to set up semi-automatic reversing of protocols, and it can do all the things that I talked about that we would be doing manually. It is very good because it allows us to pick off the low-hanging fruit and go into the areas that might be more interesting. It infers protocol structure and flow, it can simulate traffic to trigger client and server responses, and it is also extensible through the Python Netzob
library.
Let's talk about the reverse engineering process. Based on all the information that we've talked about, the protocol structure, the protocol flows, and the tools that we can use, we can combine all these things to do reverse engineering. The first place we want to start is making sure that we read everything that's available about the protocol that we want to reverse engineer. If it's an administrative protocol, you can be sure that it's going to be a reliable protocol, and it's probably going to have some encryption. Those are things that are important to note so that when we do actually reverse engineer, we can anticipate those things.
Capture as much data as you can. I mentioned having a full packet capture box; you can just run one around the clock to see if you capture unique instances. Make sure you have different instances of tcpdump collecting new and interesting things that you haven't seen before.
Identify the protocol structure. We talked about how protocols are structured in a particular way; you should look for those dead giveaways. Things like identifying the delimiters, identifying endianness, and identifying encoding can be easily found and put together so that we can have an overall view of how our protocol works. When working with text-based protocols, just grab all the text you can and try to understand how it works. Protocol designers reimplement things that are common among other implementations. So in the case of HTTP you have your... this is painful... name, colon, value delimited pairs. You may want to look for that: if you have a text-based protocol, there should be some kind of delimiter that allows us to identify what that header is and what that field is. You should look for protocol flow and attempt to corrupt
payloads to see how the protocol reacts. Is it reliable? How does it behave when the connection slows down? How does it behave when there's an error present? You can fuzz protocol headers, as I mentioned, with Scapy, and you can also use the tool I mentioned, which is Sulley. And then lastly, you should create a field diagram as you go along. Not only should you have a dissector, but you should create a field diagram that you can reference in the future when you come back to this, and you should also have a flow diagram that shows how the protocol works as you identify different behaviors.
Sulley is a Python-based, generation-based fuzzer. What that means is that you have to define a specification, and Sulley will then apply unique character manipulations to the different fields that you've defined. It has a rich protocol grammar, so you can work with binary-based protocols as well as text-based protocols. It has crash detection, so that once you're running your service you can detect if something's happened and restart that service. And then it has session management. I'd like to show you how to go over a Sulley grammar.
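The following is not Sulley's grammar or API. As a rough stand-in for the generation-based idea described above, this Python sketch defines a tiny, made-up message specification (a version byte, a two-byte length, and a payload) and generates test cases by changing one field at a time. Every field name, default, and mutation value here is an assumption chosen for illustration.

```python
import struct

# Hypothetical spec: (field name, struct format, default value)
SPEC = [("version", "!B", 1), ("length", "!H", 5)]
DEFAULT_PAYLOAD = b"hello"

# Boundary values worth trying for each integer width
INTERESTING = {"!B": [0, 0x7F, 0xFF], "!H": [0, 0x7FFF, 0xFFFF]}

def build(values: dict, payload: bytes) -> bytes:
    """Serialize the header fields in spec order, then append the payload."""
    header = b"".join(struct.pack(fmt, values[name]) for name, fmt, _ in SPEC)
    return header + payload

def generate_cases():
    """Yield (mutated field, message) pairs, one field changed per case."""
    defaults = {name: default for name, _, default in SPEC}
    for name, fmt, _ in SPEC:
        for value in INTERESTING[fmt]:
            yield name, build({**defaults, name: value}, DEFAULT_PAYLOAD)
    # Payload cases keep the default header, so the length field lies.
    for payload in (b"", b"A" * 1024):
        yield "payload", build(defaults, payload)

cases = list(generate_cases())
for field, message in cases:
    print(field, len(message), message[:8].hex())
```

Each generated message would then be sent to the target while something watches for a crash, which is the part that a fuzzer's crash detection and session management are meant to automate.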
Okay, I guess I can't. All right, so let's just talk about how to get started with protocol reverse engineering. You want to build a lab; you want to have a place where you can capture data. Begin with protocols that have well-documented specifications: try to analyze communications for those protocols without the specification, and try to recreate the specification manually. Focus on text-based protocols, because they're a little bit simpler to get started with, but then you can move on to binary-based protocols once you're familiar with the process. Look for new targets to reverse in your home or corporate network; if you don't own the target network device, ask for permission. And then aim at malware protocol reversing, which is very hot; there are a lot of new instances of protocols. With that said, we can move on to
questions.
So what do you find yourself using most out of those tool sets, and what makes you like that tool set?
Wireshark is actually where you would spend a lot of your time, because a lot of reverse engineering is visual. What you want to do is load up Wireshark, look for patterns, and try to create dissectors as you go along, as you're discovering new things.
Are there places you would point somebody to that have some examples to go through, to get started?
I mentioned that there are resources online that cover some details about how to do this. There are academic papers. I also mentioned that you can use predefined protocols, protocols with protocol specifications, to kind of get your feet wet.
What do you mean?
I mean you can look at protocols that are defined with an RFC. Just look at those protocols; you have the protocol specification as a guide to help you as you go along dissecting these things. Things like the RTP protocol, which is binary based: you can go through and look at the different fields as you go along taking it apart. Right.
Correct.
...or decrypted?
So you can use a tool like Netzob to analyze behaviors and track how that communication is happening, and then you can extend Netzob and use its Python library to specify the routine that happens. So yeah, if you're relying on a protocol specification... excuse me, if you're relying on an encryption standard that there is a Python function or a Lua function for, you can actually use that through the Lua dissector library. You can create your dissector, and then for the specific fields where the encryption happens you can specify how that decryption happens, and take it apart that way. You're welcome. Go ahead.
So is the end result of protocol fuzzing the same as with applications?
Yeah, it's the same thing. I mean, we're trying to achieve code coverage. The big limitation with custom protocols, though, is that if you do not know what the protocol specification is, you can't configure your fuzzer. By reverse engineering the protocol, you can figure out what that specification might be and configure your fuzzer so that it can achieve better code coverage. Go ahead.
So were there any previously unknown protocols that you reversed and figured out how they work, and if so, what was it?
So I did play with a few at home. The printer communication protocol that I mentioned, that's really just a printer configuration protocol; it allows updates and changes to how the
printer behaves. I was able to analyze different communications going back and forth. I was able to see code updates; you can actually upload a binary, things like that. With text-based protocols that's a lot easier; the challenging ones are usually binary-based protocols. What I am doing to spin myself up more is, like I said, going through RFCs and using them as a guide to take apart those protocols, and then being able to infer things about other binary-based protocols that I do not know.
So, to be clear on that, you're not looking at the RFC first and then using that to come up with your thing; you try to figure it out on your own?
Kind of.
Any other questions? Well, thank you for coming.