Beyond the Tip of the IceBerg - Fuzzing Binary Protocol for Deeper Code Coverage

Uploaded: 2016-08-27
Duration: 46 min 23 s
Description: Beyond the Tip of the IceBerg - Fuzzing Binary Protocol for Deeper Code Coverage - Mrityunjay Gautam, Alex Moneger - Breaking Ground, BSidesLV 2016 - Tuscany Hotel - Aug 02, 2016


Mentioned in this talk

Tools used: American Fuzzy Lop, netCallGraph, netCOV, Pin, WinDbg, Wireshark


Transcript (English)

Well, let's jump straight into the talk. Who are we? I'm Mrityunjay (MJ), and he's Alex. We work with the product security team at Citrix. We're not interested in very high-level stuff; we work down at the grassroots level. We like breaking into network protocols, we like playing with systems, and we like playing with applied crypto. That's pretty much all we do. One critical disclaimer: any comments that Alex or I make are our personal opinions and have nothing to do with Citrix, so please don't blame them for that.

And I sometimes wonder why organizations do that, because it also means the research doesn't belong to Citrix. But anyway, today's agenda is roughly this. We'll start with the state of fuzzing technology. Then we'll move on to some of the modern coverage-based fuzzers, with AFL at the top of the list, so we'll briefly touch on AFL. We'll talk about some of the issues we see with AFL, specifically when you try to apply it to the network fuzzing domain, and how we handle this problem in this research.

One of the things we'll talk about is the definition of gate functions; we'll come to that when we get there. We'll cover how tracing can be done at runtime so that it feeds directly into the fuzzer, and how we can use that for optimization. This creates a feedback loop. We'll demo a small PoC with a toy example, and then we'll move on to a real-world example, which I'll keep as a surprise until we get there. So, fuzzing as we knew it: this is where the whole world of fuzzing started.

I started fuzzing something like a decade ago, when I was at Symantec, and it always seemed like the easy thing to do. All you had to do was generate a bunch of random packets and send them to the daemon. Hopefully it would crash or hang, and something would happen magically. You won't believe it, but back then it actually used to happen. Things have changed a lot. Fuzzing used to be easy; it's not that easy now, especially if you want to do more targeted attacks, where you want to explore certain code paths that are not

guaranteed to be covered by a randomly generated string or by sending 1024 A's. That's not going to happen now. We're also increasingly trying to target specific functions, because we see a lot of modular programming and generated code. There's already a framework in place, and when I need to add new functionality, I just use that framework to generate some additional code and add it there. If I want to do targeted fuzzing of that new code, it's quite difficult to do it with random packet generation alone. Another challenge we

face is that with random packet creation, whether by generation or mutation, you end up with a lot of test cases, a lot of packets, that simply get dropped because they don't pass even the basic validation threshold. So doing effective fuzzing, which actually tests the product or targets the vulnerabilities we want to target, is not trivial. In the world of fuzzing, one of the first areas people started researching was file fuzzing, and a lot of focus went into it. AFL and honggfuzz from Google are pretty strong examples of

this area. I'm personally very impressed with AFL, and the ideas these tools use are very good. Unfortunately, applying them to the network world is tricky. We tried, and we weren't very successful. There are some hacks you could try, and I'll let Alex talk about some of them, but it's tricky. In the network fuzzing world, we're still stuck with modeling protocols. If I really want to write an exhaustive, good-quality, in-depth fuzzer, the first thing I have to do is go through the documentation of that protocol, if it's available, if at

all. The documentation will be some 200-page PDF, and in a bunch of places it refers to another document that's another 200 pages. Frankly, I'm an engineer and I don't have the patience to read documents. It's a weakness of mine, sorry; I'm sure many of you share it. So it's really difficult for me to go through and analyze all these documents and write the fuzzer accordingly. Even if I do, I have no guarantee that I'm really covering what I expect to cover. Am I really hunting the vulnerabilities? So protocol modeling is still a challenge. And network

fuzzing in itself is quite slow, because we face some very practical problems like synchronization. When we send a packet to the server, how do we know whether we should expect a packet back or not? Not getting a packet doesn't mean the server has crashed; there's a lot of uncertainty. So network fuzzing inherently doesn't go as fast as file fuzzing. Another thing is that we always need to set up an agent on the server side. It keeps detecting crashes and does some logging to identify what crashed, where, and what happened. That's another challenge we always face.
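The usual workaround for the synchronization problem described above is a timeout followed by a liveness probe. After sending a test case, the fuzzer waits briefly for a reply. If none arrives, it opens a fresh connection to tell "no reply" apart from "crashed". The following is a minimal sketch that assumes a TCP service. The function names and timeout values are hypothetical. A failed connect only suggests a crash; it doesn't prove one.

```python
import socket
from typing import Optional


def send_and_wait(host: str, port: int, payload: bytes,
                  timeout: float = 1.0) -> Optional[bytes]:
    """Send one test case and wait up to `timeout` seconds for a reply.

    Returns the reply, b"" if the server closed the connection, or None
    if nothing arrived in time. None by itself does not mean a crash.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        try:
            return sock.recv(4096)
        except socket.timeout:
            return None


def is_alive(host: str, port: int, timeout: float = 1.0) -> bool:
    """Crude liveness probe: can a fresh TCP connection still be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A fuzzing loop would call `send_and_wait` for each test case and only fall back to `is_alive` when the reply is `None`. That is the slow send-then-probe cycle the speakers want to move away from.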

Bottom line: when faced with 600 pages of protocol documentation, we always end up doing blind fuzzing. At best, we take a packet from a Wireshark capture, do some mutation, and send it across. That's a little less blind, but still not great. What's interesting is that the network stack is still the target of choice. There are still so many network protocols out there and so many ports open in so many places, and we still want to break them. So what we're looking for is a little more

balance, with attention on the network side and not just the file side. To sum up, historically two kinds of approaches have been most successful. You either do random byte flips with some sort of mutation, like what Peach does, or you model the actual protocol, for which you can use any of the frameworks. Bottom line, you end up sending millions of packets. All I can say is that I've run, say, 100,000 test cases, and I can go tell my VP, "You know what, I did this." My VP

feels good about it: the product has withstood 24 hours of fuzzing without crashing, so the product is secure. Is it really? Well, we'll see. I'll let Alex take over from here and talk about some of the recent advances and what we're doing.

Thanks, MJ. So I'll be talking a bit about how we do fuzzing today, mostly. Can you guys hear me? Yeah, I think it just turned

sure, just here. So I'll talk about how we do fuzzing today and the improvements that have been made over what MJ was talking about: blind protocol fuzzing that just flips bits. Today, some concepts from genetic algorithms have been introduced into fuzzing. The idea is that you retain only the best inputs and measure how much impact an input has on your target. So today, when you send a particular input, we're capable of knowing the effect it has on your target

binary, so you know whether the input is valuable or not. Through these genetic algorithms, you select a set of "species" of inputs: the best inputs you have to play against your target. The general idea is that you mutate your best set of inputs, send them to the target, and then measure what's called fitness, based on a heuristic I'll talk about shortly. Fitness gives you feedback: is this input valuable, yes or no? You make further decisions based on that, and you discard or prioritize the input accordingly. So now we live

in a world where, for file format fuzzing and to some extent simple network fuzzing, you know how valuable your input is to the target. That's fantastic, because you're not blind fuzzing anymore. So what fitness function can you use? The commonly used one is code coverage. Why? Because code coverage tells you exactly which extra paths your input triggered. It tells you how good or bad the input is for your target, in terms of how much code was executed for that input. And most of the

tools out there can measure code coverage. That heuristic lets you make a good or bad decision based on historical data. You can get coverage in several ways:

- binary instrumentation with Pin or DynamoRIO (we chose Pin for this),
- static rewriting,
- kernel probing,
- and, to some extent, even the hardware can do it today.

So how does this work? The general idea is that you model control flow using basic blocks. If you've opened IDA Pro or a similar tool, you get a graph with a bunch of

blocks. The idea is to do exactly the same thing at runtime: you disassemble the code and segregate the blocks that don't modify control flow. Then you count the edges taken between those basic blocks. The orange arrow on the slide is a transition from one basic block to another, meaning there was a change in control flow. In a programming language, that means an if statement was taken or something similar. When you retain the edge count between basic

blocks, you get a big, unordered set that forms the code coverage map. The nice thing about sets is that they're easy to compare. For each input you get a gigantic set describing the coverage achieved, and you can easily compare it with others. Most of these ideas date back a long way, but AFL industrialized them. AFL is an amazing tool. It's a batteries-included fuzzer: it takes care of building, instrumentation, corpus minimization, and all that kind of stuff. It's just brilliant, because it's

got a perfect balance between three things: leveraging the build system (make, CMake, or whatever you want), speed through the fork server, and functionality. The only caveat is that AFL is designed to compare traces across runs. You run your target once, twice, up to n times, and the map comparison happens when the target exits, so comparisons happen across exits. That makes things more complicated for network daemons, and I'll talk about that a bit later. AFL also has to get its data from stdin or a file

descriptor that's passed directly to the target. I'll talk about the limitations we try to address. But as I said a second ago, the requirement that your target has to exit can be complicated for network daemons. Again, we're not trying to replace what AFL has done, because it's still the best option out there. If you have source code, just get it working with AFL. You can do it, but it's a lot of work. You have to write some code, write some wrappers, handle most of the state,

make sure it exits after its main event loop, and all that kind of stuff. It's not pretty, but it can work. The problem is tight coupling between the code that handles network packets and the parsing code. If you have that, you're going to have to stub out a whole bunch of stuff. By stubbing out, I mean you'll have to mock all the network calls, in a sense. You'll have to use LD_PRELOAD to redefine how recv works, for example, along with read, write, accept, and so on. preeny does that, if you guys have

worked with it, or you can use a bunch of linker tricks. All this is to say that for network daemons, we'd like to keep the successful AFL concepts: the genetic algorithm concepts and the code coverage feedback. But we want to avoid restarting the target, because that would let us get these maps at runtime. The catch is that it breaks the deterministic nature of AFL. So we want to improve on the traditional fuzzer and break the cycle of sending a packet and then probing to find out whether my target has

crashed, or asking my agent whether it crashed, which is quite slow. Instead, we want to borrow all the advanced features of feedback-driven fuzzers. Again, you want to do this at runtime, without respawning the target between inputs. So for our approach, we did a bit of work around this and at least started working on the problem. We observed and thought about how network daemons work. Generally, they do a whole bunch of startup stuff you don't really care about: they read a config file, they

initialize a bunch of stuff, and so on. Then the daemon just waits for a connection: it blocks on an accept call, or something different for UDP. Then it reads an input, which gives it a buffer of bytes to work on. It parses that buffer to make sense of the protocol and what's happening. Based on that parsing, it makes a decision and writes something back out to the socket, such as an error or some validation. So in this context, what code coverage do you actually care

about? You can simplify things and discard all the initialization stuff; just chuck it out the door, it doesn't matter. The interesting stuff generally happens between the first read from the network and the write. So the whole idea we're going to talk about is this: can you get those code coverage maps triggered around those specific syscalls? To generalize, you can call these read and write syscalls gates. When you enter a gate syscall, you start tracing, and when you exit the gate, you stop the trace. So the idea is that you monitor a

bunch of syscalls at runtime. When you hit one, you start the trace, and when you exit one, you stop it. Then you dump that trace and hand it to whoever consumes it: a fuzzer, a reverse engineering tool, or whatever, it doesn't matter. You transfer that code coverage back to the decision maker, which can then make an intelligent decision based on the coverage data. You can generalize this a bit further. Again, all of this is only about code coverage. We leave out all the fuzzing stuff, because mutation can be done by anyone at any time.

So, given two defined gate syscalls, say X and Y, you trigger code coverage when you hit X, stop it when you hit Y, and then dump the trace. This works for pretty much any pair of related syscalls. The thousand-foot view is that you only want to track the relevant file descriptors, because they're the ones that tell you when the data is valuable. You want to ignore all the other I/O: you don't want to start tracing when something reads a file or anything like that. You want to generate the hit map at runtime only

when the gate syscalls are hit, and then, as I said, dump it to the fuzzer. Let me take TCP as an example of how you can filter file descriptors. The accept syscall returns a file descriptor, which is then used for reads, writes, and so on at the syscall layer. So you hook the syscalls, instrument accept, and grab its return value. That lets you build a list of the file descriptors you're interested in without polluting it with descriptors from the regular I/O layer. Then you've got that list

of file descriptors that you know come from the network and that you're probably interested in. You also instrument read and write. When read receives a file descriptor that's in your list, you start the trace, and when you hit write, you dump the trace. Here I've got a silly example where file descriptor 6 is good for tracing. Descriptor 9 probably comes from the I/O layer, so we just junk it. Another interesting point about this kind of gated syscall analysis is that your coverage maps are per read/write gate. So if you've got a connection

with many gates, which is generally the case, you can get the coverage map for each gate. A protocol usually involves a bunch of exchanges: a read, then a write back, a ping-pong exchange. Per-gate maps mean you can enter the protocol at different layers and at different points in time, and get the coverage map for that specific packet. And remember that I said those code coverage maps are basically sets. So you can also aggregate them if you want a macro view across multiple

gates. So this is the thousand-foot view of how the Pin tool works. As I said, it hooks a bunch of syscalls, basically all the networking ones: accept, read, write, close, recvfrom, send, sendto, sendmsg, and so on. accept adds the file descriptor to a whitelist of descriptors you're interested in, which is then tracked across subsequent syscalls. You can see my little heat map on the slide, which is what I was talking about before: the edge count per basic block. And you can see that on the final

write, or the final close, that heat map is flushed out somewhere. For UDP, you can do exactly the same thing, but track recvfrom and other syscalls instead; it works exactly the same way. And again, I want to stress that this generalizes to any possible sequence of syscalls. You could come up with something like a grammar to describe it. Then you'd get runtime code coverage information based on whatever runtime criteria you believe in. So we wrote a simple Pin tool called netCOV, whose only job in the world is to do exactly what I

said: generate the code coverage map based on runtime data. It waits, does exactly what I described, and flushes the code coverage map to a pipe where something else can consume it. It's kind of the reverse of the usual fuzzing talks, where people used to say instrumentation is up to you. Here, the fuzzing is up to you. All you get is that when you send an input, you know the

code coverage that resulted. It has a sidekick called netCallGraph, which generates a runtime call graph. It uses the same principle of gated syscalls to show what's happening, which can give you some interesting insight for reversing and that kind of stuff. I've got a really simple dummy fuzzing example that shows this, which I'll go through a bit later. The point of all this is to get people thinking about network fuzzing and get them interested in it. It's a PoC. It works

relatively well, but it has a bunch of limitations:

- It doesn't work with select or poll, although it could be adapted.
- There's no crash detection, but that's a solved problem in the Pin world, so it wouldn't be very hard to add.
- The more complicated one is that there's no address sanitizer to catch out-of-bounds reads or writes. There's some work in the Pin community on AddressSanitizer-like Pin tools, which could be adapted here.

And right now the heat map (or hit map)

format is text-based. It's not optimal at all, but it works. Where it works very well is multi-threaded daemons. It works across forks and with pthreads, because file descriptors are conveniently shared between parent and child. Another interesting thing is that the heat map is per file descriptor, which allows a form of concurrent fuzzing. You can have multiple instances running and select results based on the file descriptor where the coverage happened. And by design it's

mutation-independent, since it doesn't do any mutation. And since it's a Pin tool, it's independent of source code: you don't need to build anything. You just hand it a binary, and it runs it and does its thing. It's slow, though, because it's Pin. So here's the netCOV flow. You've got a client, which is a fuzzer. The orange lines show the protocol exchange with the daemon, and the red star is the coverage map that netCOV returns to your client. I'll do a super quick demo. I wrote a really silly daemon which, if you

can see the code, just looks for magic characters inside a buffer. It's a bunch of nested branches, just to show my point that code coverage increases when you send the right value at the right spot. I'll try and put this here. Yes.
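The dummy daemon's source isn't shown in this transcript, so what follows is a hypothetical Python stand-in for its parsing logic. It has nested checks for magic characters, plus a loop whose bound comes from the input. The magic bytes `A`, `B`, `C` and the edge labels are assumptions; the demo only shows `A` and `B` being typed. There is no Pin-based tracing here. Instead, each branch records a made-up edge label into a `Counter` by hand, which mimics the per-edge hit map netCOV reports for one packet.

```python
from collections import Counter


def handle_packet(buf: bytes) -> Counter:
    """Hypothetical stand-in for the dummy daemon's parser.

    Each branch taken records a made-up edge label, mimicking the
    per-edge hit map that netCOV would report for one packet.
    """
    hits: Counter = Counter()
    hits["entry->check_a"] += 1
    if buf[0:1] == b"A":
        hits["check_a->check_b"] += 1
        if buf[1:2] == b"B":
            hits["check_b->check_c"] += 1
            if buf[2:3] == b"C":
                hits["check_c->loop"] += 1
                # Bytes after the magic, read as a decimal number, bound a loop
                # (capped so a huge value can't stall the toy daemon).
                tail = buf[3:]
                bound = min(int(tail.decode()), 1000) if tail.isdigit() else 0
                for _ in range(bound):
                    hits["loop->loop"] += 1  # back edge, once per iteration
    hits["parse->write"] += 1
    return hits
```

Sending `b"xxxx"` twice yields identical maps. Adding `A` and then `B` each adds one new edge. `b"ABC3"` versus `b"ABC15"` changes only the `loop->loop` count, mirroring what the demo shows.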

Sorry. Oh yeah, I can see it.

All right, so here I just started it on the dummy program I was talking about.

And here I'm going to listen on the pipe and see what happens. If you just echo something into it, you'll see that it spits out some stuff here. This is the code coverage information for this particular packet.
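netCOV itself is a Pin tool, and its source isn't shown in the talk. The following Python sketch only models the gating logic described earlier, applied to a simulated event stream:

- Results of `accept` populate a file descriptor whitelist.
- A read on a whitelisted descriptor opens a gate.
- Basic-block transitions are counted as edges while the gate is open.
- A write on that descriptor closes the gate and emits the per-gate hit map.

The event tuple format and the one-gate-at-a-time simplification are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Optional, Set, Tuple

# Hypothetical event format: ("accept" | "read" | "write" | "close", fd)
# or ("block", basic_block_address) for each executed basic block.
Event = Tuple[str, int]


def gated_hit_maps(events: Iterable[Event]) -> Iterator[Tuple[int, Counter]]:
    """Yield (fd, edge hit map) for every read->write gate on a network fd.

    Simplification: only one gate is open at a time.
    """
    whitelist: Set[int] = set()          # fds returned by accept()
    tracing_fd: Optional[int] = None
    prev_block: Optional[int] = None
    hits: Counter = Counter()
    for kind, value in events:
        if kind == "accept":
            whitelist.add(value)
        elif kind == "close":
            whitelist.discard(value)
        elif kind == "read" and tracing_fd is None and value in whitelist:
            # Gate opens: start a fresh trace for this fd.
            tracing_fd, prev_block, hits = value, None, Counter()
        elif kind == "write" and value == tracing_fd:
            # Gate closes: hand the per-gate map to the consumer.
            yield value, hits
            tracing_fd = None
        elif kind == "block" and tracing_fd is not None:
            if prev_block is not None:
                hits[(prev_block, value)] += 1  # one control-flow edge
            prev_block = value
```

Feeding it the talk's silly example has the expected effect: a read on fd 9, which never came from `accept`, is ignored, while the read/write pair on fd 6 produces exactly one hit map.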

Right. No, I need that. No, I need it.
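Each gate's hit map can be treated as a set of edges. Deciding whether a packet was interesting then reduces to a set difference against everything seen so far, and the macro view across gates is an aggregation. This is a minimal sketch that assumes maps are `Counter`s keyed by edge. The new-edge check follows the general AFL idea and isn't taken from netCOV's code. The function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Set


def new_edges(hit_map: Counter, seen: Set[Hashable]) -> Set[Hashable]:
    """Edges in this hit map that no earlier input has exercised."""
    return set(hit_map) - seen


def aggregate(maps: Iterable[Counter]) -> Counter:
    """Macro view across several gates: sum the per-gate edge counts."""
    total: Counter = Counter()
    for m in maps:
        total.update(m)
    return total
```

Resending the same packet yields an empty `new_edges` result, which is the "coverage doesn't change" case. A packet that takes an extra branch yields exactly that branch's edge.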

All right, so if you send the same thing again, you can see that the code coverage doesn't change; the edge count is constant. Suppose this were a fuzzer (I'm a manual fuzzer here). If I add an "A" at random, as a kind of byte flip, I should take an extra branch, as I was saying, and see the code coverage increase. The whole point is to show that you get feedback at runtime, based on network connections. Again, if I put a "B",

I'm going to take a new branch, and so on. This is just an example to visualize what's happening and to see the code coverage map grow. One other interesting thing I want to show you is that I added one parameter that is used as a loop boundary.

So here the last parameter, 3, is used as an upper loop bound, and you'll see the corresponding edge count increase inside the coverage map. So you can also tell when you're controlling the upper bound of a loop. If you look at this (I know it's a bit abstract), you'll see this number 3, which will probably change, meaning that you're controlling a

loop. If I change this to 15, for example, it should do more iterations on that edge.
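The loop demo suggests a second signal beyond new edges. An input can leave the set of edges unchanged while changing how many times one edge is hit, which hints that the input controls a loop bound. This is a minimal sketch of that comparison, assuming two hit maps keyed by edge. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Tuple


def count_changes(before: Counter, after: Counter) -> Dict[Hashable, Tuple[int, int]]:
    """Edges present in both maps whose hit counts differ.

    A changing count on a back edge, with no new edges, suggests the
    mutated bytes influence a loop bound.
    """
    return {
        edge: (before[edge], after[edge])
        for edge in before.keys() & after.keys()
        if before[edge] != after[edge]
    }
```

For comparison, AFL buckets raw hit counts into coarse ranges (1, 2, 3, 4-7, 8-15, 16-31, 32-127, 128+). As a result, only jumps between buckets register as new behaviour there.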

So again, you can see here that the edge count increased. It's just to show that you also get fine-grained visibility: you can actually see the edge count increase when you control the upper bound of a loop. Let me go back to the

slides. I just wanted to show a quick example of the netCallGraph stuff I was talking about. Again, this was drawn at runtime. If you look, this is a view of that dummy daemon between a read and a write; these are the operations it performs. So you can actually visualize this and dump it out if you're interested. I also wanted to show the process. I showed you the manual fuzzing, and I also wrote a very, very simple fuzzer based on this, which is just

the Charlie Miller algorithm: you flip random bytes, watch for an increase in code coverage, and start finding the correct inputs. I'll just show this very quickly.

So this is my very simple fuzzer, which gets some feedback. What's happening here is that the fuzzer is just trying a bunch of random mutations.
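The Charlie Miller approach is plain random byte mutation; what the demo adds is keeping a mutant only when coverage grows. Below is a minimal, self-contained sketch of that loop. A toy target with hypothetical magic bytes returns a set of edge labels, standing in for netCOV's map received over the network. The loop keeps any mutant that reaches an edge not seen before. The mutation count, magic values, and function names are arbitrary assumptions.

```python
import random
from typing import List, Set


def toy_target(buf: bytes) -> Set[str]:
    """Hypothetical target: nested checks on magic bytes, returns edges hit."""
    edges = {"entry"}
    if buf[0:1] == b"A":
        edges.add("A")
        if buf[1:2] == b"B":
            edges.add("B")
            if buf[2:3] == b"C":
                edges.add("C")
    return edges


def mutate(buf: bytes, rng: random.Random, flips: int = 1) -> bytes:
    """Charlie Miller style: overwrite a few random bytes with random values."""
    data = bytearray(buf)
    for _ in range(flips):
        data[rng.randrange(len(data))] = rng.randrange(256)
    return bytes(data)


def fuzz(seed: bytes, iterations: int, rng: random.Random) -> List[bytes]:
    """Coverage-guided loop: keep a mutant only if it reaches a new edge."""
    corpus = [seed]
    seen = set(toy_target(seed))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        edges = toy_target(candidate)
        if edges - seen:  # fitness signal: at least one never-seen edge
            seen |= edges
            corpus.append(candidate)
    return corpus
```

In the real setup, `toy_target` would be replaced by sending the packet to the daemon and reading the hit map that netCOV writes to its pipe. Because the magic bytes are nested, each new corpus entry unlocks the next check, which is why feedback finds them far faster than blind flipping would.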

It will take its time, but eventually it should flip the byte we're interested in and start finding new code coverage entries. If this takes too long I'll just skip it. But basically, when it suddenly finds the right input, you should see it detect that the hit count has changed and

increased. All right, since we're running a bit short on time, I'll just skip

this. Okay, so all this is to show that we can probably build better code coverage tools for fuzzing network protocols. There's probably some evolution we can work on here so that the technologies used for file fuzzing can be applied in the networking world. Hopefully that helps us find bugs quicker and, above all, fuzz this kind of stuff more efficiently. Now I'll pass it over to MJ, who'll talk about a real-world example based on the RDP protocol. He'll quickly discuss how reverse engineering and fuzzing are tightly integrated and

can work together. Thanks a lot.

Thanks, Alex. All right, guys. Referring to something I mentioned earlier: what if I could just kill the whole idea of reading documentation to figure out what the packet structure looks like? If I could get fuzzing-ready information about the packet instead, that would be good enough for me to write a fuzzer. So we tried to see whether that works for the RDP protocol, which I think everybody knows. For RDP, is it possible to extract the packet structure using the feedback loop? Can I reach a level where I may not know what each byte

represents, but I have a fair idea of how to fuzz that byte? That's the kind of demo I'm going to try here; hopefully it will work. RDP is the regular Windows Remote Desktop Protocol, and it runs on port 3389. It has a lot of variants in the Linux world. For example, there's xrdp, which you can find in Unix environments, and RDP clients are available practically everywhere. So it's a nice protocol. Frankly, at some point I want to hit a CVE on this one, but let's hold off on that for a moment. So this is what I did.

This is a small PoC around netCOV, showing how it can be used. At a high level, Alex explained how netCOV's binary tracing works on the server and puts all the data into a pipe; the pipe here is /tmp/netcov. From there, the binary trace, which covers the span between the receive and send system calls, is given to a fitness function. Like any genetic algorithm, you have some heuristics around it. The heuristic used right now is simply the number of edges covered. It's not the

perfect one, but it gives an idea of how many edges we've been able to cover. That fitness function sends feedback to my client side. The dotted line divides what's on the server from what's on the client. On the client side, I get this information and modify the mutation strategy based on it, and the packets are mutated accordingly. The rest of what you see here is pretty obvious, except that the input is read from a Wireshark trace. To make life simpler, you can run Wireshark somewhere, capture an RDP connection, put the dump in this tool,

and it will automatically generate the packet structure and give it back to you. I might want to use this tool a little differently, say for fuzzing or to improve on some heuristics. In that case, it's the green boxes I need to play with. A better fitness function will typically give you a better result, and the mutation strategy will have to change accordingly. Right now, all I want to do is understand the structure of the packet, so this is basically reverse engineering the protocol's packet structure. For fuzzing, the strategy would have to change a little. One of the biggest challenges

I faced with this whole automation, and something we struggled with a bit, was synchronization. You send a packet and you don't know which packets to expect back. Sometimes things go out of sync, packets get dropped, all kinds of things. I won't go into detail on how to solve that; it's more trivial engineering. So let me quickly do a small demo and see what we're looking at.
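The fitness heuristic described for the RDP PoC is simply the number of edges covered between the receive and the send. The demo also probes each byte of a captured packet with 0x01 and then 0xFF. Combining the two, a hypothetical analyzer could flag the bytes whose value changes the fitness as structurally interesting. This is a minimal sketch that assumes a `run` callable which sends a packet and returns the gated hit map. `run`, `probe_bytes`, and the flagging rule are illustrative assumptions, since this part of the talk doesn't show how the PoC interprets the results.

```python
from collections import Counter
from typing import Callable, List


def fitness(hit_map: Counter) -> int:
    """Heuristic described in the talk: the number of distinct edges covered."""
    return len(hit_map)


def probe_bytes(packet: bytes, run: Callable[[bytes], Counter]) -> List[int]:
    """Return offsets where setting the byte to 0x01 or 0xFF changes fitness.

    Hypothetical analysis step: `run` sends a packet to the target and
    returns the hit map recorded for that gate.
    """
    baseline = fitness(run(packet))
    sensitive = []
    for offset in range(len(packet)):
        for value in (0x01, 0xFF):
            mutated = bytearray(packet)
            mutated[offset] = value
            if fitness(run(bytes(mutated))) != baseline:
                sensitive.append(offset)
                break  # one differing value is enough to flag this offset
    return sensitive
```

Offsets that never change the fitness are candidates for free-form payload. Flagged offsets are likely to be parsed fields such as versions, types, or lengths, which deserve a more careful mutation strategy.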

All right, I have set up a small shell script. All it does is kill off any RDP server that's running and then attach the netCOV client we were talking about to our xrdp binary. There is a flag to it, minus M, which marks the module you want to trace for fuzzing. In the real world there are usually 10 or 20 dynamically linked modules, and if you start tracing each one of them you can end up with a lot of graphs that you really don't want to analyze.
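Restricting the trace to one module, as the minus M flag does, can be approximated by keeping only the edges whose two endpoints both fall inside that module's mapped address range. This is a minimal sketch assuming a Linux target, where /proc/<pid>/maps lists each mapping as "start-end perms offset dev inode path"; module_range and filter_to_module are hypothetical helpers, not netCOV code, and the addresses in the example are made up.

```python
from typing import Iterable, List, Tuple

Edge = Tuple[int, int]


def module_range(maps_lines: Iterable[str], name: str) -> Tuple[int, int]:
    """Return [low, high) covering every mapping whose path contains `name`."""
    low = high = None
    for line in maps_lines:
        fields = line.split()
        # Anonymous mappings have no path column; skip them and other modules.
        if len(fields) < 6 or name not in fields[5]:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        low = start if low is None else min(low, start)
        high = end if high is None else max(high, end)
    if low is None:
        raise ValueError(f"module {name!r} is not mapped")
    return low, high


def filter_to_module(edges: Iterable[Edge], low: int, high: int) -> List[Edge]:
    """Keep only edges that start and end inside the chosen module."""
    return [(s, d) for s, d in edges if low <= s < high and low <= d < high]


if __name__ == "__main__":
    maps = [
        "7f0000000000-7f0000010000 r-xp 00000000 08:01 1234 /usr/lib/xrdp/libxrdp.so.0.0.0",
        "7f0000010000-7f0000012000 rw-p 00010000 08:01 1234 /usr/lib/xrdp/libxrdp.so.0.0.0",
        "7f1000000000-7f1000100000 r-xp 00000000 08:01 99 /lib/x86_64-linux-gnu/libc.so.6",
    ]
    low, high = module_range(maps, "libxrdp")
    edges = [(0x7F0000000100, 0x7F0000000200),   # stays inside libxrdp
             (0x7F0000000200, 0x7F1000000050)]   # jumps into libc
    print([(hex(s), hex(d)) for s, d in filter_to_module(edges, low, high)])
```

Filtering after the fact like this is only an approximation: a tracer that restricts instrumentation to one module up front also avoids paying the tracing cost for the other 10 or 20 modules.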

That may not even be the code you're interested in, so you can choose which binary you want to trace. Over here it is libxrdp, which is the one I'm looking at, and that's what it's going to trace.

All right, the attach has been done, that's good. The server program over here collects the data from the pipe where the trace output is written, analyzes it with the fitness function, and sends the result back. The final part is the analyzer, which takes this pcap file as input. Typically the pcap can be captured between any client and server, and you might want to target something else, so a small thing I added was a way to mark which IP address acts as the server in the pcap file and what the target is; they can potentially be two different IPs.

Here is what it does for each byte. This is the first packet you're seeing, and the byte being flipped right now just goes serially from one byte to the next. For each byte it sets the value to 0x01, and in the next iteration to 0xFF: what we want is to enable all the bits or disable the bits and see whether that changes the control flow somewhere. If you look here, at this offset the control flow changed and we were able to go deeper into the code. So if you have a 30-byte payload, you're talking about 60 iterations of that packet, two iterations per byte. That gives us an idea of what the packet looks like; then it's a little bit of massaging, and the final packet structure looks something like this. What does this really mean? Let's look at that for a moment.

When I send this base packet, this is roughly what our baseline looks like. Something I forgot to show you: in the trace here, for each packet, this is the trace that comes in. Let me show you one trace. Byte zero is a control byte. A control byte means that changing it changes the control flow somewhere, so we expect different code coverage than we had earlier, and that's pretty obvious from the coverage length here. Compare it with the coverage length for the next byte, which is a data byte: just by visual inspection you can tell there's a code flow difference.

Coming back to the slides: at a high level, using the same text-based format for identifying code coverage, the results we actually got look like this. What I wanted to do was check them against the RDP protocol specification. I didn't go through the whole 200-page document, but a few pages is okay.
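The per-byte analysis from the demo can be sketched like this: set each byte to 0x01 and then 0xFF (two runs per byte, so 60 runs for a 30-byte payload), compare the resulting coverage against the baseline, and label the byte as data (no change), control (coverage changed) or magic (packet dropped). The run callable stands in for sending the packet and collecting its trace; toy_server is a made-up target that only mimics the behaviour described in the talk (a strict length check plus two branching bytes), so its labels are illustrative, not measured results.

```python
from typing import Callable, FrozenSet, List, Optional

Coverage = Optional[FrozenSet[str]]  # None means the packet was dropped
FLIP_VALUES = (0x01, 0xFF)           # the two values tried per byte in the talk


def classify_bytes(packet: bytes, run: Callable[[bytes], Coverage]) -> List[str]:
    """Label each byte offset as 'data', 'control' or 'magic'."""
    baseline = run(packet)
    labels = []
    for offset in range(len(packet)):
        label = "data"
        for value in FLIP_VALUES:
            mutated = bytearray(packet)
            mutated[offset] = value
            coverage = run(bytes(mutated))
            if coverage is None:
                label = "magic"      # flipping it gets the packet dropped
                break
            if coverage != baseline:
                label = "control"    # flipping it changes the code path
        labels.append(label)
    return labels


def toy_server(pkt: bytes) -> Coverage:
    """Hypothetical target: strict length check plus two branching bytes."""
    if len(pkt) < 6 or int.from_bytes(pkt[2:4], "big") != len(pkt):
        return None                            # length mismatch: drop
    edges = {"recv->parse"}
    if pkt[0] == 0x03:
        edges.add("parse->tpkt")               # branch on the version byte
    if pkt[5] >> 4 == 0xE:
        edges.add("tpkt->connect_request")     # branch on a high nibble
    return frozenset(edges)


if __name__ == "__main__":
    base = bytes([0x03, 0x00, 0x00, 0x26, 0x21, 0xE0]) + bytes(32)  # 38 bytes
    print(classify_bytes(base, toy_server)[:7])
    # ['control', 'data', 'magic', 'magic', 'data', 'control', 'data']
```

In this sketch a byte whose original value already equals 0x01 or 0xFF gets one no-op run; the talk doesn't say how the real tool handles that case.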

What's interesting is that to check whether I'm getting the results right, I wanted to verify the first six or seven bytes; that should give me a fair idea of whether we are going in the right direction. The rest of it is data. This is the RDP specification. What I'm primarily interested in is the TPKT header, which is 4 bytes, and then the x224Crq, which is 7 bytes, after which there are a lot of variable fields. That all goes into data and I'm not too worried about it, so it's primarily the first 11 bytes I want to look at.

Let's take a look at the first four bytes, the TPKT header. The first octet is the version number, and the protocol differs based on this value. That makes sense, because our first byte did turn out to be a control byte: it was changing the direction of the flow based on its value. The second octet is a reserved byte that nobody really uses today, so it just shows up as data and doesn't change the control flow, which is exactly what we found.

The next two bytes are the packet length. That's interesting, because what we are doing right now is a simple mutation of the packet, so the length of the packet never changes, and these two bytes turned out to be magic. When I say a byte is magic, it means that if you flip it, the packet gets dropped. So my rough assessment so far is that there is a very strict check on these two bytes: the server verifies that the packet length exactly matches this value. That's something I could learn just from this much.

Let's move ahead to the next set. The first byte there, byte five (offset four), is the length indicator field. It's another one-byte length, but this one is the length of this header only, and it can vary because there is a lot of data after it. It still acts as data; it doesn't change the control flow anywhere. The next one, byte two of this header, is split into two 4-bit halves that carry two different control structures, so that byte is still a control byte. The rest of the header is set to zero or referenced somewhere, but it doesn't change the control flow.

I feel good about this analysis, because now I know that for the first packet, mutating the first byte is going to change the control flow. Bytes three and four are a length field that should not be played with unless you are actually going to change the length, and we know they verify this length. Byte five is also a length, but they don't really enforce it, so this is a place that could potentially lead to some kind of over-read or under-read; frankly, I would like to play with this one. Byte six is another control byte, and bytes 7 to 38 are all data. What this means for me is that I don't have to fuzz this linearly, one byte at a time. I can group all the control bytes together and all the data bytes together, and the number of test cases I want to fuzz is basically the product: for each control-byte mutation, I can try all the mutations of the data bytes and potentially reach a different location. Makes sense?

So with this kind of information, who in the room cannot write a fuzzer? I'm not going to do that. Just to conclude: there's a lot to do in the network fuzzing world, and what we have just talked about is only a glimpse of what can potentially be achieved with this technique. This is an invitation to the community to start playing with it. That's pretty much it. Thank you, I'm open for questions.

Any questions? Does anybody have any questions for our speakers? If you do, come on up and get the mic.

Not having questions is never a good sign, so I guess my accent was really bad today. All right, thank you, gentlemen. Thank you. Thanks a lot.
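As a cross-check of the byte map from the talk, the fixed 11-byte prefix can be decoded with the TPKT header layout from RFC 1006 (version, reserved, 16-bit big-endian length) followed by the fixed part of the X.224 Connection Request TPDU (length indicator, CR/CDT byte, destination reference, source reference, class option). The second function sketches the grouping the speaker describes: every combination of control-byte values crossed with every single data-byte mutation, leaving the magic length bytes untouched. The offsets and candidate values are assumptions taken from the demo packet, and parse_header and fuzz_plan are hypothetical names.

```python
import itertools
import struct
from typing import Dict, Iterator, Sequence


def parse_header(pkt: bytes) -> Dict[str, int]:
    """Decode TPKT (RFC 1006) plus the fixed 7-byte X.224 CR TPDU prefix."""
    if len(pkt) < 11:
        raise ValueError("need at least 11 bytes")
    version, reserved, length = struct.unpack(">BBH", pkt[0:4])
    li, cr_cdt = pkt[4], pkt[5]
    dst_ref, src_ref, class_opt = struct.unpack(">HHB", pkt[6:11])
    return {
        "version": version,        # byte 1: control byte in the talk
        "reserved": reserved,      # byte 2: behaved as data
        "length": length,          # bytes 3-4: magic, checked against packet size
        "li": li,                  # byte 5: length indicator, behaved as data
        "cr_code": cr_cdt >> 4,    # byte 6, high nibble (0xE = connection request)
        "cdt": cr_cdt & 0x0F,      # byte 6, low nibble (credit)
        "dst_ref": dst_ref,
        "src_ref": src_ref,
        "class_option": class_opt,
    }


def fuzz_plan(packet: bytes, control_offsets: Sequence[int],
              control_values: Sequence[int], data_offsets: Sequence[int],
              data_values: Sequence[int]) -> Iterator[bytes]:
    """Cross every control-byte combination with every single data mutation.

    Offsets not listed (here the magic length bytes) are never touched.
    """
    for combo in itertools.product(control_values, repeat=len(control_offsets)):
        for d_off in data_offsets:
            for d_val in data_values:
                mutated = bytearray(packet)
                for off, val in zip(control_offsets, combo):
                    mutated[off] = val
                mutated[d_off] = d_val
                yield bytes(mutated)


if __name__ == "__main__":
    pkt = bytes([0x03, 0x00, 0x00, 0x26, 0x21, 0xE0]) + bytes(32)  # 38 bytes
    hdr = parse_header(pkt)
    print(hdr["version"], hdr["length"] == len(pkt), hex(hdr["cr_code"]))  # 3 True 0xe
    data = [1, 4] + list(range(6, len(pkt)))
    cases = list(fuzz_plan(pkt, [0, 5], [0x03, 0x01, 0xFF], data, [0x00, 0xFF]))
    print(len(cases))  # 3**2 * 34 * 2 = 612
```

The plan grows as (values per control byte) to the power of the number of control bytes, times the data mutations, which is why knowing that only two of the first 38 bytes steer the control flow keeps it small.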
