
Fuzzing: What? Why? How? Genetics? - Roger Seagle

BSides Asheville · 41:50 · Published 2018-06
About this talk
Recorded at BSides Asheville 2015 on Sunday, June 28th, at Mojo Coworking in Asheville, NC. Software testers, engineers, security researchers, and miscreants uncover software defects through a variety of methods, and fuzzing is arguably the most popular. Since the 1980s, this technique has found numerous severe vulnerabilities in operating systems, network protocols, file editors and viewers, office applications, and especially web browsers. Testing an application with randomly generated data has proven to be an efficient, effective strategy. However, as applications mature and companies adopt secure software development practices, fuzz testing must adapt and incorporate new strategies. This presentation introduces a feedback fuzzing strategy that uses genetic algorithms to guide negative test case generation. It provides a quick introduction to file fuzzing and genetic algorithms, then describes how to combine those concepts with static and dynamic program analysis to create a feedback fuzzer. Since employing dynamic program analysis can greatly increase runtime, the presentation concludes by describing how to combat this issue with a distributed fuzzing architecture using a message queuing system (AMQP) and a NoSQL datastore (MongoDB). Roger Seagle Jr. is a technical leader in the Advanced Security Initiatives Group (ASIG) at Cisco, where he assesses the security posture of Cisco products and advises product teams on patching and mitigating vulnerabilities. In this role, Roger audits embedded systems and web applications, configures and monitors internal production servers, and serves as a technical advisor. More recently, he has contributed to a big data analytic system for detecting malicious traffic and network compromises and prototyped a system for continuous security assessment of virtual assets. Roger holds PhD and MS degrees in Computer Science from the University of Tennessee, Knoxville, as well as a BS in Computer Science from Wake Forest University. He resides in Knoxville, TN, where he enjoys hiking in the Blue Ridge Mountains with his wife, son, and hound dog.
Transcript [en]

All right, good morning everybody. I'll let people file in real quick. Quick poll before we start: how many people know about fuzzing? How many people have developed a fuzzer? Okay. How many people know about genetic algorithms? Two in the back — awesome, all right, three. So feel free to dig in anywhere you want in this talk. This is actually part of a larger talk that runs about two hours, so this is the 30-minute synopsis of everything; we're going to hit the high points. The topic today: fuzzing. What, why, how, and how you mix genetic algorithms and fuzzing together. So first off, who am I?

This is the first time I've been in the Asheville security community. The picture is hilarious, that's why I put it up there. I was on a panel and I guess they thought that was a good picture to use; it kind of looks like I'm making out with the camera. I'm a principal engineer at Cisco Systems. I've been at Cisco for about ten years, which is kind of odd for me to believe. I was a penetration tester in our Advanced Security Initiatives Group; we were responsible for hacking all of Cisco's products, to find the vulnerabilities before people on the outside do. That's about twenty percent of my time now.

The majority of my focus now is as a security researcher in the advanced security research group. We do outreach with universities: we publish RFPs, requests for proposals, in multiple areas we're interested in — right now threat mitigation, analytics, privacy, and advanced crypto — then we go through the funding process, give them money, and work alongside them. That's the majority of my time. I also do IT admin and monitoring for a small enclave inside of Cisco — not the larger Cisco, but ASIG's enclave, because we have different security requirements since we're hacking Cisco products. So I do some monitoring there. And I love beer.

Some other things I've been doing: I was the lead organizer for BSides Knoxville. If you haven't made it out to BSides Knoxville, we're going to do it again next year; we had about 200 people come out last time — awesome event. I'm also lead organizer for Cisco's offensive summit, where we bring all the penetration testers from Cisco to one location and they share their tools, techniques, and practices. I mention this because we're moving to a model that's going to be kind of like BlueHat, so if you're interested in that, talk to me; we're going to try to bring some external researchers in so you can mingle and see what we're doing. I've done big data analytics stuff, continuous security assessment for virtualized assets, some DevOps stuff as well — basically they just call me to do random stuff all the time. [Laughter]

We are hiring — we are always hiring. Our office is about 60 people in Knoxville. We do internships for college students and we also have full-time positions. Our internships, as discussed yesterday, are hands-on; we don't treat interns like second-class citizens. We sit them down next to a penetration tester on a product, and they're expected to learn, find vulnerabilities, and report them out to be fixed. And we do CTF stuff — the whole gamut. It's fun; it's a great place to work.

So, first off — you got me off track. Fuzzing: can anybody define fuzzing? We already cheated, so I'll give it to you. This is the academic approach to fuzzing. Whenever anybody does anything with fuzzing, they just pull out their favorite open-source tool and start running it, and they don't really think about what's happening under the hood. There's actually a categorization of fuzzers that tells you what a fuzzer can do and what its methodology is under the hood, and we're going to walk through all of that. But fuzzing in general is systematically sending deliberately malformed data — that's the key. You're sending garbage: either purely random, from /dev/random, or specifically chosen values, into any of the program's input vectors, whether that's file processing, network data, anything like that. You're looking to identify implementation errors: your typical buffer overflows (which are becoming less common these days), use-after-frees, double frees, all that good stuff.

So it's pretty simple: you've got a program, you've got some anomalous input you use to try to set fire to the program, and you may have some analysis that you do — either programmatically, or by having the engineer go through all the logs — to figure out what happened, why it was vulnerable, and why it crashed. Hopefully that gives you a good program in the end. That's the thousand-foot view of what fuzzing actually is.
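For concreteness, a minimal sketch of that loop — mutate a seed, run the target, log crashes. The target path and file names here are illustrative, not the speaker's tooling:

```python
import random
import subprocess

def mutate(data: bytes, n: int = 16) -> bytes:
    """Attack heuristic: corrupt a handful of random bytes in a valid seed."""
    buf = bytearray(data)
    for _ in range(n):
        buf[random.randrange(len(buf))] = random.randrange(256)
    return bytes(buf)

seed = open("seed.pdf", "rb").read()            # a known-good input file
for i in range(10_000):
    case = mutate(seed)
    with open("case.pdf", "wb") as f:           # deliver via the filesystem
        f.write(case)
    try:
        proc = subprocess.run(["/usr/bin/target-viewer", "case.pdf"],
                              capture_output=True, timeout=10)
        crashed = proc.returncode < 0           # killed by a signal => crash
    except subprocess.TimeoutExpired:
        crashed = True                          # a hang is a finding too
    if crashed:
        with open(f"crash_{i}.pdf", "wb") as f: # log it so it's reproducible
            f.write(case)
```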

But it's more complicated than that. Is fuzzing good? A fuzzer is not going to find all software defects — you just can't do it. Some corporations think, "I'm just going to buy Codenomicon, run it against my product, and all of a sudden I have a secure product." It doesn't work that way. Fuzzers will get you closer to a secure program, but you're going to have to have methodology built into your software development lifecycle to make sure your developers know how to code correctly and eliminate errors, plus your testing cycles are going to have to account for trying to find vulnerabilities as well. Basically, it takes a village to birth a secure program.

Another view of this: you've got a fuzzer — a monkey just slamming on a typewriter, sending anomalous data — and it's generating a bunch of test cases. You can see the gamut here: some of it is files, some of it is network traffic, some of it is Bluetooth, some of it is command-line arguments to programs. And you've got a system under test. A lot of people think of the big three — Microsoft, Linux, Apple — but I'm in the Cisco world: I've got embedded systems and anything under the sun. You're looking at our routers and switches; you're looking at our cloud services — how they handle APIs, is there command injection, is there XSS — and also things like IP phones, which could have their own real-time operating system or could be running Linux under the hood. It just depends; Cisco tends to use anything and everything.

What you're trying to do with a fuzzer is partition this big set of inputs into two sets: one is the normal files that pass and are handled correctly — even if they're anomalous, the program catches the errors and exits normally — and the other is the set that crashes everything, and you want to find all of those within the larger set. So you can think about fuzzing as a search-space problem: here's your search space, and here's the set you actually want to find. This is setting us up for how to do genetic algorithms and fuzzing. Questions?

[Audience] Will you talk at some point about your view of fuzzing in a security context versus program robustness?

Yeah — there's some overlap, and not to get too pedantic about it, I always look at this more as a robustness thing. It can be a robustness thing, but when we go through the different types of fuzzers I'll hit on that for you, because there's a layer you can put above everything. Fuzzing just tries to find crashes and anomalous behavior in the program; you can put a layer on top of it to figure out whether a crash was actually a security problem. I'll show you where that layer can plug in.

So, three different types of fuzzers. The first type is a mutation fuzzer, and this is the original one. Fuzzing was developed by a professor at the University of Wisconsin-Madison, Barton Miller, in the 80s, and it was by accident — like every good invention. He was remotely dialed into a computer during a thunderstorm; the thunderstorm put interference into his connection and crashed the program on the other side. He said, "Oh, this is cool — can I make this happen more often?" Then, like a good professor, he turned it into an assignment for his students and made them figure it out. From there it's been an ongoing assignment, and he still periodically publishes reports on the robustness of different applications, whether on UNIX, Linux, or Mac OS X.

In the literature about fuzzing, nobody really goes through these different layers and what makes a good fuzzer, so I'm going to walk through them really quickly so we can get to the genetics. At the bottom: attack heuristics — what type of anomalous data are you actually sending to the program? You can think of this as random, or as more targeted: are you looking at the bounds of an integer, for wraparound? On top of that, you have to think about how you're actually executing the program under test — I call that an executor. Are you just going to spawn it? How are you going to deliver the test case to the program? Are you going to do dynamic binary instrumentation to monitor execution, or GDB, anything like that? From there, you have to be able to determine if the program crashed — health monitoring. How are you going to do that? Are you going to send periodic pings to the program to see if it's still up, or just watch it to see if it exits normally?

The other thing, after health monitoring, is making sure everything is reproducible, so you have to do some sort of logging to figure out which test case caused the crash. You're doing thousands upon thousands of test cases — what if one crashes at number 100 and you don't stop until 50,000, and you don't have that record? You can't figure out which one actually did it. Something I added when I was doing research in this area is a reporter: I like on-demand summarizations, where I don't have to look through all the logs — I can just ping the fuzzer and ask, "What's your current status? What are your results?", and it can tell me.

Then, when you're talking about security problems and how you differentiate between something that's just bad program behavior and a crash that's actually exploitable, that's where you get to the classifier. It's an optional component — you'll see it built into some fuzzers, and others won't have it at all. There are tools like !exploitable or CrashWrangler that will take the stack trace and figure out whether your program actually controlled any of the registers, which would make it exploitable. That's how you can move the fuzzer from something that tests robustness to something that's security-based, as sketched below. Questions on mutation fuzzers? Cool.
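As a rough sketch of those layers — heuristics, executor, health monitor, logger, reporter, classifier — here's what a skeleton might look like. Every class and method name is mine, not from the talk:

```python
import logging
import random
import subprocess

class Fuzzer:
    """Skeleton of the layered fuzzer anatomy; all names are illustrative."""

    def heuristics(self, data: bytes) -> bytes:
        # attack heuristics: random bytes, integer boundary values, etc.
        buf = bytearray(data)
        buf[random.randrange(len(buf))] = random.choice([0x00, 0x7F, 0x80, 0xFF])
        return bytes(buf)

    def execute(self, case: bytes):
        # executor: spawn the target (could instead use DBI or a debugger)
        with open("case.bin", "wb") as f:
            f.write(case)
        return subprocess.run(["./target", "case.bin"], timeout=10)

    def healthy(self, proc) -> bool:
        # health monitor: did the target exit normally?
        return proc.returncode >= 0

    def log(self, i: int, case: bytes):
        # logging: keep every test case so crash #100 of 50,000 is recoverable
        with open(f"cases/{i}.bin", "wb") as f:
            f.write(case)

    def report(self):
        # reporter: on-demand summary instead of grepping logs afterwards
        logging.info("cases run, crashes found, current status ...")

    def classify(self, stacktrace: str):
        # classifier (optional): is the crash exploitable? cf. !exploitable
        raise NotImplementedError
```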

The next step up: after a while, people figured out that mutation fuzzers don't have a lot of smarts in them. You're just sending a bunch of random data and programs are crashing — but what if you have code where you have to get past certain checks to get deeper into the program structure? Think about packets that have TLVs in them: if you change the value field and you don't change the length, you might never get past the length check to find more parsing errors.

So Dave Aitel — from Immunity, I think it was Immunity — wrote a paper in 2002 on the advantages of block-based modeling, and the gist of it is: let's take a modeler and embed it into this whole stack. You describe the data format and where to put the anomalous data, and the fuzzer can walk the data model and determine which attack heuristic to put in which area. It adds the smarts. Again, the classifier in this situation — in all situations — is optional. So that's the improvement from the 80s to 2002. The big example everybody knows is Peach: Peach gives you an XML API where you can define data models. Questions?
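To make that length-check problem concrete, here's a sketch of a TLV-aware mutator that fixes up the length field after corrupting the value, assuming a toy 1-byte-type / 2-byte-length layout (the layout is invented for illustration):

```python
import random
import struct

def mutate_tlv(packet: bytes, offset: int) -> bytes:
    """Mutate the V of a TLV at `offset`, rewriting L so length checks pass.
    Toy layout assumed: 1-byte type, 2-byte big-endian length, then value."""
    t = packet[offset]
    length = struct.unpack_from(">H", packet, offset + 1)[0]
    value = bytearray(packet[offset + 3 : offset + 3 + length])
    # attack heuristic: append junk bytes to the value to probe the parser
    value += bytes(random.randrange(256) for _ in range(random.randrange(1, 64)))
    fixed = struct.pack(">BH", t, len(value)) + bytes(value)
    return packet[:offset] + fixed + packet[offset + 3 + length:]
```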

Okay, the last one, and the newest in this area, is a feedback fuzzer. This adds in some type of search algorithm — something like constraint solving, or concolic execution, or genetic algorithms — to search the space for different inputs. One of the big fuzzers in this area is Microsoft's SAGE, where they have a distributed fuzzing farm across all the engineers' computers: when a machine isn't being used, the fuzzer uses the extra cycles, and they do constraint solving on their programs to find different paths to go down and test. Again, you can add a modeler in this situation or not, and the classifier is optional as well. We're going to focus on this space when we mix together fuzzing and genetic algorithms. Questions on the categories of fuzzers? Cool.

All right, so the big question: we've got this guy again, and we have some genetics — how are we going to turn him into this guy? What we're going to do is use a bunch of different tools to add genetic algorithms into the fuzzing process. We're going to use some dynamic binary instrumentation — or you can do this through a scriptable debugger interface like LLDB — and we're going to use some static analysis techniques with IDA Pro to help us derive attributes to affect our evolution.

A simple genetic algorithm — there were what, three or four people who knew genetic algorithms? So, real quickly: you start out with a population, which you can think of as candidate solutions to a problem; there are tens or hundreds of these. You go through and, in some way, shape, or form, evaluate their fitness: how well are they actually solving your problem? Then, based on their fitness, they reproduce and create a new population. When we talk about reproduction, we talk about selection: how are you going to select the fittest candidate solutions out of your population? There are multiple ways to do that, like random selection, or roulette wheel — think about spinning a roulette wheel — or you can weight the different elements in the population.

Once you have the parents selected to breed a new candidate solution, you have to talk about two genetic operators: crossover and mutation. For crossover: where in the particular chromosome — which genes — are you going to cross over to create the new child? (I'll go into these in more detail in the next slides.) For mutation: how are you going to randomly alter parts of the new child you're making? You can think of this as inserting your attack heuristics into a particular file, your child. These two operators are governed by probabilities that we may or may not talk about.

Genetic operators: selection is very straightforward — from the population I choose two parents, who will breed to create a new child for my new population; I've chosen X and Y here. Then crossover: this chromosome has multiple genes in it, and you can cross over at one point, or choose multiple points in the chromosome and take genes from each parent. Here we've randomly selected the middle, and we take part of the first parent and part of the second parent, which gives us our intermediate child. From there we go to the mutation scheme: do we randomly flip any bits in the new intermediate child? You can see that, with some probability, we've chosen this gene to change and this gene to change. After this whole process I get a new member of the population, and you do this to generate the same number of elements you started with, so the new generation totally replaces the old one, and you keep going in a cycle.
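A minimal sketch of that cycle in code — roulette-wheel selection, single-point crossover, per-gene mutation. All probabilities and the byte-string chromosome encoding are illustrative:

```python
import random

def evolve(population, fitness, generations=100,
           p_crossover=0.9, p_mutation=0.01):
    """Run the GA cycle described above on byte-string chromosomes."""
    for _ in range(generations):
        # fitness evaluation: score every candidate solution
        scored = [(fitness(c), c) for c in population]
        weights = [max(s, 1e-9) for s, _ in scored]  # keep roulette wheel valid
        new = []
        while len(new) < len(population):
            # selection: roulette wheel, weighted by fitness
            a, b = random.choices([c for _, c in scored], weights=weights, k=2)
            child = a
            # crossover: single point chosen at random
            if random.random() < p_crossover and min(len(a), len(b)) > 1:
                point = random.randrange(1, min(len(a), len(b)))
                child = a[:point] + b[point:]
            # mutation: randomly alter genes with small probability
            child = bytes(random.randrange(256) if random.random() < p_mutation
                          else g for g in child)
            new.append(child)
        population = new  # the new generation fully replaces the old one
    return population
```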

That's the very quick spiel about genetic algorithms. We didn't really talk about the probabilities of crossover and mutation, but we can if you'd like. The whole scheme: you go through, you get the new population, it becomes the population again, then fitness evaluation, reproduction, and you keep going either until you hit a limit you've set — the number of generations you want to run — or you have a solution to your problem that you think is approximately what you want. Questions on genetic algorithms? It was really quick.

[Audience] I actually have a question. Most people probably learned this through something like the Game of Life, and there are lots of fun ways to solve that. When it comes to fuzzers, my question is: how would that work, especially if you're talking about random data versus random data that matches a pattern — like, it is an address, it is an email, except it's a little wonky in a way that might pass validation but have an exploit payload at the end of the line? I'm not quite sure I follow how you would set up a parent/child, like a genetic algorithm, to generate that sort of data.

Great question — and this is part of the research, how you bring those two together, so I love the question. The research here is in file-based fuzzing, so you can think about your population as a bunch of files. We'll choose a target — let's say we're going after Mac OS X's Preview software. I'm going to seed my initial population with a bunch of random PDFs that I find all over the Internet. I don't know if they're good or not; I'm just going to start with that. From there, I run them against the program and figure out how well they do — by whatever measurement of "well"; we'll talk about that later — and they get some scoring. Then I have them reproduce: we choose a point in the file and cross them over — I take half of one file and half of another and slam them together — and that gives me the intermediate child. For the next step, I choose: do I want to put some special attack heuristic in here, do I want to insert random data somewhere, or not? You can be smarter than that and choose particular values that you care about. You can also go above and beyond and have a model of what a PDF file actually looks like: take the intermediate child, match it up with the model, parse it out, and from the parsing choose a particular place in the file to put something. You're trying to get to different code paths that way. Does that answer the question? Cool.

Part of this, too, was seeing how lazy I could actually be: can I have the genetic algorithm evolve new test cases so I don't have to worry about it? A lot of the research I did just inserted anomalous data — let the computer do the work for you.
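A sketch of what breeding two seed files might look like under those assumptions — single-point crossover plus optional insertion of an attack heuristic. The boundary values and probabilities are invented for illustration; note the insertion step is also why these chromosomes are variable-length and can grow, which comes up later:

```python
import random

# illustrative attack heuristics: integer boundary values and a long run
BOUNDARY_VALUES = [b"\x00", b"\xff\xff\xff\xff", b"\x7f\xff\xff\xff", b"A" * 1024]

def breed(parent_a: bytes, parent_b: bytes) -> bytes:
    """Single-point crossover of two seed files, then maybe inject a payload.
    Assumes both parents are longer than one byte."""
    cut = random.randrange(1, min(len(parent_a), len(parent_b)))
    child = parent_a[:cut] + parent_b[cut:]
    if random.random() < 0.3:               # mutation probability (illustrative)
        pos = random.randrange(len(child))  # a data model could pick this spot
        payload = random.choice(BOUNDARY_VALUES)
        child = child[:pos] + payload + child[pos:]
    return child
```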

So we've kind of hand-waved: I've got a fuzzer, it'll do genetic stuff, and we'll measure each test case and say how well it did based on some measurement. How does that all work? This is the crux. We're going to go through a couple of different phases. First, on the program under test, we do some static analysis before we even start fuzzing, to do what I call attribute extraction. We take a program — something on Linux, Mac, or Windows — and shoot it to IDA Pro, because IDA Pro will do all that work for us right out of the box. Note we're going with a black-box technique: we're assuming no access to source code whatsoever — that's key here.

IDA disassembles the binary, puts it in some nice control-flow graphs for me, and can calculate different attributes about each function. So I can run some IDAPython scripts against IDA Pro to pull out attributes at the function level or at the program level. At the function level, I can look at things like the number of local variables allocated for that particular function, the size of those local variables, the number of arguments the function takes, something called cyclomatic complexity — which we'll talk about in a few minutes — and how complex the function is based on the number of assembly instructions it has. That last one roughly captures the length of the function; cyclomatic complexity captures whether it's jumping around a lot inside the function. So now we've got attributes for the program.
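Roughly what such an IDAPython pass could look like — this is a sketch rather than the speaker's script, and API spellings vary across IDA versions:

```python
# IDAPython sketch: run inside IDA Pro after auto-analysis completes.
import idautils, idaapi, idc

attributes = {}
for ea in idautils.Functions():
    func = idaapi.get_func(ea)
    blocks = list(idaapi.FlowChart(func))            # basic blocks = CFG nodes
    edges = sum(len(list(b.succs())) for b in blocks)  # control transfers
    attributes[idc.get_func_name(ea)] = {
        "frame_size": idc.get_func_attr(ea, idc.FUNC_FRSIZE),  # local vars
        "n_instructions": len(list(idautils.FuncItems(ea))),   # function length
        # McCabe cyclomatic complexity: E - N + 2 for a single function
        "cyclomatic": edges - len(blocks) + 2,
        # argument counts need type info and are ABI-dependent (see below)
    }
```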

That gets us one step further. How do we mix that with what's actually happening when I run a test case? For that, we use dynamic analysis: for each negative test case I run, I record the hit-tracing information — where it executes, which functions in the program actually get touched by that negative test case. You can do this with a scriptable debugger like LLDB, or with dynamic binary instrumentation — one of the big ones there is Valgrind — to watch execution as it happens. Over here is your dynamic binary instrumentation; over there are the debuggers you can use, like GDB.

Okay: I know which functions the program executed per negative test case, and I've got attributes. How do I bring those two together to get a scoring for each negative test case? We can define a variable set representing all those function attributes we derived from the static analysis, then use those variables to create mathematical functions, which let us define fitness functions for our genetic algorithm. Then, for each negative test case, we do fitness evaluation by looking at the dynamic and static analysis together, and sum or average those variables over all the functions that were actually executed. That's the crux — we got it? Cool.
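A minimal sketch of that sum-or-average step, assuming a hit trace of function names from the debugger/DBI layer and the attribute table from the static pass above:

```python
def fitness(hit_trace, attributes, var="cyclomatic"):
    """Score one negative test case: average a static attribute over the
    functions its execution actually touched."""
    touched = [attributes[f][var] for f in hit_trace if f in attributes]
    return sum(touched) / len(touched) if touched else 0.0
```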

So, examples of fitness function variables — and these are just some I threw out there that you could use. We've talked about the number of locals and arguments and the size of locals; you can also use things like the function's stack frame size and cyclomatic complexity. You take modifiers, like the average, and apply them to those variables — so A might be the average number of function arguments across all the functions that were executed. Then there are other variables that won't necessarily mean anything when you sum or average them. You can use how long the program executed as a measurement — maybe you want the program to take longer processing your input, because with dynamic binary instrumentation it might need 60 seconds rather than 40 to get to a particular area of the code. Path uniqueness: is the code path this test case took unique over the whole set of negative test cases you've run? And you can look at code coverage: is this something new that I've executed, that I've never executed before?

Some example equations you can build from these: reward the average number of local variables combined with longer execution time; reward the average cyclomatic complexity; or variations on the same notion — the number and size of the locals, the number and size of the arguments, the whole stack frame size for the executed functions — either punishing or rewarding code coverage. It's key to note that, depending on your ABI, some of these may or may not mean anything. If you go from x86 to x86-64, arguments aren't passed on the stack, so you wouldn't be able to calculate their size and number correctly — that variable might not mean anything in a 64-bit program. So these can change as your ABI changes. Questions?

Cyclomatic complexity: how complex is the function you're actually executing? Think of the function as a directed graph. The nodes are the basic blocks of assembly — the if-conditionals that get broken out into different jumps — and the edges are the control-transfer instructions between those basic blocks. Then you can take the formula developed by Thomas McCabe, M = E − N + 2P (edges minus nodes plus twice the number of connected components), which is straightforward. There's an asterisk on this, because it's still up for debate: some people say that cyclomatic complexity greater than 10 — meaning probably more than 10 control transfers in that particular function — tends to mean more defects, because the coder isn't factoring their code out into other functions, making it simpler, and honing in on one piece of functionality. It may or may not be the case; there are a bunch of papers on whether that is or isn't true, but it's something to try, so you can play with this measurement and see.

Execution path uniqueness — the variable R — is based on building transition matrices between all the basic blocks in a program. You record execution, do your basic counting of how many times execution goes down this path versus that path, and then for each test case you can multiply out all the paths it took and get a score saying whether it was unique or not, based on the transition matrix. Markov chains, at a high level. Good?
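One hedged way to realize that: accumulate a transition matrix over basic-block traces, then score a new trace by how improbable its transitions are — a sum of negative log-probabilities instead of the raw product, for numerical stability:

```python
import math
from collections import defaultdict

transitions = defaultdict(lambda: defaultdict(int))  # global edge counts

def record(trace):
    """Fold one basic-block trace into the transition counts."""
    for src, dst in zip(trace, trace[1:]):
        transitions[src][dst] += 1

def uniqueness(trace):
    """Markov-chain-style score: rarer transitions => more unique path."""
    score = 0.0
    for src, dst in zip(trace, trace[1:]):
        total = sum(transitions[src].values())
        p = transitions[src][dst] / total if total else 0.0
        score += -math.log(max(p, 1e-9))
    return score / max(len(trace) - 1, 1)
```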

[Audience] The way you're doing this, you're doing it on the executables. Have you considered — or why aren't you — using source code analysis? Clearly, for some of the things you're talking about, it's not available.

Right — for a lot of Cisco's products you'd have to have access to that. I'm doing it black-box style because I want to crash Mac programs; that's basically it. There is good research on doing this from a white-box perspective, and when you do that, if you have access to the compilation toolchain, you can instrument the program. But a lot of people don't have access to that. People doing vulnerability discovery on common platforms like Mac or Windows don't have source code, so they need a way to find vulnerabilities, and this would be one.

[Audience] Getting back to our industry, that's fun, and maybe if you're looking for bug bounties it's productive. But if you're actually trying to develop a product or a system — a web app or something — it seems you'd want to do that to get the most leverage you could.

Yep, you're exactly correct: if you're in a situation where you do have source code, this is not the methodology to take. You can see that with newer fuzzers like AFL, American Fuzzy Lop — look at its first release (further releases went to a black-box style, but the first release actually altered the code as it was assembled, creating trampolines to do accounting of which basic blocks were executed in the program — a very smart implementation). If you have source code, there are better ways to do it. But if you're a hacker out in the world, outside that company, you've got to go black box, and you need a way to do it. Any other questions?

So, an overview now of the simple GA — what did we do here? Wall of text. We seed the initial population; our chromosomes are files, and they're variable length. That's important, because I learned the hard way that they can grow toward infinity and take up all of your hard disk space. We take one of these test cases and evaluate it: run the application under test with the negative test case, monitor execution, record the function calls. I used Valgrind for this when I did the initial research, and boy is that slow — we'll talk about why this approach is a lot faster now, but it's still slow. You calculate the fitness based on the recorded function calls, and then we do our reproduction: we breed the next generation with selection, crossover, and mutation. I did the non-smart mutation of just inserting random data, because I wanted to see how well genetic algorithms in general did with fuzzing; I can always make it smarter with better attack heuristics that target the different coding failures people make, like integer wraparounds and all that. One other key in this implementation: I used elitism, which means I preserve the best solution — whatever "best" is under my mathematical fitness function — so it stays in the population going forward.

This netted about 32 to 35 vulnerabilities in Preview. This is older work — it was about four or five years ago that I did this. Some of those were exploitable, some weren't, but they should all be fixed by now. One reason it's fun to go after Preview is that Safari uses the same libraries underneath for rendering, and so does Quick Look: if the file is just sitting on your desktop, Quick Look will automatically try to render it, and then you can crash Finder over and over and over again. So you can send somebody a file, they download it, they put it on their desktop, they browse to it, and all of a sudden they have stuff crashing. It's fun.

Runtime analysis — this looks at different population sizes and what happens. At the bottom are a mutation fuzzer and a generation fuzzer and how long they take to execute a test case: somewhere between three and six seconds on average. Then up here, these are all genetic algorithms with dynamic binary instrumentation — variations on a theme, different types of genetics — and as you can see, it's horrible: around 40 seconds per run. You're going to be waiting a lifetime to get results in this situation. What's also interesting is you can see this graph going down and getting closer and closer to a single point — that's the files converging toward one solution, which is a problem you have to factor in as well: at the end it's like the same file being run over and over again. That's the purpose of the CHC GA, which we're not talking about today, but it's a cool little algorithm if you look into it.

So we've got to combat the runtime problem, and the simplest way to do that is to distribute everything. We can take a look at Amdahl's law — which basically says: make the common case fast — and do an evaluation of the genetic algorithm to see what's happening. If we look, we see the genetic algorithm takes 0.5% of the time to breed new samples. What's really taking the longest is the chromosome fitness evaluation, and that's because I'm running everything under Valgrind — imagine running Preview under Valgrind and then waiting for the results. So I need to speed up that portion. How? I can do some math based on this function and say: if I add four more nodes to the cluster and distribute the chromosome fitness evaluation, I can pull my per-test-case execution time down to about 7.36 seconds — really fast — and I can scale to whatever number I want, so I can build a cluster very easily.
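The arithmetic is just Amdahl's law: only the parallelizable fraction divides across the workers, while the serial breeding step never speeds up. The numbers below are illustrative, in the spirit of the slide rather than its exact math:

```python
def expected_time(t_case, parallel_frac, n_workers):
    """Amdahl's law: serial part (breeding, ~0.5% here) stays fixed;
    only the fitness-evaluation fraction divides across workers."""
    return t_case * ((1 - parallel_frac) + parallel_frac / n_workers)

# e.g. a ~40 s Valgrind-instrumented run, 99.5% of it fitness evaluation:
print(expected_time(40.0, 0.995, 5))   # ~8.2 s, down from ~40 s on one node
```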

So how do we do this? We can do what's called master-slave GA parallelization, where the fitness evaluation happens independently: multiple computers each run Preview, then report their results back to the master, who — once all the results are collected — breeds a new population and seeds it back out to all the slaves. They run it again, the data gets pulled back in, and it's a continuous cycle. We're using a synchronous model, because I need to wait for the fitness evaluation of all the chromosomes before applying any of the genetic operators. Questions on the distributed algorithm? Pretty straightforward.

Here's a cool graphic of what it could look like: you've got your workers and you've got your organizer, or master. The things I need to worry about at this point are: how do I distribute the files, in some sort of data store; how do I direct work — basic queuing; and how do I distribute configurations, because I need to tell the endpoints how to run, and which program needs to be tested. I deemed that something called a distributor.
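A sketch of that synchronous master loop; the queue objects stand in for RabbitMQ channels, and `breed` is a callable for the selection/crossover/mutation step (e.g. the `evolve`-style code earlier):

```python
def master_loop(population, work_q, result_q, breed, generations=100):
    """Synchronous master-slave GA: fan chromosomes out to workers, wait at
    a barrier for every fitness result, only then apply genetic operators."""
    for _ in range(generations):
        for chrom in population:
            work_q.put(chrom)                          # distribute the work
        scored = [result_q.get() for _ in population]  # wait for ALL results
        population = breed(scored)                     # breed next generation
    return population
```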

I'm actually looking at this as a SETI@home type of experiment: can you just have people come up on a network and start fuzzing? Can they pull down your configuration files, start running all your test cases, and contribute to a massive fuzzing cluster with zero knowledge? So we've talked about organizers, workers, and the distributor. In the proof-of-concept code I wrote, the fuzzer is configured through a YAML file that tells it how to execute the program, where the program should live, what operating system it is, and a couple of other niceties.
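The kind of YAML the distributor might ship around — every key name here is invented for illustration, not taken from the proof-of-concept:

```yaml
# hypothetical distributor config; key names are made up for illustration
target:
  path: /Applications/Preview.app/Contents/MacOS/Preview
  os: osx
  args: ["%TESTCASE%"]
queue:
  amqp_host: fuzz-master.example.com
  work_queue: testcases
  result_queue: fitness
datastore:
  mongo_uri: mongodb://fuzz-master.example.com:27017/fuzzing
```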

You can tar that thing up and shoot it over email to somebody, or put it up on a website and let them pull it down. The distributor's configuration file also has the addresses for the queuing system — how to hook up to the RabbitMQ system I'm using. For a datastore I'm using MongoDB, which has a cool feature called GridFS where I can just start storing files in MongoDB, and anybody can connect up to MongoDB and pull those test cases out. So it's pretty cool. Results: there's where we were before, and here's where we are now with n equals 4 — a dramatic speed-up. Questions?
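Storing and fetching test cases through GridFS with pymongo is about this simple; the hostname and database name below are placeholders:

```python
from pymongo import MongoClient
import gridfs

db = MongoClient("mongodb://fuzz-master.example.com:27017")["fuzzing"]
fs = gridfs.GridFS(db)

# a worker uploads a generated test case...
file_id = fs.put(open("case.pdf", "rb").read(), filename="case.pdf")
# ...and any node in the cluster can pull it back out by id
data = fs.get(file_id).read()
```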

And that's just with four nodes, so we could scale it a lot. That's with Valgrind; if you use LLDB, you get somewhere down into the 20s, I think — I did a real quick proof of that — and if you then distribute it with LLDB, you're going to get way down. About five years ago, though, the technology wasn't at a point where you could do that reliably.

Conclusions. We didn't talk through the vulnerabilities found, but I got about 32 defects in Preview. I didn't really apply it anywhere else, because this was just a proof of concept to see if it would actually work. It used a minimal set of negative test cases — we only ran probably 50,000 to 100,000 test cases through this thing and got that kind of result — which leads me to believe that genetic operators are naturally good fuzz-testing operations, especially the crossover and mutation we just talked about. The parallelization is awesome: you can drastically improve execution time no matter how you're monitoring your execution, and it's easily possible to match Peach's and Mangle's per-negative-test-case runtime if we add more workers to the distributed architecture.

Future work that I never got to: a custom debugger with Python bindings to improve the hit tracing — which I think is where I was hitting limits with LLDB — and trampolines, which is highlighted because that's what AFL uses: it hooks the functions so it knows which ones get executed, and does its scoring in a shared-memory area that can be pulled out later, really quickly. Also, introduce data modeling into this whole thing, because I'm not using data modeling at all; that would help me target particular areas of the PDF, which would improve the attack heuristics. And there are some crazy ideas about a comparative study of genetic operators and different algorithms — we did a simple GA, and I also did a CHC GA, but there are multiple others out there you could apply to this problem. Questions? Yeah — time to wrap up.

[Audience] It's generating reporting that goes back to developers, to say we found this and we didn't find this. What sort of reporting goes back? It looks like you're aggregating the failures and saying how many you found and what kind — but is this specific enough to go back to a developer and tell them what to fix?

That's, in general, a good question about fuzzers: the reporting mechanisms for fuzzers, for the most part, aren't yet good for consumption by developers — they don't summarize things appropriately. When we reported these back to Apple, we gave them the example files that caused the crashes, plus the stack traces, plus our analysis of what went wrong and where we think the errors are in the parsing code. That gave them a general area to work on. When you're doing fuzz testing, those are the key elements you need to get back to your development team: the crashing test case, so they can reproduce it; any hints as to what you think is wrong in the code; any stack traces you may have, so they can run through it; and a sample of the data that caused the crash.

[Audience] Any idea from the output what the relative effectiveness would be, in terms of finding problems, using pure random changes versus the ones coming out of the genetic algorithm?

At this point, no — but the two other fuzzers I ran were Mangle and Peach. Mangle was the mutation fuzzer that just did randomness; Peach was the data-model one, and I used a very basic data model. I ran them for the same number of test cases so I could factor that in and see if the genetics were actually working, and it came out pretty well — I definitely found more errors with the genetic algorithm. Now, I attribute Peach not finding more errors to the data model that I had. That's something you need to factor in when you're choosing a fuzzer: if you go with an intelligent fuzzer like Peach, where you have to build a data model, there's a lot of time you have to spend specifying the data format, and it may or may not ever be complete.

[Audience] And that's one of the benefits of the genetic algorithms — you get to the point where you're not dependent on the same developers, who probably wrote the code in the first place, making the same mistakes describing the data as they made writing the code that ingests it.

Yeah. Other questions?

[Applause]