Chris Craig - Educated Guesses with Symbolic Execution

Name: Chris Craig - Educated Guesses with Symbolic Execution
Uploaded: 2021-05-02
Duration: 52 min 25 s
Description: Educated Guesses with Symbolic Execution: Using Constraint Solvers for Speculative Execution in Modern Firmware by Chris Craig Recorded on April 28th, 2021 for the 7th annual BSides Knoxville conference. Symbolic Execution is a useful way to perform static and dynamic analysis but is underutilized

BSides Knoxville52:25128 viewsPublished 2021-05Watch on YouTube ↗

Mentioned in this talk

Tools used

PANDA QEMU Z3

Frameworks

angr Unicorn

About this talk

Educated Guesses with Symbolic Execution: Using Constraint Solvers for Speculative Execution in Modern Firmware by Chris Craig Recorded on April 28th, 2021 for the 7th annual BSides Knoxville conference. Symbolic Execution is a useful way to perform static and dynamic analysis but is underutilized by security professionals. I hope to expose more people to this awesome tool and demonstrate its utility in finding bugs and "rehosting" firmware. Christopher Craig is a Cyber Security Software Engineer in the Vulnerability Research Group at Oak Ridge National Laboratory. He received a B.S. and M.S. in Computer Science from the University of Tennessee in 2011 and 2018 respectively and has 9 years of experience in the field of Cyber Security. At the start of his career, Christopher worked as a penetration tester and security analyst for Cisco Systems. In that time, he has investigated VoIP systems, cloud technologies, and other Cisco products to work with development teams and remedy security weaknesses. Now in his fifth year at ORNL, his work has focused on developing advanced network security for the nation's power grids and other critical infrastructure.

Show transcript [en]

all right well thank you all for joining me uh my name is christopher craig and i'm from the vulnerability uh science research group here at oak ridge national labs that's ornl because we love acronyms and the u.s government and um i'm here to give a talk on symbolic execution or simx or s e or any of the variety of permutations that's used to talk about it and the name of the talk is actually a um a bit of a kind of inside joke so folks within oak ridge are typically from a career science background career research background and so when you give them an opportunity to talk they typically have extremely overblown uh hyperbolic titles which uh i gave the same talk to

internally and that's kind of the the riff using constraint solders for vulnerability discovery software like all this other stuff and so you typically see that kind of overblown in a lot of research papers or ieee papers so it's kind of a it's a dumb it's a dumb in joke that's not landing well but regardless uh i'll begin uh also i'll start chirping on discord after this so yeah and i've been kind of tripping on discord the entire day so if you don't be you'll see me around so don't worry uh so here's a basic overview i'll be talking about cemex specifically cle and anger two of the more i'd say popular uh symbolic execution frameworks uh

after that i'll begin discussing what i've kind of been doing with it which is uh your hosting problem and i'll be discussing the nature of the rehosting problem and one of the most predominant tools used to solve it which is panda an extension of qemu followed by my novel work in constraint graphs and constructing constraint graphs and then kind of an allusion to future work and kind of where things could go from here so here's the bio slide who am i who cares no one you don't care you don't care at all who i am it doesn't matter i am the machine that delivers knowledge the biggest point of this premise is that or the whole point of this talk is

that this is a new technology or at least new framework that's not being used actively for um for uh source code analysis or for software testing and it really should be because it solves unique problems and i'm just here i'm the person that's here to deliver that to you so that you can ingest it and then go to cli anger or any of the other frameworks and begin doing the good work that's all i'm trying to do if you are still curious and you think this person who put a bio slide saying well this isn't a bio slide uh this is my contact information craigsliano bridge orion.gov or if you want a very informal i know we just had

a talk on personal marketing and i checked my twitter page and it is not very good for personal marketing an informal uh contact list you can just hit me up on twitter because i had a blog and deleted it because managing a blog is a lot of work so let's talk about symbolic execution all right let's talk about it uh so what is sc uh it uses the constraints of a specific variable to generate test input full stop that's what it does um but why should you care why should you even bother with it well it gives you unit tests for free and i'll go through an example of that specifically with cli on how it can be used to generate novel

and unique unit tests right so how does it work what is what are you talking about well it's much like an algebraic equation and i have a very very simple example kind of following this um where it goes through kind of the natural intuition between using a solver in order to solve for unique code paths in code or unique code paths it's pioneered by z3 meta smt and stp which stands for simple theorem prover these are all constraint solvers or theorem provers that lie within the heart of these symbolic execution frameworks you don't really need to go too deep into them in order to use them proficiently as a security researcher or as a software

tester right like if you're writing source code or you're the head of a group that is working on um resilience within your source code test or then your source code repository this can be something that's easily thrown into your ci in order to or your continuous integration framework sorry i have to explain all acronyms as i say them uh continuous integration framework in order to make your code more robust this is something that could be easily utilized for that it's also very good as a bug finder uh and i'll have an example a few examples of those uh because that's the that's kind of the whole point right so also a good note this comes from a

solid pedigree symbolic execution it was used in the darpa 2016 cyber challenge in tandem with fuzzing and you're going to hear me go back and forth between fuzzing and symbolic execution or sc a lot and it's because there are two sides of a very very similar coin so a basic basic basis example right you have x over 3 plus 23 equals 38 right well we know how to solve this if you have taken algebra what you do is you uh well you want to solve for x you want to solve for x well how do you do so well you subtract 38 by 23 and then you multiply that result by three and then eventually you get 45. now what

that now how did you know how to do that well i'll take an algebra i know but what algebra essentially uh like the path that you take in order to solve this it all has to do with the constraints that you have placed upon x right x is divided by three and then that result is has an as an additional you add 23 integers or 20 or an integer of 23 to that resulting value and that gets you 38. those are set and there's a single integer that can solve that problem right so like okay well you kind of reverse it you go well if i subtract 23 from 38 and then i multiply by 3 that'll cancel

out the division and then eventually i get my number that sort of intuition is the same sort of logic that goes into constraint solvers in order for them to uh go through their hard work and figure out what the variable at that particular point in time can be right if we look at another example this concretely in software right we have this get signed function which takes in an integer right it's a signed integer it can either be zero greater than zero or you know else less than zero well you intuitively know that those are the three unique code paths that that variable could possibly take and it takes you to three unique code paths that same intuition isn't lost

when you start diving into source code and if you wanted to find a unique input that would get you let's say to that else branch like the the last else branch but there were a lot of constraints placed upon let's say integer is not an integer but it might be the result of a deserialized object and you know things like that and things of that nature you might need a little bit more work to see if this is actually reachable and reachability is a key part in developing a robust and and uh reliable exploit so all right so the more student of you will recognize that this interaction is very akin to fuzzing right and i mentioned that

there's an analogy between that and fuzzing um at least through the lens of like software testing so what fuzzing does is it generates a unique input or generates a multiple um generates input for a specific piece of software based on a corpus of information based off of some sort of knowledge or mutation or input vector or seed a priori after that it generates inputs and what not in order to explore code paths within your software this can either be done statically a la lib fuzz or it could be done dynamically looking at afl and some other stuff in order to just you know look at an input and then just you know hammer it with a bunch of

permutations symbolic execution works in a different light remember that previous example that we use with algebra it actually generates the input for a variable for a variable based on the constraints placed on that variable and you might think that's very good like lib fuzz does that statically and maybe i could do this statically but i don't know how i can use this dynamically well it turns out you can through some very creative workarounds and there's a reason i call them frameworks as opposed to just an sc tool right you can actually use symbolic execution dynamically and i'll go through an example of that in a minute but that's kind of the uni that's this that's the separate side of the same

coin right fuzzing you mutate input in order to explore code symbolic execution you look at the code and you target a specific variable to get to a specific place if you want to think about it that way so yay source code everyone loves a slide full of code right it's so easy to parse so easy to make through no but there's no need to there's no need to get super antiquated with it this is a lovely contrived example of a simple example where fuzzing would be something that you would think oh i'm just going to go ahead and fuzz this to figure this out where actually symbolic execution can be a much more targeted way to get to that specific

variable for example we have our checkroot password hash right that's our library function it does what's advertised it takes in a user id and then a char star pointing to the password it checks the for specific admin id it short circuits if it's specifically a value in this case 31337 and then return 0 right then we have our authenticate function which calls check root password if the return value is error right not zero but error some other predefined constant uh then we go access denied if not it's going to go through the work of finding the user info in this case if you were to fuzz this right you'd have to permute the user id

field possibly the password field there's a lot of things that you have to check and interrogate um if you were using something smarter let's say lib fuzzer it might it might figure out that it's looking for a specific constant but if it's programmatically derived from some other things that would be really difficult to try to fuzz right you're just guessing in the dark for some for some integer whereas constraint solvers and sc would specifically look at the user id field figure out that its constraint to that path is 31337 and set it to that variable if you wanted to get to the um the return id zero the return value of zero if you specifically target that code

branch so that's that's an example of how you can use symbolic executioners like uh just kind of conceptually as opposed to fuzzing how you select that particular tool for that particular problem all right so say we want to look at this dynamically like how oh nevermind i already moved through that all right so we've looked at how we would conceptually interrogate a piece of um a piece of software with a static um or with a symbolic execution framework so let's look at it statically uh clay is a symbolic execution framework it uses source code so it needs source code a la lib fuzzer right not all for not all fuzzers need source code but good ones do and it uses llvm

which stands for low level virtual machine you can think of it as an ir without getting too into compiler nonsense it basically takes the software it takes the code that you've written or that you're observing puts it into an abstract syntax tree puts into a big old tree structure so that you can programmatically navigate it that's hi high high high level but that's essentially how llvm kind of operates and there's a web demo of klee if you wanted to go ahead and just ignore this talk and start playing but in the meantime let's look at let's look at a more concrete example so we have that get signed function we've been poking out all this time

and so let's say we wanted to use cli to generate those unit tests as i was asking as i was talking about earlier well in order to do so in cli all you have to do is say well i have an integer of a i want a to be a symbolic variable i need to you know cap it at a certain size so that way i know that my program is going to be compiled with a certain thing and in order to use this i believe you have to use a specific version of clang which i think is the more popular llvm entry level uh way of uh getting at this and cli and then you can add the function cli

make symbolic so it turns a into a symbolic variable meaning it can be anything right it doesn't have to be specifically a set integer it could kind of be whatever it feels it needs to be and then you execute uh then you compile and execute this uh the software so when you do so you get um let's see you get this cli get signed.bc this output file is the result of claim so when we run cli it tells us that the number of total completed code paths and generated tests right you see all these generated tests um these generated tests you can actually execute and interrogate individually using a thing called uh the cleat test

tool right so cli test tool will sit there and go all right so you want to just generate a specific unit test you want to do you want to explore this specific code path using this specific variable well i attached here and these are the inputs that i use these are the actual bytes that i used to generate to explore the specific code path and there you go unit test complete you've written you've took no time at all and then what was that two lines of c in two lines of c you've written three unique unit tests three novel unit tests that explore all the code and have guaranteed 100 code completion you know qed but as

always there are more there's more nuance to this and and aside from maybe a very contrived example let's see how this could be really really you know applied so let's just say that we have a a parser for a regular expression right where you were supposed to be um attaching this library to our code someone said that there's a bug in it you're like well i don't know where the bug is well all right aside from analyzing it concretely i'm supposed to write unit tests for it geez this is a lot of stuff for this little fuzzer so let's just go ahead and use cleve for it now this is some library code we'll be evaluating it so there's no real need

to get deeply antiquated to it match star matches a star of a regex so it matches any character but null match here compares the current character pointer to the defined regex so compare it to that specific regex value and then batch is the high level function that calls each of the sub functions so we don't have to get super super super involved in each of these particular functions right so let's say you wanted to fuzz it to figure that out well there's a large number of input possible strings that you could put to fuzz this even if you define the size 7 7 verses all possible ascii characters sure that's feasible but you're essentially just

vomiting information into your software you can't make any guarantees about it it's really really difficult to do and if you get a crash well you need to record that crash you need to make sure you observe it but what if our goal isn't to get uh isn't to test all of the input right let's say we're dealing with some uh technically i think this is polynomial solvable the size of seven but let's just say we're dealing with um you know we're just trying to get complete code coverage we don't necessarily care that we've um used all novel inputs at certain points we're going to get overlap right pigeonhole principle we'll start hitting the same code paths

over and over again even with the even with two different inputs so let's just go ahead and oh i think my headset went off uh i'll keep talking let's just go ahead and um uh oh no okay i got it yeah battery come on baby come back delay all right so let's just say that our um our goal is complete code coverage right well how would we accomplish that well we'll do the same thing we did earlier we'll make a specific variable symbolic right in this case we'll use a symbolic variable and then we'll try navigating the code with that symbolic variable having it you know basically be like plato within a key right or within a keyhole you just

push the plato in it takes the shape of the particular object you're looking for and then we'll we'll use that to kind of find our way around right so in this case we make r e our symbolic variable we'll follow any con we'll follow any constraints place on that variable and so what are the possible code paths that we'll hit right well in this case there should be three code paths in the in the match function right we either match here right or we either hit this if condition or we don't and then if we are in the do while loop we either hit this if condition or we'll fail to do while and just jump right out

right because if we hit this do while we're going to be stuck in this loop until we eventually fail out anyway so we have three up front and then we have one two three four five in the lower function so we have a total of eight right eight kind of unique code paths that we're kind of navigating and hopefully if we just hit all eight code paths we're kind of done right we'll have re mutate and do all this other stuff and figure out all the constraints to just target those specific code paths and then we'll be finished it'll run but we have a problem cli does not like this why doesn't clean like this what

happened well what we have is and it's pointing out and it's this is actually very good if you wanted to use cli as a kind of bug finder and bug hunter because what you have is a a run time error you don't have a compile time error it's a runtime error well why not well what happened on regex cp line 24 and 26. if we go to the right hand side we look at the match here function it's it's using um you know us what's it a direct reference that's right it's saying re of zero well i don't see the problem with that we initialized this with a size of seven earlier right even if it's junk it

should be full of something even if it's nothing right so what's the problem well the problem here is that we actually have um and you can consider this either a failure in the test framework or not if you feed these character strings and if you try to uh analyze this directly all of your uh all of your re input would be null terminated because it would be a c character string the fuzzer doesn't care right the fuzzer doesn't care at all it will just start hammering through all this code and so you could find this code dynamically if you just completely randomized all the inputs which is what this did and if you don't have null terminated

strings it turns out these two direct references on lines 24 and 26 it'll hit you know they'll they'll uh you'll get out of bounds errors and then you'll abort so let's say we want to do a more thorough fuzzing we want to do a more thorough evaluation we can actually add additional constraints to our variables right so is this a failure in our software right yes maybe or is it a failure in our test handler say that there was say if this code wasn't reachable right or it was automatically null terminated by a um an sn printout not sn oh yeah s in printf or stir end copy or anything that guarantees a null character at the end

of the buffer right the character buffer well then you're absolutely going to hit you're not going to hit a null terminal terminator here your test handler doesn't accurately represent the environment so we need to fix our test handler we can just throw in our own constraints we can do whatever we want so to fix that we throw in our own valid constraints right so in this case we make our eek symbolic as we did before this time we set the size -1 to be null concretely or you can actually use this cli assume function which implicitly drives in a constraint that you define yourself and so with this you are now in full control of the um of the fuzzer that you're

generating another fuzzy the symbolic execution framework that you're using in order to interrogate the code right you're in full control and the output for this is wild it runs for a long time but you do get complete you do get a code completion with this because it does interrogate all the stuff and figures it out and it's only eight um unicode paths so it's pretty it's pretty quick with it all right so now that we've um so now that we've figured out a static way of using symbolic execution let's look for more creative application maybe we can use it dynamically well in that case you're going to need something like anger anger is a python

framework for analyzing binaries combines with static and dynamic execution which they call concolic execution i will not this is a more religious debate but between symbolic and concolic execution but it makes it applicable for a variety of tasks it's geared for code exploration and dynamic code exploration right and i'll go through an instance of this and how it goes through it but it accomplishes this through cpu emulation i think specifically unicorn is what it uses under the hood and it is pretty pretty cool it allows you it uses unicorn basically to get at and identify variables that it wants to make symbolic even without knowing how the source code is generated or constructed and it uses the cpu emulators to just

kind of immediately become a man in the middle it's very good and it uses python so it's pretty simple to program um so let's let's to make thing to make analysis a little easier right let's look at our getsine function but we'll add some print statements after each to make it a little easier to work with and in this case we will not be looking at the source code we'll explicitly be working with the compiled binary so we won't know any of the source information prior to this so right we'll add these print statements and then we'll go forward so all right well let's just go ahead and write some anchor code in order to

explore the code right so in this case we want to programmatically explore it so we'll define and import the binary as a project as a project with an anger we'll set up the symbolic variable defining the size because it's very important that you set the correct size for a symbolic variable you want to mutate stuff but you also want to you want to constrain it there's a variety of issues you can get into without it it sets it up as a bvs which i think is a bit vector symbol um and uh we're using clarity which is its uh kind of api between z3 and python in order to set up the symbolic variable set up the project name and then

we're going to call it uh and then we'll set a specific entry state right just start at the start so now with all that said we're going to run what's called a simulation manager and this is what executes the code um this was this is what executes the code dynamically so it just kind of goes through and turns and executes the code and in this case we want to use the depth first search exploration technique in order to navigate the software right so that's all it's saying so basically we create our we import our binary we set up our symbolic variable that we want to mutate and basically be the clay through the keyhole and then after that we set up a

simulation manager which emulates the cpu because we've only done three things so far so now what we'll do is we'll make sure it's only printable to ascii characters right only visible ascii characters simulating let's say a typing environment right we will then set our end state right remember i said that anger is primarily used as a binary exploration technique and commonly if you're doing exploit development reachability is a big problem you can find some sort of super crazy contrived um error inside of let's say a parser but you don't know if you can generate any input to actually get there right you're never sure this will allow you to interrogate the code and generate the proper

inputs in order to target that specific line of code in order to make sure that you while you're executing it can in fact reach that it's very useful so in this case it sets up that environment it says that if inside of the posix environment which we get from those print statements inside of our standard out if we notice in standard out we see the word negative right we see negative then we'll jump out right we'll just hop out and so okay that's cool if we see a negative value we'll jump out and in this case we'll um yeah it's set up in the lambda function we have defined i don't have the line numbers but it's set in the lambda

function and then after that we will uh set our print statement uh yeah the found solver eval will evaluate until that point so that's the that's the premise all right and so then we'll execute it and you can see all right get sign we're running get sign it's a 32-bit bit vector that's going to be mutated and munged around with and it starts at a simulation state at the program entry point executes executes executes until it gets to an output simulation state specifically targeting that negative value and in this case hey it's printed negative and that's exactly what we wanted and if you notice we got to this statement using the negative value negative 2 m plus which is interesting

because what it essentially tells you is that you can input a variety of characters after the negative 2 and still have that integer be interpreted as or still have that string be interpreted as a negative value so it's probably using something like a to i or something like that so it allows you a little bit of flexibility and um a wider range of inputs to get you to that value right proper with buzzing you can hit that target a variety of ways and that tells you some information about the program and same thing here so now i'll kind of switch gears so now that we've we've talked about two uh two specific uh symbolic execution

frameworks uh cli which works on static right you need source code works on uh workstatically you need source code and anger which can work statically but also works dynamically and you program it in python and you use it to uh kind of explore actively explore code there's a particular problem that's interesting at least to oak ridge national labs and that's the problem of rehosting i think oak ridge and our partners up at mit lincoln labs it's rehosting the objective of hosting is to get a high fidelity emulation of arbitrary firmware divorced of its original environment you want to think of this like a head in the jar so the goal is to say for a particular proprietary device or iot

device or something like that that might be mission critical right you might only have three of these you want to see if it's vulnerable to particular hardware testing and stuff like that and the fidelity required to make a digital twin or something of that nature is immense right because those are going for the most high fidelity uh emulator you can possibly muster and that's too much typically so what if you just wanted to get the firmware running what if a subset of the full feature set is applicable well then you might want to re-host it you might want to take the firmware out of that and see if you can get it running or in an emulated environment typically

these are accomplished using qemu unicorn but um and there's a lot of impediments because you don't actually have the physical device so there might be things for example specific flags embedded in nvram before the whole machine is booted up it's looking for specific values in memory that you can't access and don't know it might be looking for specific peripherals right like it's trying to interrogate whether or not it has the right usb drivers before it even starts with those elements in flash et cetera and so there's a lot of impediments to doing that but what if we were able to have a callback framework in order to actively be um in order to actively patch and

like monkey patch the emulator as it's executing that's where panda comes in panda is an open source platform for architectural neural dynamic analysis very cute name uh tim did a great job with that one we are friends uh us at oakridge are friends with the guys and active developers of panda we meet with them regularly and it has the ability to kind of record and replay execution frameworks so if you wanted to interrogate um how typically it's very friendly with linux and x86 systems or x64 systems but uh it actually opens certain open up stores more to arm arm 32 and like arm 64 and mips architectures but it provides a rust and plug-in architecture so what happens is it hits

a certain stopping point and you want to write a specific plug-in halt execution do some stuff and then continue execution so it's actually really good to kind of monkey patch and work your way into a rehosted environment right because you want to be actively dynamically exploring the code and figuring out what's doing what and looking around the firmware as it's being emulated so in this case i've been talking for a long time this is kind of the part where i'd stop and ask for questions but there's there's kind of no one there so i keep blazing through i don't have discord up so i'll hear all the questions in a minute but please don't hesitate to just type

them in whenever you can alright so i actually was given a specific example so like that seems very contrived and very specific when would i ever need a well here is one this is an arm cortex m firmware specifically built for ctf in this case it reads from system clock and it's looking for a very specific value from the clock in order to return a flag okay well what are we going to do well the only thing you were given was the uh firmware you do not have the driver to talk to the specific system clock that's clock it's trying to execute to you don't have um access to its proprietary information and in i think

one case it can't be easily opened in ghidra maybe i've not tried ida pro specifically with its great decompilers but if you try to open this in ghidra it will vomit and will actually explode specifically because i think it's compiled in rust to try to further off you skate anyone trying to directly look at and interrogate the um the flag from strings so what do we do well let's take a closer look at it if we observe it as it's executing it we notice that it has a very very simple pattern this is a very very uh an imperfect representation of the control flow graph which i will have represented in a minute but this is kind of a basic approach for it

so it does this i o read specifically looking for the system clock and it searches for a value after that this says device not found or another i o read okay well this is through either kind of fuzzing it or testing it or constantly running it we're observing these types of patterns so it goes through another iro read then it fails to initialize right if it doesn't have the correct value and then it does another i o read and then after that it seems to be checking for certain values after the clock has been continually reset so we're like well dang okay so i can solve this with panda right and what i'd love to do is

interrupt the i o read and return a value of our choosing right so you could actually halt say you ran it in panda you stopped execution at a certain point specifically after the i o read took the return value from the clock made it whatever value you wanted and then gave it back to the program to continue executing right so halt execution um look at the return value from the system clock modify it return but that might that's akin to fuzzing right that's just searching a specific space because you don't know what the actual value should be you can't even crack it open in ghidra to actually interrogate the constants it might be compared to

or worse it's programmatically generated so what if we just use the system against itself what if we use symbolic execution in order to do so well what we did was we worked on a platform called angrypanda which basically took the firmware that was executing um as you see we emulate the hardware then we emulate the firmware and panda has in and of itself a virtual memory address space right so what we do is we intercept after the i o read and we can observe the return value that was set for that value so we create our project create memory from panda so we copy the virtual memory i think in this case we copied because

it was a cortex-m firmware we could copy a huge section of virtual memory over because you do run into a variety of issues with virtual memory and having to make frequent callbacks and page tables will be a big impediment to this but if the application runs within 4k you should be fine but we can copy memory back from panda and then we do what i'd like to call technically is speculative execution because what you do is you halt at that space and you say all right this is the return value i got i'm going to make that return value symbolic and then i'm going to say continue running for some number of simulation steps continue

running blah number of of translation blocks or steps of execution forward using the symbolic variable and tell me what your states are so at this point you begin to explore write the code dynamically because you halted execution and now you have this reverse you have this um symbolic variable that's just mutable and whatnot and exploring the code and fuzzing the code or fuzzing but symbolically executing the code and going forward and uh you can now observe all of the states that it found that it that it was able to return to within a certain number of constraints and it uses those constraints in order to uh to get there so after that you find one that you like

and you take that specific value and return uh and return back uh this work was pioneered by andrew fasano and at mit lincoln labs and i should also say tomoko chimoko balo was also responsible for generating the ctf framework i think he solved it entirely in panda but andrew found a way to use anger for it and i took it and did a few extra steps with it from that point forward so after that we return we select our desired state we solve and cache this constraint solutions right for later in case we want to use them or we return to the specific state later we can cache that we can modify it but we're programmatically exploring this

code as we want to because we're monkey patching it as we're executing it so here's a live example right uh do i have to click play i do so here's a live example there i am uh running the actual executable and then you can see it's a qemu arm system core chip it's using qmu to emulate it because it's actually for an arm chip and so it tries to execute looks at the clock and the clock is not the correct value it's looking for and it takes no input you can't use any specific input for it so it's not like it's you're going to fuzz this value right there's no input for it so you lose

restore from snapshot and then it just goes back so how do we go forward so in this case i run it a few times you can see the clock changes each time so i go to my solution always use virtual environments i should add that caveat if you can use a virtual environment please do and so from there we'll use work on angry bang and group panda we'll go into the virtual environment and here we run uh the solve and so what solver is doing is essentially interrogating the point so we have an mmio read right from a buffer and it switches to symbolic execution so this is when it returns from the clock at that point it uses anger to explore n

number of states forward right and then it returns explored all four paths found two constraints and so it found some binary decision being made between the code at that point we select one right we can arbitrarily choose one or we can interrogate the code further but we arbitrarily choose one specific path to interrogate and then we go forward from there um and so at that point we continue executing it and it does that a number of times so we actually end up having to halt execution explore state and return about four separate times and this last one we had five unique paths that we could explore and we chose one you know returning cash solution for address one

mmi oh read six we already had the buffer previously selected for that or for that specific mmio read and then from there we return the cache solution that we already solved for and then we hit our defined solution address because we already know the solution address space that we wanted to get to and we found the flag in virtual memory which i think is re-hosting is uh what is it rio hosting his the appetizer oh rehosting is on the appetizer menu at panda express in leedspeak which took me i think that took me like two weeks to get because i didn't know you could just pull it from virtual memory and then i had to programmatically find

all the successful states and all this other stuff but this is entirely dynamically um arrived as well which is awesome you know you can explore code programmatically and just it's great oh no i pressed escape accidentally let me go back to presenting okay so what can we do with this well i wanted to visualize this because what's happening is kind of difficult to observe behind the scenes and so what if we could just visualize it well one thing we could do is we could actually build a constraint graph and this is my kind of approach to it um oh mercy me just everything's happening isn't it just everything okay okay there we go all right so

sorry my i have a hot corner and it activated so let's say we wanted to activate those uh we wanted to build a constraint graph well it's very simple we're at a specific program counter right and we have our symbolic variable and we can determine whether or not we continue execution or there's an error based on the constraint of that variable so the edges in this case between our variable will be clearly marked with the constraint imposed upon the symbolic variable and we can see how code is explored through this directed acyclic node could be cyclic uh directed graph and this is what we get it's pretty straightforward for the ctf time example but we

see we start at 0 0 which is just an initialization value oh no we're still showing a white screen no okay hold on we back out let me i know what happened uh stop showing screen show okay are you back on you should see my powerpoint okay powerpoint overview thank you all right there we go okay good okay so okay so here's our here's our constraint graph right uh we start at zero zero at the very bottom and that's just initialization value from that point forward we move up to hex for uh 54c and then we can see the constraints imposed on our symbolic variable it looks like garbage but basically it's saying if it's equal to zero or if it's equal to

the hex value one three three seven of course it's one three three seven of course that's what the clock's looking for and so from that point forward oh go ahead uh it it looks like every time you like full screen into presentation mode it goes white no so i've been giving you i've been giving the entire talk to a white screen no not the entire talk no just just for like two slides okay that's fine that's fine then oh man i thought i botched the whole thing i can talk from here i can talk from here okay okay can you see this part yes okay good then i'll just talk from here okay so whoo all right good so basically we

go from our constraint solver man that sucks but we'll move on um but it's looking for hex one through three seven within uh from the clock value from the return value from the clock value so that's actually the value it was looking for the first time well we can set it and then continue executing but it looks like it's just singularly like regardless of which path we choose it continues to execute here because in this case i um i trimmed all of the uh circuiting the cyclical path so we can get a single dag right a single direct direct acyclic path and so from here regardless of which decision you make you end up at

this kind of fork in the road right and one leads you to an error if it actually meets the constraint values uh set here you'll get to an error an error state and if you meet the constraint values on the left-hand side between these specific memory values of i think hex 18 hex 10 and hex 14 within those specific memory values within the virtual memory uh you'll you'll actually navigate towards this left path you'll arrive at the program counter 57 or 570 right and yeah you just apply the constraints and then you can move and navigate towards your towards your final solution and what's really useful about this is that you can use that to annotate a

control flow graph so anger allows you to construct um control flow graphs from your uh so i'm talking hands uh allows you to construct control photographs from your binary but they are kind of kind of out there you can't really see what's going on so what if we were able to take this and we were able to annotate this huge control photograph well you get a much simpler much cleaner view of the execution right and it turns out all of this stuff as we continue to explore it was actually open ssl so it actually looked for the key that was provided and then used openssl to disassemble your code and execute so it's pretty pretty neat as with all

things lunch is not free um there are issues and there's a reason why not everyone since 2016 not everyone has adopted symbolic execution unilaterally well first there's a state explosion problem all right you're looking at all states as you're continuing to execute and that can be problematic uh loops are the prime offender in many cases but most of the se engines are pretty good about figuring out loops like they're not going to sit there and constantly hammer on the same thing over and over but it's something you need to be aware of specifically if it makes smaller minor changes within the loop because it has to keep track of those continues to execute and on top of that it actually uses

concurrency frequently in order to navigate multiple states at the same time so it can get wider coverage but that is a huge expense right so it's going to kind of eat up memory second is speed um it not just that it also takes a lot of time to set this up fuzzing also is not cheap in fact there is a huge effort in simplifying the time to set up a correct um fuzz harness because building a proper fuzz harness is an art form it's a delicate art form that you have to do um efforts like fudge i think from which is i think embedded in oss fuzz at this point take a step at that in order to get it

kind of working a little bit easier but it's still not a um it's still not a simple endeavor and neither is setting up any of the symbolic execution stuff so it does take time and it does take an investment but it allows you to explore unique code paths and unique opportunities that fuzzing just won't hit in the same amount of time and last is memory right page faults in an emulated environment are problematic because it has it's not in it's not available for view so you have to wait until the emulated environment goes out grabs the page from memory and dumps it back in and you have to be more conscious about that and and uh

it's uh it's tricky oh yeah does it scale poorly so the next steps for me personally are to find more firmware examples for this and build a much more concrete and hopefully simple uh constraint graph that you can use to navigate and whatnot to add more visibility to the software as you're re-hosting it because it's specifically targeting re-hosting firmware uh optimizations there's a ton of optimizations you can build into this that's right specifically if you're going after very targeted or very simple form firmware like cortex m firmware or arm chips that's a pretty good one too big big gigantic operating systems it's kind of a lot to go at unless you're targeting a very specific

application but smaller iot devices this is great for more panda plug-in integration and hopefully a big fancy ui and then yeah that's it for me i do have a few references of course like a good like a good science kid and do my references but yeah that's it for me

oh 47 15 left not too bad not too bad yeah i think that's perfect for questions uh so does anybody have any questions for all of the magic that chris just showed us i didn't mean to uh i didn't mean to uh confuse i tried to really take the approach of i am a pen tester and i know what fuzzing is but i don't know what any of this se horse crap is why what can it do and so i really try to take it from that approach because uh it can be a lot um but it's actually really approachable when you try to play with it it's it's pretty useful yeah so uh as somebody that works alongside some

of the ex lincoln lab folks uh that do this kind of stuff day in and day out i've always thought it was like magic voodoo and then your presentation like wait that makes a ton of sense that that's super easy yeah i know uh which one really linked it in for me i think it was when i saw not the darpa challenge but where was when i saw someone basically find a magic value like it was just some super dumb embedded um like uh character was looking for in the firmware it's like oh we don't know what it is but it's targeted oh it's just a memory value and then he fired up i think it was uh

anger he fired up anger grabbed it in like two minutes just like fired up some dumb python script stopped right at the program counter read virtual memory and then looked at it and it's like okay it should be this it's like what and then we looked at the control flow graph and it had to be dynamically generated so this gigantic wad it just came out to like some stupid um hard-coded password that you had to put in i'm like oh my gosh that's so good that's so easy and so that's what let me like really dive in because the fuzzer is not going to find that right it's not going to find some super crazy contrived thing

but you could just ask the code hey what should this be and it's like i should do this right like fuzzing shows me where things are going to break and how the how it reacts to values whereas symbolic executions really demonstrating to me like allowing me like like the phraseology that you are using i think is perfect it allows you to interrogate the code and like really figure out what it's doing and sometimes even answer the question for you right because you don't like for example if it's trying to do some gigantic opens uh ssh decryption or function or some super hash i just want to know what the hash is being compared to i don't necessarily

care how the hash is generated you can just ask what the hash should be and it's like okay the hash should be this it's like oh thanks and you move right you move right on so yeah um we did have a question in the uh discord uh can you provide use cases however you find this specific thing useful okay uh so explicit use cases i use for uh sc would be uh right now it's rehosting and specifically ioctal calls because or syscalls because a lot of the times the syscalls are pretty straightforward right it returns zero if it's bad or a constant error code if it's if it's really really bad but it's bad in a way

to expect it or return value right like those are pretty known but there's a variety of ways that you can uh but for more creative firmware a la r tosses a la like super super tiny cortex m firmware it's just doing an iota call to peripheral comparing that peripheral something some known constant it already knew and then is continuing to execute or if that value isn't a hundred percent it will abort and you can't continue interrogating the code so that those situations are the ones that i kind of wrote this tool for um to kind of look at it figure out what the iop call is actually going for and setting it to that and then

continuing executing like saving state after i set it and then continuing to execute um it's more of like a code exploration tool uh anger is used a lot in ctfs uh it's used a lot in ctfs because it's very good at cracking open the code if you can't look at it in a like reverse engineering format or if you want to ask specific questions of a reverse engineering or when you're reverse engineering and i would say se is probably more tightly coupled with a reverse engineering effort right as opposed to fuzzing fuzzing you're like well fire at it and then we'll actually start reading source afterwards i feel like you know the results will show up when they do and then afterwards

we're going to start um you know evaluating it if it crashes but i think that a reverse engineering effort you really start interrogating what the code could be possible input values to get there is this thing actually doing what it says it's supposed to do and then in those instances you might fire up something like this to uh to help i don't know if that i hope that answered the question see typing we'll see freaking out i have heard that it is a bear to use it's too much work i have heard that argument before it's too much work for something that fuzzing can just get done in a minute um how do you how do you defend against

this probing uh i don't know man like uh because i think re is really really strong like being able to crack open code now that giger is free now that re is available to everybody not just people with an idle license you know like it's strong stuff it's good stuff and i think more people should be doing reverse engineering um but in general i actually think it's really useful for very specific use cases where you need to figure out or have the code tell you something that's when you can really just switch gears from exploring or from fuzzing or some sort of dynamic analysis and just go straight for that if you if you can do it quickly yeah

i i don't know that you can't really defend against this kind of work right it's like unless the code is aware of how you're interacting with it and then it does i misunderstood the question yeah how do you defend against this um thorough obfuscation can bust it basically if you have a state explode while it is a trial is trying to interrogate the source and figure out for example if there are too many constraints imposed upon the specific variable from the entry point but yeah you can just code around it you just go all right cool i don't care about any of that if it if it works then at some point your obfuscation will continue to execute and

do what it's supposed to do without slowing down or without without like significant slowdown so you just halt there you also when it's almost done and then you ask it the question hey uh what are you comparing against this or hey i noticed you did a you reached out to virtual memory or you had to do a page fault uh what'd you grab while you were gone right like things like that you can just completely ask it for it so it's really a strong tool in addition to uh like con like you know reverse engineering or um like analyzing uh yeah analyzing the the the assembly it's just it's just really really strong it's just not useful all the time that's

the issue it's yeah it's not a it's not a big fat tool you can apply to everything it's very targeted and it's very good at what it does and when you use it for it it's kind of not a lot you can do awesome well that puts us six minutes before the next talk uh so i think we will wrap it up all right thank you very much for the talk i thought it was excellent i hope others did as well no problem thank you guys and we will see you on the screen all right see ya thanks bye

Chris Craig - Educated Guesses with Symbolic Execution

Related talks