Fuzzing the EBPF Subsystem

BSides Canberra · 202452:34457 viewsPublished 2024-11Watch on YouTube ↗

Speakers

Zac Ecob

Tags

CategoryTechnical

StyleTalk

Mentioned in this talk

Tools used

QEMU

Frameworks

LibAFL

Show transcript [en]

we have a a great speaker actually a return speaker because Zach spoke last year as well um he's back again and he's going to be speaking about fuzzing the ebpf subsystem so big round of applause for Zach hello oh audio is are working I love the backstage people so much um cool Hello fuzzing the EVPs subsystem I will add bracket of the Linux kernel when I submitted this talk it was the Linux kernel track I'm like oh one F made J for me I actually paid one of the infos people a large sum of money um to get that track created that's not a joke um and then actually now I'm on Main stage and people don't know what the

subsystem is of it's of the Linux kernel and I'm Zach if I'm to be believed I have animations between these slides okay um that wasn't meant to happen who am I most uh unimportant part of the talk I'm Zach um I'm a final year computer science student at unw finally going to escape there um part of security Society or sexo the cyber security Society there um play CTS with emu exploit uh sometimes and occasionally I am found at Australian conferences where people listen to me ramble now to the more important question what's ebpf uh it's a subsystem of the Linux kernel as you might already know from 30 seconds ago but basically we can submit

arbitrary programs to be run at various hook points throughout the kernel um and the way I like to think of it is kind of just an alternative to Linux kernel modules cuz Linux kernel modules are like kind of brilliant but also kind of really really terrible cuz like they're written in C and C is amazing cuz you have memory corruption and also a compiler that kind of just lets you do whatever you want even if you shouldn't do that um and so you get these people that write really bad Linux kernel modules and then compile them and then you know they're found in a bunch of systems and really vulnerable and terrible and also the other annoying

thing is like you have to be root to load kernel modules cuz otherwise like how could you have a non-root user uh modify kernel Behavior that's fundamentally unsafe like kind of um but the thing is is that ebpf ideally will allow any user to submit a program and this is a program that is being run in the kernel context so that sounds very whack um in actuality it depends on capabilities because they quickly realized it was a very bad idea to let everyone submit these programs um and as you can see here oh can't see my mouse or can see my Mouse um there's yeah 10 million different program types um and yeah you might be thinking

hook points o buzzword man tell me what that is um but yeah so the sort of original BP PF was mostly for networking and that's what I'm most familiar with so it's like accept or rejecting an incoming packet um handling a packet that's been specifically rerouted but it's now much more much more flexible you can mess with say file system operations or there's even things like these options called KR and Ube which allow you to kind of have custom hook points and people do like crazy stuff with that and yeah as I just said we're kind of allegedly letting users run kernel code so like how does that not break the entire security model of the Linux

kernel and that's because these programs must be verified for security Properties by the verifier um and so the verifier asks questions like is it memory unsafe are we handling uninitialized values properly will the program terminate and it rejects any programs it think might be unsafe it's conservative here we can't accept a program where we're like oh like might be unsafe but we're fine with that it's like we have to be 100% confident it's safe here um and so initially yeah we just let everyone submit ebpf programs cuz the verifier is perfect because we say it's perfect um but then people kind of saw this and got curious uh and unfortunately the verifier is not magic

um it has bugs in it and if the verifier has a bug in which it incorrectly accepts unsafe programs cool you now have um kernel code execution uh doing stuff that it shouldn't be doing um the process of um submitting this stuff is yeah you call the BPF Cal and yeah it goes verifier which will either reject you or approve you and then this one specifically says jit compiler um technically there is both a BPF jit compiler and emulator which you can toggle between um but honestly it tends to not affect stuff too much in so far as that you know if a bug is in the verifier you can probably go at it with

legit compil version or the emulated version so that's what ebpf you might be thinking uh why Target ebpf and first I'm going to say why not to Target and I will say like from a business money perspective I think this is a terrible Target um it used to be nice because it was exposed to unprivileged users but those days are gone for like a year or two now um and now you need either cap BPF um which is honestly I've never seen in my life or you can have um other higher privileged capabilities such as cap net admin if you're doing like networking ebpf or say cap performance monitor if you're doing that sort of

stuff and so um you can get a slight privilege escalation to root from these um there's also SE Linux I won't lie I don't know how SE Linux Works um I remember reading old papers about like disabling it with kernel code um execution but I don't know what guarantees it gives you these days and how that fits in with the kernel so maybe you can mess with that um of course if they like explicitly opt into unprivileged ebpf then you do have full LP but that's kind of rare you kind of have to be stupid or like in a very weird situation to enable that apologies if you have that enabled um so why did I

Target it I don't just choose bad targets and suffer um I think it's an interesting Target um so the I wrote a father which pretty much automatically generates test cases uh and so different programs have like different sort of formats to their inputs and some of them are really boring but ebpf you are submitting programs for it to run which means my father needs to generate programs from scratch I'm like that seems cool um to like my nerd self uh also it just seemed under researched in my opinion um there is szer which is the most popular Linux kernel uh fuzzer and it sort of tries to fuzz all of the system calls and for that it defines its

own sort of language so that you can Define grammars to describe uh the format of those system calls and for example um open I assume people know what the open system call is um returns an integer like strictly typee speaking but that integer represents a file descriptor and so SS caller will make that the file descriptor type and so when you call read in an SS caller it will ask for a file descriptor and not an INT and this is cool and all for like that basic example but when you get to such complex uh types as an ebpf program it sort of falls apart um and so it doesn't do a good job uh which is good

for me because otherwise there would be no bugs here um and then I looked at Specialized furthers and there's just not too many of them um yeah there's like one that was a decade old uh framework published by Google um I think there was one which I did see which seemed cool but then found like one bug and they're like called it a day they're like I've you know proved that my fairy Works um and then I saw a good academic paper recently like 2 weeks ago so like yeah that's basically nothing and so I'm like okay well if people aren't taking a poke at it I can probably take a poke at it and find a bunch of bugs and also yeah I

just think the sort of bugs you get from the verifier are interesting cuz they're logic bugs about analyzing program properties which sounds fancy but what I mean is like if you get like an out of bounds right into the Heap I'm like okay well I've I've seen this before I'm going to go find an elastic object and Tinker with it whereas like each ebpf bug I'm like okay well how do I use this uh like to craft a program which exhibits like Behavior which I turn into stronger perimeters so I just thought it was cool um and so that's about ebpf let's introduce the F which I have called feep which stands for freaking fabulous

extended Berkeley packet filter fuz it's full name um and I originally had a section talking about like infrastructure how I set it up but I won't lie I did just steal csol as's note um so go read about how that does it for a better summary and I'm mainly going to talk about generating inputs because the main reason why ebpf is interesting is that the inputs are weird cool before I talk about how I generate them you sort of need to know the structure uh and we are submitting ebpf programs um and an ebpf program is pretty much just an array of ebpf instructions like it's very simple um here is the actual structure we use when

doing the system call and it's literally do inance is the instruction array and we tell them how many so like we don't have like an actual file format or anything um these ebpf instructions form a minimal uh risk instruction set cuz it was designed from the ground up wouldn't want to verify a CIS instruction set cuz imagine having to think about the security properties of vector compareed qword swap exchange it's like I don't want to think think about that um so keep it as simple as possible and yeah verifier checks if the program is safe but also if it's syntactically sayane to begin with um one last thing is how do we communicate from user land to Kernel

land um because one of the main things about ebpf is like say people love collecting performance metrics with it um and so we do that with things called ebpf Maps they're just different data structures like arrays or hash Maps and once again we make a CIS call saying we want to register a map and it gives us back a map FD which we can use in our program to talk and then we can look at it ourselves with more system calls now you don't need to ponder all of this I just wanted decide to not look empty but the evf instruction format is this and it is 8 bytes total uh we have an off code a destination register

Source register offset and immediate and it's yeah pretty simple thankfully not like a variable length instruction set um the op code has eight different classes of which we have effectively memory operations ALU operations and um control flow so like branching and jumps uh we're going to stay at a relatively high level view of these things so I am not going to be educating you on the difference between BPF load instructions and BPF load X instructions cool now that I've explained the format how do we generate them uh and at the start I take on the very nuanced approach of doing cat slev random I it's never been done before um unsurprisingly or maybe surprisingly to some this wasn't very successful um

you mainly do that to see cool your father's alive it's not immediately crashing and stuff is sayane but the other reason this is actually nice to do for BPF is the verifier log um basically uh you know for the user experience for like developers if they are submitting programs that they want to do for real work and imagine those programs get rejected it's like well why did that program get rejected uh you either have to like look really hard at your source code or you'd have to start trolling through like Linux kernel code for hours which I don't recommend um and so obviously this was like a design problem to evf so they're like okay when we are analyzing the program

the verifier is going to construct a log and if it gets rejected it will give you that log and this log's like incredibly both it's like hey for every instruction ever had the entire stack State and register State and you know the wether for the day um but thankfully it's pretty easy to minimize cuz at the very end it's like oh you tried to D reference a n pointer um and so I would run this for like 10,000 iterations and then just chart how many programs are getting rejected for what reasons um and as I mentioned verifier before it talks about semantics and safety talks about syntax um and so I kept getting these things like invalid op codes with the

verifiers being like what are you even giving me these aren't valid instructions how can I execute them if these aren't in the instruction set um and as I mentioned the op code is officially just a uhu um but for uh as a type I can just say generate me a random one but once again they might not be valid such as the op code F2 um and so U as a type doesn't properly encapsulate the op codes um I programmed my father in Rust I will shill it hopefully only a tiny bit um and so I modified this as an enum so you can have you know all the fields here loads does l and

jumps um and then we have to technically model a bit more here um but now sort of we can make use of having like a decent type system to make it pretty much impossible to construct invalid off codes because if every single enum variant is a valid correlates to a valid instruction how are you going to make an invalid instruction and then you can say instead of give me a random mu8 give me a random ebpf instant class and we're happy um technically it's not impossible but I'll mention that later and there's more invariance there there's too many um the other one that I think sort of I guess relevant to talk about is we have the instruction Source

mode so when we have an instruction it will either be using the source register or the immediate field and never both I don't want to say that confidently it's a weird thing actually um so if we're in immediate mode we're like okay well we need to set Source register field to r0 and similar if we're in register mode we need the immediate field to be zero otherwise we get more complaints um and yeah that's that's just so many of these like different invariants to make sure that we're sort of syntactically like ebpf is happy with us um things like aligned memory accesses if we're doing bit shifting we don't want to shift more than 63 and a 6

orbit system um and like I don't want to run through all of these because like we don't handle them all in a particularly novel manner um I just yeah where possible like make use of types to represent the sort of grammar of the thing you're fuzzing that way at a value level you can't have invalid like grammars or like values um but like if that's not possible and is not always possible sort of just enjoy using safe Constructors which sort of do do all the checks and then output the resultant instruction and I can rely on it being syntactically valid cool so types already talking about types um our program after we obey like the 50 billion different invariants

um now makes syntax wi it's the ebpf verifier like hey you sure are giving me instructions um but semantically which is like what our program is actually doing still kind of doesn't make sense um because the F has no notion of types in the ebpf program so we're trying to do things like multiplier pointer or load from a scaler and so we introduce the notion of types um for our registers and it's a pretty simple type system thankfully we mainly have three types which is pointer scaler or uninitialized um pointer has associated with it the pointer type because talking to the stack is different to talking to say the packet we're working with both of which are represented as pointers um

scalers technically have an Associated width with them and unit is uninitialized and yeah this actually gets us very far in so far as this matches the verifier own type system to like a pretty good degree um and sorry one thing I will probably refer to like the I guess truple of like register to Associated like type as the like variable state in the future sides cool so now that we have this notion of these basic types we can now ensure things like our load and stores are now only going to operate on pointers um as the memory location of course we can still store scalers or whatever and same for things like Atomic instructions like comp exchange which

tripped me up um uh ALU instructions we can now guarantee will work on scalers because you know verifier is strict we aren't allowed to divide a pointer or whatever um some like minor exceptions for things like adding or subtracting to a pointer cuz then how do you implement arrays in assembly and uninet values are sort of used in the uh like opposite manner in that we kind of never want to ask for an uninitialized value without cuz then the verifier is going to be like what are you doing um but we can do it in the inverse manner of when quering the variable State say hey give me a register any type as long as it is

initialized um so the example for that is the move instruction as long as the source register is initialized it's fine to move that to the destination register um and it's nice if we magically have these types to query and sort of make sense semantically but like how do we know that like this register is this type or so on um oh here I mentioned variable state that should have been earlier so yeah basically when we are generating an instruction it has the variable State and it can make queries to that variable state with different requirements but it's actually given mutable access to the variable State because um the State obviously changes as the program executes um and certain

instructions will modify different registers and what their types are the easiest example is move reg will copy the state of the source register to the Des register overwriting the pre-existing uh state of the desk register and also load instructions initialize the desk register what exactly it does depends on where we're loading it from and yeah um just with that we have types in our program and can make decent amount of semantic sense to the verifier um and then finally let me check how long I've been rambling 20 minutes um we have battling control flow analysis so semantically um we kind of have ALU and memory instructions pretty good uh but jump instruction still really like the verifier still really

hates us and that's because I haven't talked about the offset field which is kind of a niche thing for most of the other uh for some of the instructions but jumps really care about it because the offset field says where we're jumping to uh and so we need to reason about our control flow such that it makes sense at an instruction level and also at a program level um instruction Wise It's actually kind of easy to reason about um we need to jump inside our program so if we have a 10 instruction program and we say to instruction 20 the verifier is going to say no um so that's the first thing and also we can't jump into the middle of

the instruction um those of you familiar with um like instruction sets and encodings you're like hey the instructions are 8 by aligned how can you even jump into the middle of those cuz like the offset will just be a multiple of eight um so the people that fought exactly that very cool um and that's because we have a single I kid you not a single 16 byte instruction to thank for this issue which annoys me um but onto the harder issues is the program level issues the first one is that the program must always terminate now I regret to inform you the verifier did not solve the halting problem um it is up to the submitter to like the onus

is on the submitter to prove to the verifier that their program terminates and it can be like very reasoned about in a automatic manner the next thing is that that all instructions must be potentially reachable so if you had an instruction and you always like jumps ahead five the like instruction in between might never get reached and that makes the verifier mad um I think that might be due to either you know that's an indicator your compiler is doing something weird or also because when people were exploiting it we would purposely leave in dead code which is always jumped over because we analyze the program wrong but we do jump to it and then we're more free in what

we can execute but now it complains about that uh and then lastly We Can't Break um assumptions about the variables types I didn't know how to phrase this elegantly but this example say we load from register R1 indicating it is a pointer and then we move dead beef into R1 which is a scaler and then we say jump back to zero now if we tried to load again we're loading from a scaler which is obviously not valid so so we need to make sure we are not breaking our like type system here so like how do we go about this and um when I went into this I just treated programs as an array of instructions cuz

that's literally kind of what it was um but now I sort of shifted into representing it as a graph of basic blocks um for those of you that don't know what a basic block is fair um it is a block of instructions in which all of the in instructions except the last one are non- branching and that's nice because you know if you execute the first instruction of that block you will execute every instruction until the branching instruction so in a lot of control flow analysis people talk about basic blocks now this graph uh is going to be connected by directed edges and those edges are formed by jump instructions um the reason they're directed is obviously

if instruction zero says jump to instruction 5 that doesn't mean instruction 5 has to say jump to instruction zero so they're directed and and lastly we do have implicit edges because obviously most of the branches are conditional so if I do J equal register equals five we will have the edge in which that is true and we jump and the implicit Edge in which that is false and we just continue which is kind of also a jump and here's this terrible looking diagram um I don't have any Arts people to do this for me but yes you say you have index zero which says jump to five so it has our directed edge here but the

implicit Edge will go to index one which will also say jump to five in which we exit cool so now that we've shifted how we're thinking about control flow um let's start targeting the first issue of uh reachability um we want the graph to be connected except that's technically a misnomer we do want stronger guarantees we want every node to be reachable from the origin node which is literally the node that has instruction zero in it cuz we always start executing at the first instruction kind of that's weirdness going on but that's out of scope um and for every node we want to be able to reach an exit node so um that is talking

about having all of our code be reachable and all of it terminate um and these things are stronger than just connection um the way I did this the actual algorithm is messy for like a number of reasons part of which is me not being smarter but um we sort of iteratively add nodes starting from the like first node which we always know we want to construct and when we see that we have the last node like that we know we're constructing we do a scan of the program for any unconnected nodes um and because of how the algorithm works we know that the node where currently operating on is connected so we then connect to all the unconnected nodes in

uh sort of linear sequence and then that last node then connects to the exit which allows us to say that the program is connected and all reaches exits um technically I think you could be smarter about this and maybe have more sort of General sort of weirder programs but this was a way that made sense to me to get these guarantees um here is an example of a more complex one that is an issue so we covered this but say we have index one has the implicit Edge to two which is saying that it's always going to jump back to zero now this craft is connected um but index fre here this block um is not

reachable from index zero so like we would never be able to execute it so we're going to get complaints and so what I would actually do is if the index 2 block is the last one I was making I would do this check and I would say hey index free isn't actually reachable and so I would instead always jump to it um and then that one immediately exits and like that we have these nice guarantees about stuff being reachable and exiting um and then we have uh block through requisites so I talked about not terribly messing up the type system the way I thought about this was each basic block has a prerequisite variable state

so that sounds fancy um so what that actually means is like the block will be like hey I'm going to use register 8 as a pointer and I need register 2 to be in it and for another basic block when we're choosing where to jump uh we can only jump to another basic block if it satisfies the prerequisites um one thing is that the exit block has no prerequisites you're just choosing to exit um so you can always choose to do that um I used to have bugs in my program because the blocks would be mad at me after not having anything to jump to cuz I didn't think of that and yeah just a little bit

of my code is you have like satisfies PRX this is how I'm choosing to represent them in like actual rust types and it just says yes or no and say when we're figuring out what blocks we can jump to we just iterate over them and filter them according to which ones we can actually satisfy the PRX for cool um the last thing here is that we need to prevent infinite loose you're like why do we need to prevent infinite Loops call back we want our programs to terminate and infinite Loops don't terminate in case you didn't know um so Cycles are allowed in the graph because I want my programs to be like weird and

like to be flexible when I'm generating them so I want Loops in my program um but because of that the graph representation doesn't help differentiate infinite Loops from normal Loops it can just say yeah you do have loops or no you don't have any um so to reason about infinite Loops I would need to precisely track the registers their exact values and be overly conservative and what that means is I would just be copying the ebpf verifier and be doing its job um and I don't want to do that because the verifier is kind of big and very complex so I would either be working from scratch and take forever and add a ton of complexity to my code

and make a bunch of bugs or I could just copy the way ebpf did stuff but then I would be copying all of the stuff they sort of subtly implicitly did wrong and not realize it um so I just made the call of like it's not worth it to tackle this and didn't um if you can think of an easy way to sort of reason about infinite loots please tell me um and that is generally how we did inputs so we have op codes and invariance which we need to oby to make sense syntactically and I recommend capturing as much info about those in the type system to make it impossible to violate at the type Lev at the value

level and otherwise using safe Constructors to kind of encapsulate those checks then we introduce the notion of types for the variable state which we need for semantics to avoid loading from scalers multiplying pointers and more and each instruction can of course query that variable state for registers it can use or it can update it if it modifies it and then we have control flow validity which you model as a graph I know amazing for me to come up with that uh that's a joke it's a very common thing um we ensure connectedness but also actually the stronger um guarantees we ensure the variable state is respected for a block uh but infinitely it's the I I'm a

quitter I didn't know how to tackle them um definitely room for improvements um and stuff I want to do I want to explicitly pair load and store operations because if we store then never load it effectively acts as a noop um there are these things called ebpf helpers which are functions you can call in the program that seems like there there's code there so there's bugs there um but I would need to upgrade my type system like to a notable extent to be able to work with those and then lastly using ebpf maps to detect INF Leakes where I would register a map and at the end of a program store all the scalers um then I'll disable kslr and

check all the scalers and check if any of them are actually kernel poters um hopefully I'll do these um you can just steal my ideas and do them but you know if you do tell me and I can be interested um and so FR 3 minutes talking about uh F design because I didn't want this to just be about here's how you generate ebps inputs because that doesn't uh generalize too well um so I want people to hopefully be able to take some stuff about um further design in general and this is I guess a bit of a step back uh fair warning uh I am not like the most advanced further person to ever do it so some of these

insights are hopefully helpful but I'm not saying all of them are going to be particularly novel it's just stuff that really stood out to me when I did stuff first off if you're going automate stuff in the development process I beg of you to do it um I always have this issue where it's like oh it's just free commands I'll just run it now and then I have to run it every single time um and say I'm fing the kernel so like with user land to run the program you sort of just start the F which spawns the process but for the kernel I have to compile the binaries and then copy the F binary and the relevant libraries to the

init Ram FS folder and then I write the init script in the ramfs folder according to the options I set then I compress it then I run FF manager which then spawns KY which I'm using to virtualize so like that isn't too complex but it's just really cringe to like do all those commands all the time so yeah spend as much time uh like you want to spend as much time finding bugs and developing exp points and less on all the cringe stuff so like stuff I did yeah running the program some automatic triing scripts um generating equ C programs for ebpf inputs because it's nicer to work in C than in Rust for these things at least when working on

just submitting the program and setting up swapping between different kernel versions um next type systems I've already rambled about them but statically enforcing things about the programs grammar is wonderful um like because that's how the type is defined values of that type sort of have to obey how the type is structured so we don't have to worry about violating parts of the grammar and also um I'm not 200% sure about this but I think it's also maybe more efficient in so far that if you just have stuff included in the type system it's just obeyed um when you're like generating the code whereas if you have to dynamically check it that will be instructions generated to be like hey

I'm looking at the value and I'm modifying it and so on once again the easiest example to draw on is using the enom to represent the op codes um to which it's near impossible to construct a value of an enum that's not a valid variant say near impossible like in Rust you can do unsafe me transmute but at that point like if you're complaining you're shooting yourself in the foot you like went and bought a gun and ammo and loaded the gun and took off the safety and then shot yourself in the foot and then complained so like it's pretty hard to do um however type systems aren't magical or at least too magical like

there is a bit of magic um and so sometimes yeah you can't culate all the info you want to in types um once again bit shifting I want to be limited to 0 to 63 in the immediate field because I'm in a 64-bit system um so that uses the mfield other instructions want the infield as a normal u32 um but bit shift it's like so what do I do here do I now have an enum for the infield with bit shift or the u8 and normal the u32 cuz that's kind of a pain cuz now I have to match when I'm ever working with the field and like a u8 here doesn't even properly restrict to 0 to 63 because it

can go up to 255 um Zig has support for arbitrary size integers but like I'm not in that language so this is just meant to show uh sometimes you kind of do need to give up on encapsulating as much info into the type system um cuz sometimes it becomes too unwieldy um but when rust gives me dependent types and lenses I will uh be all powerful uh now onto the Bad and the stuff I [ __ ] up so differentiating failures uh I will say please please please make sure to clearly differentiate between your fuz failing and your target failing if your system or like your setup crashed what's happened to you like have you found a real bug or does the father

have bugs or like yeah so how much can you trust your father once again um Shing a little bit of rust I really like having a strict compiler but because you have uh a stronger like sense of Trust on your program but it's not a silver bullet of course uh we haven't made a programming language that makes you have to think about stuff logically correctly yet um so how did I stuff this up um yeah I I run the stuff inside Kimu to virtualize it um and in the init script I run the actual fuz program as the final command in the init script now does anyone know what happens when the init script or the process exits Kel

[ __ ] dies and complains at you um so the colonel panics when my father like finishes it's like yep I I fuzzed a thousand iterations I've done my job um and all of my scripts so you're like oh the Colonel's like panicked you found something amazing and then I like open the file I'm like ah in it died I need to delete this and stuff um so that's an example of me not differentiating between my father like exiting and being normal and like how I want to capture bugs so think about how you're going to do this and yeah here it is attempted to kill in it I've seen this so many times uh and then onto the ugly which is

stuff which I like to think isn't particularly my fault but when you're developing a f you're going to have to develop around issues with the target you're fing cuz it's not on to fix it and the example is the big ebpf lock ebpf verifier has a big lock around it and honestly valid how many people want to parallelize verifying 20 different programs the answer is like no one except me um so it's fun for me because I have to work around this um so one of the other FS I looked at just moved the verifier into user land to allow it to parallelize cuz then you can spawn it as a user process however many times and to

me this seemed ideal but it seemed like a lot of effort and if you change the behavior of the user of the verifier when moving it into userland seemed like a nightmare like what if your userland verifier rejects it but the K one does so then you're missing out on genuinely good inputs or if the user L verifier accepts it but the Kon rejects it you need to figure out where you've messed up so it seemed kind of nasty to me but they got up to work um so yeah F the executes in Kimu cuz we need to virtualize the kernel environment so I'm like well you know I I can't paralyze this inside one kernel

so I'll just spawn a bunch of kernels um so I just spawned a cheu instance per Hardware threet uh and you now have the overhead of multiple kemus but most of that is just time spent booting up which is insignificant in the long run and the other thing which is I think a sort of theoretical downside is that coverage isn't centralized um so two VMS might both get the same bit of coverage and then both treat it as new and attempt to fuzz it this isn't necessarily a bad thing maybe one of them finds something and the other doesn't but at the very least if we're inside just one kernel the coverage is centralized so like only

I guess one Fred is ever working on the a new bit of coverage but because I haven't done it both ways I can't confidently state if this is that big a downside next uh e ebpf can just be like really slow uh first time I thought I had a bug it was cuz the colel stored for like 5 plus minutes on a system call I walked to like my dad I'm like hey just found a colel bug um no yeah I was wrong um verifier was just really slow uh it will verify up to 1 million instructions um when you look at the colel source code that's a common next to it just saying yes 1 million which I

find funny um I reduced that to 10,000 um and also tinkered with a few other things um cuz to me like if a program isn't accepted by 10,000 like like Loops or like iterations that's probably not going to be accepted by a million but I technically am now rejecting some valid programs um the issue of the speed of the verifier improved significantly but it's still a relevant headache but yeah I will be say don't be afraid to modify your target if like the benefits are like pretty notable uh and then lastly source code and are choosing to read it um this is something I always think about when designing furthers and stuff inure about like the best ratio of time spent

reading source code to to design a solution versus being gung-ho about it uh so basically the more time you spend reading about source code um you get an overview of sort of the larger design and that's something that I do think is important but I do worry about sort of mentally sort of putting too much trust in it and sort of thinking about all these checks and being like Oh I'll need to incorporate that in my father to not get tripped up on them to to which you might over sanitize your inputs like maybe that is like a weird Edge case to bypass it um and yeah some things is I really wish I modeled like control flow

and thought about handling infinite Loops more um so like my takeaway is like read the source code or analyze the like binary if you don't have that enough to understand the sort of big level features that you are going to need to incorporate your design around but don't get caught up in the little details cuz it's not worth it um I will say it's not always a clean separation between the two there all right um pretty much near the end of the talk just quickly talking about the actual bugs um the details I found five um two issues with casting type casting two issues with control flow one of which is an infinite Loop and one of which kind

of lets you escape and one issue with division instructions um I'm waiting on some of these to be patched um and that means I think my father can totally find more but because there aren't that many bugs once I find one bug my father just keeps finding the same bug so I'm just sort of chilling here until the kernel people do that stuff um and two brackets to 4ish were exploitable depending on how optimistic you are and I'll just quickly talk about some this one's been patched so I can fully talk about it um when the verifier is doing its verifying thing it tracks estimations about register values at all times for safety what that looks

like like is if we are accessing an array of five elements it will ask is our max value for whatever we're using to index four cuz if it's not we can do out of bound stuff now there was an issue when caring to signed eight they mishandled the VAR off part of their value tracking you don't need to know what that is but yeah it didn't handle sign extensions properly and so if we had a completely unknown u8 value X the cost of it SED 8X had an assumed max value of ox7 f uh that's not right uh you can have Ox 80 and Ox FF and so this is fully exploitable uh these are the

types of bugs you really want from the verifier because misassumptions about um values allow you to craft out of bound stuff based on them so you can assume if there's a program which has a safety property dependent on us having a max value of ox 7f but we actually had Ox FF we can get out of bound stuff and then we can escalate that to higher and higher privileges to codex ution so as far as I can tell like asterisks um the bug was born from C duplication because C is lovely and doesn't handle generics the bug existed in the function coer subred to size SX but not coer reg toid the sex so that

means ALU Mau was buggy but not al64 Mau and this to me screams if you can try and program abstractly and with generics because if you do so you have to reason about logic at a higher level and you get less messy bugs like this where you're just copy and pasting but you forget one line then the third bug was another issue with signed unsigned casting for register value tracking it was barely not exploitable to my best attempts except I've now had new ideas when I've been talking so maybe it is and it exhibited simly the traits you want about invalid assumptions on register values technically this isn't patched yet um due to mailing list Shenanigans

so I've anonymized it to not give you zero days even if I don't think it's a critical one and it is of the form and sorry for the lame thing but once again don't want to give away stuff u64 VAR equals u32 VAR equals s64 value um and the fixed value VAR uh fixed version is like this and the thing I found about cool about this is like most of the bugs in retrospect once you see the bug you're like oh that's obvious how did I not no this but this one like I don't think I would have gotten because uh what's wrong is that there is an implicit u32 cast here um so f64 value

assigns to the u32 VAR obviously cast u32 and then we're assigned to the u64 VAR which has that u32 casting so the top 32 bits of u64 are going to be zero because it's going to zero extend here because they're unsigned so if s64 value had as literally any bit set in the top 32- bit there's going to be a dis chunk here so yeah I found that cool uh just talking about those two bugs cuz they were the most interesting to me and that's pretty much it thank you um here's my website if you want my other ramblings or socials uh [Applause]

yeah do we have any questions for Zach our runner here I will say we did have a Linux track option in the core for papers uh a few people wanted it but we only had one submission which was Zach so he's up here on the main stage here we go um you said that your program didn't handle infinite Loops at all so what happens if it gets one does it just get stuck good question um so no it basically just rejects it in so far as that it needs to um obviously we don't want to submit programs which can then sort of just store the kernel indefinitely that's not something we want to do um if we allowed that and say

like you know someone's using this in production and then someone accidentally triggers this and then someone's system is just stalled forever that would just be cringe um so no the verifier does some control flow analysis it says I can't confidently declare that this isn't going to terminate or if it can just clearly see it's an infinite Loop um yeah it would just reject your program and it won't get loaded into the verifier um one of my d one of my bugs was a control flow issue which did then have an infinite Loop um so that yeah was one bug which they had to fix yeah any other questions um you mentioned uh s caller earlier uh which is pretty well known

and run by uh in production by people who have a lot of resources to uh run fuzzing machines um do you think there's any room for more uh ebpf um fuzzing support in s caller um yes hold on let me one second oh it's not projecting uh that's totally valid um oh we're back yes hold on Simon scannel ebpf uh verifier so basically um what I'm doing by this uh um is I think one of the existing fathers is an extension of SS in which um they have modified um the SSA grammar to um do stuff let's see here control FCA uh H okay not this one I'm I'm I if I'm to be believed I think someone's maybe

attempted it but I just don't think it's the best idea in so far as that like and I I must admit I'm not the most familiar with szer but from what I looked just having to use the szer grammar to model such complex inputs seems like it might not scale the best um so I think I I don't know I I preferred leaning towards the sort of bespoke build up from the ground up I think you definitely can have a go at it I just think the development experience might be a bit like bad but also like I don't know I that's my view and assumptions but like feel free to like prove me wrong and give stuff a go um

yeah any other questions up there there's one up the back so I seriously fought this one to DPF plenty of time for questions yeah great talk thank you um did you ever consider writing an intermediate language to make modeling something things like control flow a bit easier and then LIF it to the EF B code yes I I very much did this was one of the things in which when going into developing it um I was just like oh I'll just like directly work with the assembly like how much can an IR benefit because it's such a simple um thing to work with um but as I came across those issues um yeah so I guess the other I

think control flow would definitely benefit from it uh I I went with the sort of control flow graph and haven't pondered too closely how I would try and Tackle say infinite Loops the other thing is working with the ebpf helper functions um does increase the complexity of the type system by a good degree um one of which an example I was thinking of is and if I can find it which I might not be able to but the type system technically has this thing called pointer or null in which some operations will give you a pointer but you don't know if it's not a null pointer so ebpf makes you work with like unwrap that by checking if it's null

before unwrapping to a pointer and that seems like something to me where I would really like to have an IR to then um model unwrapping a point or a null is a single operation like in IR which then unwraps to like several assembly instructions so yeah there's definitely value in there and uh I think I would enjoy doing that but uh I think it would be a lot of work and I think in the end it would work better but I can probably scrape by most of the improvements I want to do with my current solution so yeah I think yeah there's definitely Merit to it but uh I didn't initially see the Merit so I sort of just went

this way yeah awesome any other questions out there nope okay let's give Zach another big round of applause [Applause]

Fuzzing the EBPF Subsystem

Related talks