
unrubby: reversing bytecode for the lazy

BSides Canberra · 2018 · 46:06 · 58 views · Published 2018-08
Category: Technical
Style: Talk
About this talk
BSides Canberra 2018
Transcript [en]

We are very privileged again: we have an overseas speaker. richo has traveled to Australia just to present at BSides Canberra, so we are very grateful for that. His talk is "unrubby: reversing bytecode for the lazy", so let's welcome richo to the stage. How's it going? So yeah, my talk is called unrubby, and I'm gonna say "Rubby" a lot, so it's probably worth getting that out of the way up front: every time I say Rubby, I mean Ruby. I spent a lot of time working on Rubby and I've kind of lost the ability to take it seriously, so my internal defense is to call it Rubby, and that makes me feel better about the

ridiculous behavior that you bump into. The obligatory who-am-I slide: my name is Richard Butz. I speak a lot in the US, and so I have to explain to people how to say my name; happily, this trip has been awesome, and literally everyone has gotten it right the first time. I work at Stripe, we're a payments company, and I'm obligated to shout them out a little bit, so if you want to work with me, come talk to me afterwards; it'll be neat. Mike Ryan and I have the most ridiculous CVE, for a skateboard, as you do. I also run a con, and I jump out of planes a lot, so this is mostly what I'm doing

when I'm not hacking computers. I'm gonna skip the selfie because I'm about 10 minutes behind; maybe I'll take one at the end if we have time. So this talk is some cool tricks with bytecode VMs. I want to show you some goofy [ __ ] that I found inside of Rubby while I was working on this, because I think it's fun to kick people while they're down, I guess. And I want to talk a little bit about what reversing is, and specifically how you can reverse engineer stuff without actually having to be a reverse engineer, because while I have friends who think a really good way to spend a weekend is to spend

36 consecutive hours staring at IDA and mashing F5, it's not really my jam. I mostly prefer to do a little bit of work up front, then just do a thing and have source code pop out the end so I can look at it, because that's way easier. So this talk isn't really dropping 0-day bugs, although it does sort of prove that commercial obfuscation products for Ruby are completely pointless, which I guess qualifies, because I might at least upset a vendor or something. But the TL;DR is that I'm just gonna show you some cool tricks, and you could maybe apply them to some

stuff. So, really quickly, some terminology that I'm gonna use. When I say VM, I mean a virtual machine in the sense of a register machine or a stack machine that's abstracted from hardware, not a Linux guest running on your Mac or whatever, which is the other popular meaning of VM. Opcode and instruction are pretty interchangeable; they're basically the operations that the VM implements. Bytecode is basically just a long string of opcodes or instructions. As I mentioned, Rubby is an interpreted dynamic language, and YARV and MRI are the implementations of these things for the gold standard of Rubby. There are a ton of Rubby implementations, but nearly everyone uses

the YARV VM, right? Sorry. So how did I get to working on this? Fundamentally, there are a ton of companies out there, and I won't name any names, but it's not terribly difficult to figure out what some classic examples would be. They have enterprise customers, and they want to sell you an on-premise solution so that you can run their software behind your firewall, where you have some control over it, and you can observe it and figure out whether or not it's trying to send all your source code to China. They want to let you do this, but they want to let you do it in a way that doesn't also give you

access to all of their corporate IP, for example, because that's kind of antithetical to their business model. And so the solution people run with is obfuscation. You give someone something that's functionally identical to the source code, in the sense that when you run it, it does the thing the software is meant to do, but it's sufficiently mangled that, for someone interested in finding bugs or launching a competing business or just doing some recreational corporate espionage, it doesn't have all the useful properties that a full dump of the source code would. And this isn't new. Long before anyone was like, "You know what I'm gonna do? I'm

gonna package up my Ruby source app and sell it for extreme money behind the firewall," malware authors have been doing this for a really long time. Game developers have been doing this forever with their licensing systems, because it turns out licensing systems that everyone can figure out are not as good as ones that are magic black boxes. And even doing this with bytecode isn't new. Take Burp Suite: I have a bunch of friends who have been driven to the brink of insanity by trying to reverse-engineer it, and Burp Suite is still just Java under the hood. So in theory you can apply the same techniques to it, and hopefully I'm gonna nerd-snipe at

least one person into doing it, because I think that would be funny. But in a bytecode VM, most of the typical obfuscation techniques are challenging. With native code, you can use performance counters to figure out whether or not someone's tampered with your code. You can use side channels to figure out whether or not you're being instrumented: for example, the Pin framework from Intel is pretty good, but it does add a runtime cost, and if you know things about your execution environment, you can figure out whether or not someone's doing things to you and bail out of execution pretty early. You can also just jab at weird

instructions and figure out whether or not the emulator you're in correctly implements them. It turns out that Intel silicon, for example, doesn't do what the manual says it should in a lot of places, and implementers tend to read the manual and assume it's correct. So if you jab at a weird instruction that has a bizarre edge case, you can use that to figure out whether or not the execution environment you're running in is genuine. But in a super dynamic language this is even worse, because you can't mangle a lot of the semantic information. For example, if you've ever looked at obfuscated Java, where you've dumped it into JD-GUI and it

pops out, you'll notice that all the classes are named class1, class2, class3 and so on, because it's possible for the compiler to take all of the semantic information, like class names, and throw it away, since it can consistently rename everything. But in this super contrived example that I wrote a moment ago in the green room, you can't do that: if you take user-provided input and use it to look up a class or method name, and you've renamed the class or method, then your application breaks horribly. And so no matter what you do, you're still going to be preserving a ton of semantic

information that's really interesting to someone who, for example, wants to reverse engineer stuff. Without a JIT, you also can't do anything super interesting to method bodies, so you can't obfuscate them too hard, because bytecode VMs aren't typically super fast, especially if you're looking at Python or Rubby or whatever, and without the ability to do anything on the fly your options are pretty limited. So most of your options are purely in trying to make it very difficult to actually get at the bytecode itself. This is a screencap from some malware I tried to reverse a while ago: 1,400 basic blocks is a

really effective way to tell any reverse engineer to go [ __ ] themselves. If you want someone to stop looking, this is a really good place to start. So, cool: before we talk about how to break Rubby, it's worth understanding how Rubby fits together. If you wanted to just evaluate some Ruby code, this is fundamentally what happens: you take a source file, the VM reads it in, and it generates code from it. This is what that looks like: here's some goofy Rubby code that I wrote, and Rubby conveniently has this disassembler thing, which will take some executable object and spit out the

bytecode for it. Rubby's bytecode is actually pretty expressive; it's not the kind of six-instruction stack machine that it could be, so it's not the worst idea to just settle down, read this, and rewrite it by hand, but it's also not the best idea. It also has a pretty-printer that will render it as an AST, so you get nesting, you get some sense of how objects and classes relate to one another, and you get another layer of semantic information. So this is the first example of something super goofy and hilarious that I found inside of Rubby while I was

working on this. This is inside the Rubby compiler, where they have the definition of their bytecode format, and I really enjoy that they have a major version and a minor version, and above that a comment saying "no compatibility". They have version numbers! How do you run into compatibility issues when you know what version stuff is? Anyway, once you have this codegen, once you've taken the source code and turned it into bytecode, you evaluate it, and that's fundamentally how this all works. If you're planning on obfuscating your code, you just have to add a couple of extra steps: right after codegen, you

do obfuscation, emit that into some obfuscated source file, and that's what you ship to customers. Then at runtime you unpack that back into some format that's executable, and you evaluate that. This is a super contrived example of what this might look like; real-world examples look almost identical, the caveat being that I was trying to avoid putting things that might be watermarked into my talk, because I thought that might come back to bite me. So, Ruby is super dynamic and runtime debugging is very well supported, so as the dumbest possible starting point, you can just add instrumentation at runtime.
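
As a rough, pure-Ruby illustration of both the pack-then-unpack pipeline and runtime instrumentation, here is a sketch for a recent MRI. `vendor_pack` and `vendor_load` are hypothetical stand-ins for a vendor's build step and loader (strict Base64 plays the part of real obfuscation), and `CaptureIseqs` is an illustrative hook, not unrubby's actual implementation: it wraps MRI's `RubyVM::InstructionSequence.load_from_binary` so every sequence that passes through it is disassembled before it runs. It only helps against loaders that actually go through that method.

```ruby
# Hypothetical vendor build step: compile source to an instruction
# sequence, serialize it, and "mangle" it (strict Base64 stands in for
# whatever a real product does).
def vendor_pack(source)
  bytes = RubyVM::InstructionSequence.compile(source).to_binary
  [bytes].pack("m0")
end

# Hypothetical vendor loader: undo the mangling, rebuild the instruction
# sequence inside the VM, and run it.
def vendor_load(blob)
  RubyVM::InstructionSequence.load_from_binary(blob.unpack1("m0")).eval
end

# Runtime instrumentation: prepend a wrapper onto the singleton class so
# every call to load_from_binary records a disassembly of what it loaded.
CAPTURED_DISASM = []

module CaptureIseqs
  def load_from_binary(binary)
    iseq = super
    CAPTURED_DISASM << iseq.disasm
    iseq
  end
end

RubyVM::InstructionSequence.singleton_class.prepend(CaptureIseqs)

blob = vendor_pack(<<~RUBY)
  def secret_sauce(x)
    x * 2
  end
  secret_sauce(21)
RUBY

result = vendor_load(blob)
puts CAPTURED_DISASM.last
```

The shipped blob never contains the name `secret_sauce` in readable form, yet the captured disassembly does, because the hook sits after the loader has undone its own mangling. Note that the binary written by `to_binary` is generally only loadable by the same Ruby build that produced it.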

You can attach a debugger to the process and call rb_f_eval, which lets you evaluate statements in the current context, so you can explore what's currently loaded inside the VM. There's also a really cool debugger called pry, which you might have interacted with if any of you have done Rubby things before. It's pretty sweet: it just drops you into a REPL and you can type in Rubby code and explore your environment. You can get reasonably far with that, but again, if you're interested in, let's say, finding a bug in a ten-million-line codebase, you're not going to do that by typing whereami

into pry a lot; it's simply never gonna work. It's also really plausible to break these things: pry is really easy to disable at runtime, or at least easy to disable in ways that are annoying and difficult to re-enable, and you can turn off rb_f_eval fairly straightforwardly if you're bundling a VM, for example. So all of the dynamism that you can abuse as an attacker is also sometimes really handy as a defender, because you can figure out some of the things people are trying to do. But Rubby's open source and you can modify it; it's not a

black-box binary like, for example, Oracle's JRE, so you can just slam your own debug interfaces in. I did all of this work on the reference implementation, which is MRI; as I mentioned before, pretty much everything targets it anyway. I don't think any of the exotic Rubby interpreters are super widely used, at least as far as I can tell. Rubby and Python both had this fascination with trying to build, I guess, a lock-free interpreter that went really fast, but as far as I can tell none of those attempts were really successful. Happily, most of these products that commercially obfuscate things also ship a ton of loaders, and so in

actuality the version they want to run isn't necessarily the one you have to use. Typically, if, for example, you only have unrubby targeting one version, which is what it does, you can actually just backport the vendor software to use that instead and everything will just work, which is pretty cool. So anyway, we're now at the point where it's like: I'm interested in having a look at the insides of some software, and I have the Rubby source, so where do I go from here? rb_iseq_eval is the most interesting thing. We have these things called instruction sequences, which are basically strings of

bytecode, and Rubby has this function called rb_iseq_eval, which takes one and evaluates it. This is the patch that I added to Rubby to get all of this working, and it's really straightforward: every time you try to evaluate an instruction sequence, it calls disassemble on it and then prints the result to standard out. The immediate result is that your terminal explodes and catches fire, but in the smoldering wreckage you have the instruction sequences for the entirety of the Ruby program that you fed in. And so for

example, if this is a commercial product with 10 million lines of Ruby, it's a lot, and again it's back to that 1,400-basic-block function: you're not gonna rummage through that by hand. But to first order, this proves that with a single patch to Ruby I could defeat all obfuscation forever, because it works without any knowledge of how their loader works. Their loader does its thing, and afterwards it calls into the VM, and I'm like, "Cool, instruction sequences, I love looking at those." So no matter what they've done to mangle or tangle them, I still get to

look at them in the end, which was pretty cool. So at this point I had access to the instruction sequences, but that's not really what I wanted all along, and I was trying to figure out how to do something more interesting with them. So I started digging around for other symbols that might be interesting. rb_iseq_eval evaluates instruction sequences, but there's also rb_define_method and vm_define_method, which is called every time you create a new method in the VM, and that is realistically more interesting: evaluating stuff isn't that cool, but explaining

the structure of the codebase and where methods get defined is suddenly really interesting. There's also rb_f_eval. I did pull apart a bunch of software in the process of this research, for academic purposes of course, and the vast majority of things call into internal Ruby data structures, define methods, and do something at least somewhat insidious and annoying for us. But I did bump into one where the entire obfuscation engine was, well, it's a little more complicated than this, but semantically what they did was rot13 the entirety of their codebase, comments and all, and then unpack it in

memory and just call eval on it, which was pretty cool, because in general you never expect to get the comments back, so I was really pleased by that. But you shouldn't bet on that happening every single time. All right, so we have bytecode, but what do we do with that? How do I get further than reading bytecode and trying to figure out what I could do with it? Rubby's VM is a stack machine. It basically has an instruction pointer and a flag register with a bunch of conditional-jump stuff in it, but in general everything resides on the

stack. All operations consume their operands from the stack and leave their results behind, so as you dig through you'll see an increasing number of putobject calls, for example. This code fetches a constant, which leaves it on the stack; interacts with the cache, which we can happily ignore; puts another object on the stack; and then does a send, which basically says: invoke a method on the object on the stack, using the other object on the stack as an argument. Then it takes the result off the stack and sets it to an instance variable. This is fundamentally how everything works.
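
Because every operand lives on the stack, the same walk can be replayed with source-code fragments instead of runtime values. The sketch below is a toy: the instruction tuples are a simplified, hypothetical shape loosely modeled on YARV names (`getconstant`, `putobject`, `send`, `setinstancevariable`), not the exact format MRI emits, and `Config` and `fetch` are made-up names. Cache instructions are skipped, as described above.

```ruby
# Rebuild Ruby source by replaying stack-machine instructions against a
# stack of source fragments rather than real values.
def decompile(instructions)
  stack = []
  statements = []
  instructions.each do |op, *args|
    case op
    when :opt_getinlinecache, :opt_setinlinecache
      # Inline-cache bookkeeping: no effect on the reconstructed source.
    when :getconstant
      stack.push(args.fetch(0).to_s)
    when :putself
      stack.push("self")
    when :putobject
      stack.push(args.fetch(0).inspect)
    when :send
      method_name, argc = args
      call_args = stack.pop(argc) # arguments were pushed after the receiver
      receiver = stack.pop
      stack.push("#{receiver}.#{method_name}(#{call_args.join(', ')})")
    when :setinstancevariable
      statements << "#{args.fetch(0)} = #{stack.pop}"
    when :pop, :leave
      # A value that is discarded or returned becomes a statement.
      statements << stack.pop unless stack.empty?
    else
      raise ArgumentError, "unsupported instruction: #{op}"
    end
  end
  statements.join("\n")
end

# Hypothetical bytecode for: @timeout = Config.fetch("timeout")
program = [
  [:opt_getinlinecache],
  [:getconstant, :Config],
  [:opt_setinlinecache],
  [:putobject, "timeout"],
  [:send, :fetch, 1],
  [:setinstancevariable, :@timeout],
]

puts decompile(program) # prints: @timeout = Config.fetch("timeout")
```

Raising on unknown opcodes is a deliberate choice: it turns every unsupported instruction into an explicit to-do item instead of silently producing wrong source.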

And this is pretty cool, because when we're looking at the bytecode we can basically just model the stack in memory and figure out what it would have done, which provides a pretty straightforward path to translating everything back into equivalent source code. It's also cool because you can check your work: you take your bytecode, try to translate it into source code, then compile that source code, and if it comes out the same as the bytecode, great, you did it right. So you have this really easy way to check your work as you're going, which

means that you don't do, for example, months of work and then discover some of your assumptions are wrong and realize that you've screwed up your life, which I've done a bunch of times. Like I said, the IR is pretty easy to read, and I did actually find some bugs by reading it by hand, but because it's so expressive, decompilation is super tenable. So I started writing a decompiler, because that seemed like a fairly obvious step at this point. The bytecode is really expressive, so it wasn't super challenging, but a few hours in I was like, this is really annoying, I don't want to do this, I wonder if anyone else has done this.
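
The check-your-work loop (compile the candidate source, then compare it with the original bytecode) can be sketched against MRI's real instruction sequences. This is an assumption-laden sketch: it compares only the instruction arrays in the last element of `RubyVM::InstructionSequence#to_a`, dropping line numbers and event/label markers, and it does not normalize nested sequences (blocks, method bodies), locals tables, or catch tables, which a real tool would need to.

```ruby
# Instruction arrays only: to_a's last element mixes Arrays (instructions)
# with Integers (line numbers) and Symbols (events and labels).
def instructions_of(iseq)
  iseq.to_a.last.grep(Array)
end

# Does a candidate decompilation compile back to the original's bytecode?
def matches_bytecode?(original_iseq, candidate_source)
  candidate = RubyVM::InstructionSequence.compile(candidate_source)
  instructions_of(original_iseq) == instructions_of(candidate)
end

original = RubyVM::InstructionSequence.compile("price = 3\nqty = 4\ntotal = price * qty + 1")

puts matches_bytecode?(original, "price = 3\nqty = 4\ntotal = price*qty + 1")   # whitespace differs only
puts matches_bytecode?(original, "price = 3\nqty = 4\ntotal = price * qty - 1") # wrong operator
```

A mismatch does not say where the candidate went wrong, but diffing the two instruction lists usually points straight at the offending expression.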

And it turns out this guy Michael Edgar had already built this thing called reversal: you just give it instruction sequences and it turns them back into source code. I was like, hmm, okay, this is really convenient, this is the exact thing that I wanted. Reversal operated on some really old versions of Rubby, and it had some sort of clown-shoes bugs in it. Amusingly, the Rubby that it was set up to work with literally didn't even build, even when I checked out that specific revision from GitHub, which I'm probably never going to understand; I think this is what interacting with academia is like, but I'm not really positive. This project was also last

touched in 2010, and I think I started this in 2016-ish, so there was a big gap and it's understandable that bit rot occurred. I kind of touched on this: the VM has grown more instructions. This is one of my favorite goofy things in Rubby: the VM has a ton of joke instructions in it, which is fine or whatever; you might as well add jokes. The thing that pisses me off is that the optimizer doesn't know anything about this instruction. It just seems counterintuitive to me: if you're gonna add a goofy instruction that puts the string "a bit of bacon,

lettuce and tomato" on the stack, you should also teach your optimizer to find references to the string literal "a bit of bacon, lettuce and tomato" and use this instruction, because that would be way cooler, and it probably would have foiled my reverse-engineering methods. There's also this gem, which my terminal can't render the help for: if you invoke the answer instruction, you get 42, which makes sense, I guess. The reason I found these two instructions is that I was trying to write a decompiler and add support for instructions that weren't supported, and for that I kind of needed to know what instructions did.
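
Finding out which instructions a decompiler still needs to learn can be partly automated. A minimal sketch, assuming MRI: it walks the array form of an instruction sequence, recursing into child sequences (method bodies, blocks) that appear as instruction operands, and reports opcodes missing from a hypothetical `SUPPORTED` set. Child sequences inside catch tables (rescue/ensure handlers) are ignored here for brevity.

```ruby
require "set"

ISEQ_MAGIC = "YARVInstructionSequence/SimpleDataFormat"

# Every opcode used by an iseq (in to_a form) and by child iseqs embedded
# in its operands, such as a method body passed to definemethod.
def opcodes_in(iseq_array, found = Set.new)
  iseq_array.last.grep(Array).each do |op, *operands|
    found << op
    operands.each do |operand|
      if operand.is_a?(Array) && operand.first == ISEQ_MAGIC
        opcodes_in(operand, found)
      end
    end
  end
  found
end

# Hypothetical: the opcodes our decompiler already handles.
SUPPORTED = Set[:putobject, :leave]

source = "def add(a, b)\n  a + b\nend\nadd(1, 2)\n"
used = opcodes_in(RubyVM::InstructionSequence.compile(source).to_a)
missing = used - SUPPORTED
puts missing.to_a.sort.inspect
```

Running this over a real corpus gives a prioritized to-do list; the exact opcode names vary between Ruby versions, which is part of why reviving an old decompiler means adding instructions.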

So I was digging through insns.def, which is this goofy C macro thing that defines all the instructions in the VM, and every time I bumped into a new instruction and didn't know how it worked, I was like, oh sick, this has documentation, it's gonna be awesome. Now, Rubby is nominally an English-language project: officially the Rubby community operates in English, but this is not really true for a lot of the codebase. I was looking at expandarray, and the English help says "expand array to num objects", and I know enough about information theory to know that Japanese is, character for character,

more meaning-dense than English, and that's also just a lot more characters, so I'm willing to bet there's some interesting stuff that expandarray does that doesn't fit neatly into the description "expand array to num objects". So I got really tight with my friend Izaak, who is learning Japanese and super excited about Rubby, and we spent a good long time trying to decode a lot of these help messages, because we were super interested in what on earth is going on here. There are enough English identifiers in there, like the repeated references to nil, to suggest some definitely kooky edge-case behavior. But the flip

side of this is that, because I had the ability to check my work by recompiling my decompilation artifacts, when I couldn't figure stuff out I could still make a reasonable attempt: I'd just try something and see how it differed from the gold standard, and by and large that was pretty tenable. It also turns out that probably 78% of the emitted code is effectively no-ops. MRI emits a ton of instructions that relate to the cache, and when you're decompiling stuff you can just throw all of them away and everything still works. So I started trying to revive reversal.
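
Some of the nil-handling that the expandarray help text hints at is observable from plain Ruby, because multiple assignment compiles down to that instruction. A small sketch (the `destructure` method and `Pair` struct are made-up names); the results are ordinary Ruby semantics, not anything unrubby-specific:

```ruby
# `a, b = value` is compiled to expandarray; this helper exposes what it does.
def destructure(value)
  a, b = value
  [a, b]
end

# An object that opts into destructuring via the implicit to_ary conversion.
Pair = Struct.new(:left, :right) do
  def to_ary
    [left, right]
  end
end

p destructure([1, 2, 3])      # prints [1, 2]: extra elements are dropped
p destructure([1])            # prints [1, nil]: missing slots become nil
p destructure(nil)            # prints [nil, nil]: nil is wrapped, not expanded
p destructure(7)              # prints [7, nil]: so is any object without to_ary
p destructure(Pair.new(5, 6)) # prints [5, 6]: to_ary is honored

disasm = RubyVM::InstructionSequence.of(method(:destructure)).disasm
puts disasm.include?("expandarray")
```

These are exactly the cases a decompiler has to get right when it turns an expandarray back into a `a, b = ...` statement.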

I was like, this project looks pretty cool, I think I'm gonna use it. The first thing I did was just the edit-make loop until it built cleanly, and then I realized that it was targeting a comically old Rubby, so old that even the loaders that target everything were like, wow, that's really old, we don't want anything to do with that. So I added support for 1.9.3, which for a long time was the gold-standard Rubby. Amusingly, they planned a 1.9.4 and then were like, that's a bad plan, don't do that. So I got it up to 1.9.3, and subsequently up to 2.1, which is now a few

versions behind, but it's modern enough. I added support for a bunch of new instructions, including some of the jokes, because I thought it would be pretty cool if reversal at least kind of knew about them, and then I bundled it all up and shipped it. So that's unrubby. unrubby is basically a suite of things: it gives you a Rubby VM so you can run software with it, it hooks a ton of internal behavior for reasons I'll get into in a second, and it reaches into reversal for decompilation. So basically you just hand it something that invokes Ruby code, hopefully eventually, and it'll turn it back into source code, and then

it prints it out for you so you can mess with it and do what you want. So why not just use reversal on its own? Why did I do all this futzing around with the Rubby VM? The biggest reason is that reversal is pretty cool, but it's also pretty fragile. It gets confused reasonably easily, it's definitely not perfect, and its underlying assumption was that you could just give it a whole program and it would figure it out, and ultimately it just wasn't smart enough to pull that stunt off. But by hooking the behavior of the VM, I'm suddenly populating a lot of internal state, so structurally unrubby, even without reversal, can tell you about

all the classes and methods in the program, which for some use cases is actually more than enough information to reverse everything anyway. And ultimately, because it's so deeply tied to the VM, it's really hard to defend against. I've spent a lot of time trying to figure out how, if I worked for one of these companies that obfuscate Rubby for a living, I would stop people like me from doing this, and I still haven't come up with anything, because you're tied to all the structural fragility of having used Rubby in the first place. So, like I mentioned, reversal does claim it can turn a whole program back

into source, and it has some really contrived test cases in the codebase that do this, where you hand it a 7-million-line instruction sequence and it turns it back into the whole thing, because defining a class is still just bytecode as far as Rubby is concerned. But in practice I found that it nearly always gets really confused: it doesn't keep track of scope, so all of the classes in your ten-million-line codebase end up infinitely nested, which was a really fun bug when I first bumped into it, because it tries to indent everything nicely. So incremental decompilation, where you basically

tease apart the structure of the program, then hand it just the method bodies and just the class bodies and ask it to decompile them, works way better: A, because there's less internal state for it to get lost in, and B, because when it gets everything wrong, you get to start with a fresh context on every method. So if you eyeball a method and you're like, that's wrong, that doesn't make any sense, there's a good chance that the thing next to it is still correct, which was very much not the case when you handed it the whole program as one giant instruction sequence. So there are a few levels to this.
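
The method-at-a-time approach can be approximated from inside a running MRI process once the program's classes are loaded. A sketch, assuming methods defined in Ruby: C-implemented and attribute methods have no instruction sequence, so `RubyVM::InstructionSequence.of` returns nil for them and they are dropped. `Invoice` is a made-up class.

```ruby
# Map each Ruby-defined instance method of klass to its own instruction
# sequence, so each one can be disassembled or decompiled in isolation.
def per_method_iseqs(klass)
  names = klass.instance_methods(false) + klass.private_instance_methods(false)
  pairs = names.sort.map do |name|
    [name, RubyVM::InstructionSequence.of(klass.instance_method(name))]
  end
  pairs.to_h.compact # drop methods without an iseq
end

class Invoice
  def total(qty)
    qty * unit_price
  end

  private

  def unit_price
    3
  end
end

per_method_iseqs(Invoice).each do |name, iseq|
  puts "### #{name}"
  puts iseq.disasm
end
```

Each entry starts from a clean slate, so a decompiler that gets `total` wrong has no effect on what it produces for `unit_price`.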

The other thing people do is generate deliberately difficult-to-read bytecode, and a really trivial example of this is creating a class. In Java, for example, you can create a class dynamically, but you can also basically specify it statically inside of a class file. In Rubby there's only one way, he says, about to show you the other way. So there's this opcode called defineclass, which takes some arguments and creates a class in the current namespace, but you can also load the constant Class onto the stack and then invoke the new

method on it, and eventually this will run through enough Rubby machinery to generate a class, but it does so through two almost entirely disparate code paths. This was one of the things that reversal struggled with at first, because its internal notion was that the only way to define classes is via the defineclass opcode, so the decompiler would home in on that as the only way to do it.

And so I was like, not to worry, I can fix this. I started hooking more and more stuff inside of the VM, and by hooking rb_define_class, it doesn't matter whether you define a class with defineclass, whether you call Class.new, or whether you eval a string with a class definition in it: whatever goofy piece of plumbing you used to wind up at a class, the instruction pointer always finds its way into rb_define_class, and I always get to include that in my class map. So this was the important part: removing a little of the structural fragility that you have to deal with by virtue of being in Rubby. So I started adding a bunch of patches, and they were all really simple. This was the first and most important one, which was that there was

this really hilarious loop where, if you included reversal from user space, it would start trying to decompile itself, and then everything went sideways pretty quickly, so I stopped doing that. Then I started adding these really tiny declarations to basically every function that builds up internal state, anything that defines a constant or attaches a method to a constant inside of the VM. These little stubs basically decompile it and shove it into this huge class map, and that class map is just a native Rubby object. So if you decide that you want to work on this, you're not

stuck with just handing it off to reversal and doing whatever: you can pop a REPL, start fiddling around with the class map, look at it, and write your own static analysis pass or whatever. You can do all kinds of fun stuff, which, it turns out, was my next slide; I'm getting ahead of myself. So this is basically what that winds up looking like: a hash of hashes, each of which includes the methods, includes, extends, and whether or not there's a superclass. I did run into a really fun bug yesterday when I was rewriting my demos, because I accidentally deleted them: this approach actually loses

some ordering information, which turns out to be semantically relevant, so this is the next bug that I'm gonna fix. Currently it doesn't understand at what point you included another module, which it turns out drastically alters program behavior. But it also has a couple of other, I guess, unexpected benefits, one of which is that if you have lots and lots of goofy metaprogramming, because of the way the Rubby VM works, it will effectively unroll it for you. This may or may not be a bonus: if, for example, your metaprogramming exists because you needed to define 10 million methods that are nearly all the same, you may not regard

it as a huge favor that I've given you the concrete implementations of those 10 million methods. But for the most part it works out pretty well. A good example is if you've ever used Sinatra: Sinatra has magic get and post class methods on its controller class, and when you invoke those, under the hood they define the methods that actually handle the requests. Because of the way this works, unrubby will actually give you the concrete handler methods and not just those get and post declarations, which tends to be more useful, especially if you're trying to do a static analysis kind of thing.
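
A pure-Ruby approximation of such a class map, and of how it sees through Sinatra-style metaprogramming, might look like the following. Everything here is hypothetical and illustrative: `ClassMapRecorder`, `CLASS_MAP`, and the tiny `TinyApp` DSL are made-up names, and the recorder relies on Ruby's own `inherited` and `method_added` hooks rather than on unrubby's patches to MRI's C functions. It records only superclasses and methods, not includes or extends.

```ruby
# Hypothetical class map: class object => { superclass:, methods: }.
CLASS_MAP = Hash.new { |map, klass| map[klass] = { superclass: nil, methods: [] } }

# Extended into a base class, these hooks run for every subclass, whether
# it is created with the class keyword or with Class.new.
module ClassMapRecorder
  def inherited(subclass)
    super
    CLASS_MAP[subclass][:superclass] = self
  end

  def method_added(method_name)
    super
    CLASS_MAP[self][:methods] << method_name
  end
end

# A tiny Sinatra-flavoured DSL: `get` defines a concrete handler method.
class TinyApp
  extend ClassMapRecorder

  def self.get(path, &handler)
    define_method(:"GET #{path}", &handler)
  end
end

class HelloApp < TinyApp
  get("/hello") { "hi" }
  get("/bye") { "bye" }
end

GeneratedApp = Class.new(TinyApp) do
  def ping
    "pong"
  end
end

pp CLASS_MAP
```

Because the hooks fire as methods are created, the `get` calls show up as concrete `GET /hello`-style methods rather than as DSL declarations, and the `Class.new` subclass is recorded just like the one built with the `class` keyword.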

Yeah, we could do all kinds of stuff. It's kind of a nightmare inside of Rails, because Rails is full of magic, but for the most part I think this is pretty neat. So, rb_define_class: I kind of lied to you before. rb_define_class is absolutely not the only way to define a class. It turns out this is fairly unique; it's only because classes are one of the two universal primitives, in that everything's either a module or a class at some point in the Ruby hierarchy. It turns out hooking rb_define_class didn't get me far enough, so I just added my hook function to literally everything

in class.c, and that pretty much made the problem go away. All right, so I do have a demo. You're gonna have to bear with me for a second, because while I was trying to make this dongle work I restarted my computer, and so my carefully laid-out demo is not anymore, but I bet I can do this; live demos always go so well. All right, this should be over here, and I don't remember how to fullscreen things on my new laptop. If I embiggen this somewhat, can you see that okay? Sweet. It's not like I started really late; this will be fine, I've got heaps of time.

Sweet. Okay, so I have this file here, and I have a copy of unrubby. I definitely demoed a commercial application, which I'm praying doesn't put its watermark near the end, and I used it to encode the CSV library from Ruby itself, because I figured at least the thing I'm encoding isn't commercial, so that's good. Anyway, this is basically what it spits out: it does a little bit of magic to try and find its loader, and then it hands a base64-encoded string off to its loader and says "hey, load this in." So if everything goes really, really well, which it won't, because Ctrl-L doesn't work... oh God, how does this... yeah, that looks about right.

Okay, so what I'm doing here is pointing it at reversal myself. unrubby tries to load reversal on its own, but I found a ridiculous bug in that and couldn't be bothered fixing it, so I'm telling unrubby not to try and find reversal by itself and telling it exactly where it can find reversal, so that it will work. Then I'm invoking unrubby, which is just a drop-in for Ruby. It will also evaluate your code, so if for some reason you're trying to figure out packed malware, because I guess that's a thing people maybe do, don't do it on your machine, because it will actually run the code.

Anyway, I'm going to invoke this, and after loading the CSV library it's spat out the contents of the CSV module, which is pretty neat. There should be more. That's weird; something's gone horribly wrong. I'm going to blame restarting my computer. I swear this demo worked yesterday in the green room. If we have time afterwards I'll dig around, but I'd prefer not to waste all of your time while I futz with my laptop. You don't have to clap; failed demos get nothing. Sorry.

So how do we actually make this work? The fact that Ruby is super willing to do basically anything for anyone is really practical for us, because we can do a lot of goofy stuff to the VM before we jump into the, I guess, attacker-controlled code, so we can set up an environment where we're more likely to succeed. We can preload our library and then let it jump in, and then we can hijack execution and fiddle with more stuff. This is super useful for cases like the one where I wanted to not try and disassemble reversal itself, because that led to some pain. And then we just add an at_exit hook, so basically when the VM exits cleanly we say "hey, take the entirety of the class map that you generated and spit it out to stdout."

So let's say you want to break something in the real world. Say you have the CSV library, you're curious about how it works, and you can't remember how to get the source out of the Ruby tree. How do you do this in the real world, when you have a real application that you want to look at the source for, but you don't want to accidentally disassemble the entirety of Rails? The naive way to do this is to just implement Rails without any bodies. Nearly all of the applications that I looked at were Rails apps, because that's what enterprise companies build these days, and so if you include enough shims that none of the calls into Rails raise NoMethodError, you get enough plumbing that it can define all of its classes and do its thing.

But you can also do this the kind of ridiculous way that's guaranteed to work: duck-punch your way to glory. If you're not familiar, method_missing and const_missing are magic methods that are invoked when you try to access a method or a constant that doesn't exist, and you can also reopen classes and add more methods to them later, which is a really bizarre misfeature of Ruby.
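A minimal sketch of the kind of stubs the talk describes next, assuming Ruby 2.x or later. Stub is a hypothetical name, and returning a fresh Stub subclass from const_missing (rather than a plain object) is this sketch's own choice: it lets code like class OrdersController < ActionController::Base still work, since both a superclass and a namespace have to be real classes.

```ruby
# Hypothetical "duck-punch the universe" stubs. Stub is a made-up name,
# not unrubby's code.
class Stub
  # Instance level: any unknown method returns yet another stub.
  def method_missing(_name, *_args, &_block)
    Stub.new
  end

  # Class level: unknown class methods (DSL calls like before_action) are
  # swallowed the same way, on Stub and on every subclass of it.
  def self.method_missing(_name, *_args, &_block)
    Stub.new
  end
end

class Module
  # Any undefined constant, in any namespace, becomes a new Stub subclass.
  # const_set caches it, so later references resolve to the same class.
  def const_missing(name)
    const_set(name, Class.new(Stub))
  end
end

# Application code written against a framework that isn't loaded at all.
class OrdersController < ActionController::Base
  before_action :authenticate_user!

  def show
    Order.find(42).line_items.sum(&:price)
  end
end

result = OrdersController.new.show
p result.class # => Stub (instead of a NameError or NoMethodError)
```

A real version has to be more careful than this sketch: because method_missing also answers implicit conversion probes such as to_ary, passing a Stub instance to something like puts can raise a TypeError.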

So I wrote these two stubs. The first one basically says: whenever you call a method on a stub that doesn't exist, just return a stub that also does that. And on Object I said: every time you try to access a class that doesn't exist, return a stub. In practice this means that if you try to access anything in the Ruby namespace you get a stub, and if you invoke any method on it you get a stub, so everything just works. It's literally impossible to receive a NoMethodError or a NameError for a missing constant after this. So if you run any piece of source, no matter what kind of pre-existing state it expects to exist in the universe, it will run cleanly enough that reversal gets to look at all of its instruction sequences.

That's pretty cool, because Ruby applications typically aren't one gigantic file with all the source code in it. They're organized into a hierarchy that needs to be loaded in a really specific order, and generally speaking you don't want to load the top-level thing, receive 10 jillion lines of Ruby dumped into one file, open it in Vim, and think, man, I'd better go at this with a buzzsaw. You want to look through file by file and figure out that the one with "secret" in its name is probably a really good place to start. So this lets you decompile file by file instead of being forced to look at the whole thing at once.

The next thing I thought about was that nearly everything I was reversing had license servers. It's kind of challenging for the software to figure out whether I'm doing something bad and actually stop me, but it's pretty straightforward to reach out to your licensing server and say, "hey, here's the checksum of the Ruby evaluator that I'm running with. I don't think it's right. Thoughts?"
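For illustration only, here is what such a phone-home check might compute. It is a hypothetical sketch, not any vendor's actual code: it hashes the running interpreter's binary with the real stdlib calls RbConfig.ruby and Digest::SHA256.file and builds a JSON payload; the field names are invented, and nothing is sent anywhere.

```ruby
require "digest"
require "json"
require "rbconfig"

# Hypothetical sketch of a license client fingerprinting the interpreter it
# runs under. Field names and payload shape are made up for illustration.
def interpreter_fingerprint
  ruby_path = RbConfig.ruby # absolute path of the running ruby executable
  {
    "ruby_version" => RUBY_VERSION,
    "ruby_engine" => RUBY_ENGINE,
    "ruby_path" => ruby_path,
    # A patched VM ships a different binary, so this hash changes.
    "ruby_sha256" => (Digest::SHA256.file(ruby_path).hexdigest if File.file?(ruby_path))
  }
end

payload = JSON.generate(interpreter_fingerprint)
puts payload # a real client would send this to its license server
```

On builds where most of the VM lives in a shared libruby, the executable alone may not reflect VM patches, so a real check would also need to hash that library.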

I figured that would probably be bad. So when I was first working on this, I was working in an unroutable VM, and that was my isolation primitive to make sure the things I was tampering with weren't calling home to say "hey, you should talk to that Richo guy, I don't think this is part of the end-user license agreement." But that was super miserable, because I spent a lot of time realizing that I'd forgotten something, and I'd have to kill everything, make the VM routable so I could put stuff into it, and then make it unroutable again, and I ended up making compromises.

In the end, if people were actually trying to keep an eye on me, I failed miserably. So the next thing I did was add a magic environment variable. All I did was patch the entirety of the socket module to not do anything if that environment variable is set, so anything that tries to create a native Ruby socket will transparently fail. It will act as though it created a socket, and reads and writes on that socket will just instantly return. This isn't a perfect defense: again, if you're dealing with some insane, nefarious malware, anything that knows how to invoke syscalls or call into a shared object can still reach a socket. This isn't a real sandbox, but it is really cheap insurance against the case where they imported a Ruby HTTP library to do stuff.

So what do you get in the box if you decide to use unrubby? You get a Ruby source tree that's been heavily patched to have a bunch of helpers and stuff that make this really easy. You get a patched version of reversal that works with that version of Ruby. And you get a Rails shim that's less invasive than the ridiculous duck-punch-the-universe, always-succeeding approach, which actually works better, because that approach masks a lot of bugs.

If you want to play with it, that would be awesome. I highly encourage people to either mess with unrubby or to try and write the same thing, ideally for the JVM, because I think that would be cool, but really you can do this with any bytecode VM. There's also a bug-report environment variable: if you export it and the decompiler chokes, it'll spit out the instruction sequence that broke it and a little bit of information, which you can redact and then ideally send to me.

There's some other stuff in there. There's the first prototype I wrote, which dumps instruction sequences out to the terminal. Unless you want to feel like you're compiling Gentoo, it's pretty useless, because it just busies up your terminal, but occasionally it's useful for debugging. The methods tool is the default thing that you want; it emits a class map complete with populated source. And it's been a while, but I do know for a fact that exporting YOLO does something. I don't remember what, but I bet it's good. You can also tie into the autoloader.
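The socket hack described above can be approximated in plain Ruby. This is a sketch under stated assumptions rather than unrubby's actual patch: NULL_SOCKETS, NullSocket, NullTCPSocket and install_null_sockets! are made-up names, and only TCPSocket is covered. It prepends replacements for TCPSocket.new and TCPSocket.open that hand back a StringIO-based fake, so writes are swallowed and reads return immediately with no data.

```ruby
require "socket"
require "stringio"

# Hypothetical sketch of the socket hack. NULL_SOCKETS is a made-up variable
# name, and only TCPSocket is covered; a fuller version would also patch
# Socket, UDPSocket and friends. Native code can still bypass all of this.

# A fake socket: writes are swallowed, reads return immediately with no data.
class NullSocket < StringIO
  def initialize
    super(String.new) # empty, writable buffer
  end
end

module NullTCPSocket
  # Pretend the connection succeeded, whatever host/port was requested.
  def new(*_args)
    NullSocket.new
  end

  # IO.open allocates the object without calling .new, so override it too.
  def open(*args)
    sock = new(*args)
    return sock unless block_given?

    begin
      yield sock
    ensure
      sock.close
    end
  end
end

def install_null_sockets!
  TCPSocket.singleton_class.prepend(NullTCPSocket)
end

install_null_sockets! if ENV["NULL_SOCKETS"] == "1"
```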

Ruby has this facility, really similar to PHP's, if any of you have ruined your life in that specific way, where you can describe a pattern that basically says: if you try to access a class and that class does not exist inside the current class hierarchy, here is how you should figure out which file to evaluate in order to define it. Autoloaders are sweet, because a lot of the time they're remote code execution: if you can coax them into loading an attacker-controlled class name, they'll just arbitrarily require files from the file system, which is pretty sweet. One attacker-controlled upload for victory. But you can also use the autoloader for fun and profit here, to let it explore the program for you and tell you where the fun stuff is.

I touched on this before: all the goofy metaprogramming will get expanded in the class map. One of the things I really want to do is gate this behind a switch, figure out how many blocks deep you are when you're defining methods, and decide whether or not to expand them, to avoid the state explosion of receiving a 10-million-method class map.

So how do you actually do this in the real world? It currently targets Ruby 2.1, which is modern-ish, I guess. I don't think the current Ruby, which is 2.4-ish these days, has anything backwards-incompatible from 2.1; there are just more modules in the standard library and some performance stuff. Vendors ship shims for each Ruby version. One fun thing you can do: if you're looking at a bundled program and it only comes with a single loader for the version of Ruby that they intended you to use, you can typically just Google the name of the loader they're using, go to their site, get a demo, and get the shim that works with unrubby, because it's the same version. Then just put that alongside it, and all of a sudden it'll all work, because everything's super compatible, which is pretty neat. You can also do this where applications bundle their own runtime. Obviously some bundle Java, so this is kind of a bad simile, but Burp has an icon that you double-click, and it invokes its bundled Java, which is hidden somewhere in there.
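As a sketch of the autoloader trick, here is a hypothetical Rails-classic-style autoloader built on const_missing: it maps a missing constant such as Billing::LineItem to billing/line_item.rb under a root directory and logs every resolution. TracingAutoloader and the file layout are invented for illustration; Ruby's built-in Module#autoload is the declarative alternative, where each constant's path is registered up front.

```ruby
require "tmpdir"

# Hypothetical Rails-classic-style autoloader: a missing constant
# Foo::BarBaz is resolved by loading "<root>/foo/bar_baz.rb". Logging each
# resolution shows which files the program actually pulls in, in order.
module TracingAutoloader
  @root = nil
  @log = []

  class << self
    attr_accessor :root
    attr_reader :log

    # "Billing::LineItem" -> "billing/line_item"
    def underscore(const_path)
      const_path.gsub("::", "/").gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
    end
  end

  def const_missing(name)
    return super unless TracingAutoloader.root

    full = equal?(Object) ? name.to_s : "#{self.name}::#{name}"
    file = File.join(TracingAutoloader.root, "#{TracingAutoloader.underscore(full)}.rb")
    return super unless File.file?(file)

    TracingAutoloader.log << [full, file]
    load file
    const_defined?(name, false) ? const_get(name, false) : super
  end
end
Module.prepend(TracingAutoloader)

Dir.mktmpdir do |dir|
  # A tiny fake app tree on disk.
  Dir.mkdir(File.join(dir, "billing"))
  File.write(File.join(dir, "billing.rb"), "module Billing; end\n")
  File.write(File.join(dir, "billing", "line_item.rb"), <<~RUBY)
    module Billing
      class LineItem
        def total; 42; end
      end
    end
  RUBY

  TracingAutoloader.root = dir
  p Billing::LineItem.new.total # => 42
  TracingAutoloader.log.each { |const, file| puts "#{const} <- #{file}" }
end
```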

So for tampering, you need to open up the package and stick your malicious Java in there. You can do the same thing with Ruby programs. I really struggled to think of a good example, but there are tons of desktop applications that are Ruby under the hood. You can just open up the package file, drop unrubby in instead of Ruby, and everything will just work, which is pretty cool. Like I said, I spent some time trying to figure out how I would defeat this. If I worked for that company, how would I stop some jerk like me from doing this?

It's a little unclear. It turns out Ruby is just a bad language to implement your software in if you want to give it to other people. Honestly, I think the only real solution is to SGX really, really hard: you literally put it inside an execution enclave that the hypervisor can't look into. But that's kind of extreme, and probably more work than anyone's really willing to put into this. You can definitely play cat-and-mouse for a while; you can just introduce more annoying things that I'd need to bypass. But this is not skewed in favor of the defender. In actuality, it's probably really annoying for you to implement those checks, and it's really not that annoying for me to make unrubby pass them. As for any checks that you do inside of Ruby, it's pretty easy to smash them wholesale. In the same way that I made all class and method resolution just transparently work because it was easy for me, it's not actually that hard to patch all equality comparisons to say, "well, the right-hand side looks like it has a sort of checksum look to it, so yeah, the left-hand side is definitely the same checksum." That's something you can reasonably easily do inside of Ruby.
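A minimal sketch of the "lying equality" patch described above, assuming the protected check is a plain String#== in Ruby code. ChecksumsAlwaysMatch and integrity_ok? are hypothetical names. Any comparison whose right-hand side looks like an MD5, SHA-1 or SHA-256 hex digest is forced to true; every other comparison falls through to the real String#==.

```ruby
require "digest"

# Hypothetical sketch of the "lying equality" trick: any String#== whose
# right-hand side looks like a hex digest returns true.
# Never load this anywhere you care about correctness.
module ChecksumsAlwaysMatch
  # Hex digests of common lengths: MD5 (32), SHA-1 (40), SHA-256 (64).
  CHECKSUM = /\A(?:\h{32}|\h{40}|\h{64})\z/

  def ==(other)
    return true if String === other && CHECKSUM.match?(other)

    super
  end
end
String.prepend(ChecksumsAlwaysMatch)

# A naive integrity check, as a protected app might write it (hypothetical).
def integrity_ok?(data, expected_sha256)
  Digest::SHA256.hexdigest(data) == expected_sha256
end

p integrity_ok?("tampered bytes", "0" * 64) # => true, the lie wins
p "abc" == "abd"                            # => false, other strings unaffected
```

Comparisons that never go through String#==, such as eql? and Hash lookups or checks done in C, are not affected by this patch.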

That would break a lot of naive attempts. So I think realistically your best bet is to include shadow definitions of all of the class and method definition machinery, so that you can twiddle the internal data structures without ever invoking my hooks. At the same time, if you did that, I would totally just hook you with Pin, and we'd be right back where we started. And so it really upsets me to say this, and Sam Collinson wanted to be quoted in my slides but isn't, because I forgot, but I think Node might almost be a good idea, which is horrifying, but purely because you have access to the JIT. You're still going to lose; you're still coughing up a lot of the structural information about how your program fits together. But at least you have the option to emit native code for all of your method bodies that makes reversers hate themselves, and the reverser runs into the problem where they need to convince it to JIT literally everything and figure out how to steal the JITted code and hand that off. So, terrifyingly, I think you should write your enterprise app in Node, I guess. Quote me, or @ me.

The good news is there's no obvious way to defeat this attack, which is great, and the cost to attack is really, really low. You can definitely apply this to other things, which is the question I typically get asked at the end, so yeah, go and reverse stuff.

Like I mentioned, this is the most important thing: I fixed all the bugs that I bumped into, and I reversed some pretty substantial pieces of software, which I think means I have reasonable coverage. But every time I tried something new I consistently bumped into something goofy. The coverage I have for opcodes is deliberately driven by the opcodes I saw, because it just looked really annoying to implement all of them, so I didn't when I could get away with not doing that. So please do reach out. In actuality, fixing the bugs is really easy; I just need to know where they are.

I wanted to give you a sneak peek of something I'm working on right now. I actually just offloaded this to someone new on my team, but it turns out porting AFL to Ruby is not super complicated, especially since at this point we have fine-grained information about what the VM is doing. So I wrote a little C shim that implements the AFL shared-memory protocol, and then I used TracePoint to basically say "hey, every time you do something, reach into this so I can twiddle bits." So it's now possible to fuzz black-box Ruby applications and translate the crashes back into source in a totally hands-off way. You don't need to know literally anything about the way it's packed or written; you just hand it garbage inputs until something cool happens, and then you have a look at the source of the bug, which I think is pretty cool.

So I want to shout out a few people. Rich Smith wrote this thing called pyREtic, which he released at Black Hat forever ago. It was a fundamentally similar thing for Python. It definitely didn't do the ridiculous hands-off "just rename a binary and you're off to the races" kind of thing, but he definitely pioneered this idea of extracting a semantic class map and then populating everything post facto, which I'm super grateful for. He also gave a really good talk about it. Michael Edgar, for writing reversal: if he didn't do that, I would have had to write the decompiler, and that would have made me really sad. I'm shouting out the wrong conference in my slides, which is really embarrassing. I [ __ ] up pretty badly, sorry Silvio, but really, thanks to BSides for having me. I obviously spoke about this at Troopers once, and, Dominic, yeah, I just didn't update the slide, because it was towards the end and I didn't get around to redoing it. Sorry, I'm so sorry.

Anyway, if you have any questions, I'd love to answer a few of them right now, or you can ping me on Twitter, or you can email me at work, which is where I've done a substantial amount of this work. Cool.

Are there any questions? We might just have time for one question, and a selfie with Richo as well.

How much program execution do you need to see before you get a useful class map out of Ruby?

Sorry, it was a little hard to hear.

Was the question how much program execution you need to see? Yes? So the answer is maybe none, maybe all of it. The way this fundamentally works is that Ruby doesn't have a separate compile and run step the way many languages do, and this isn't novel to Ruby; Python works the same way. In actuality, when you run a Ruby program, all of the class definition and method definition stuff is just imperative code, but instead of immediately having side effects like printing something or doing arithmetic, it defines classes inside of the local hierarchy, and unrubby looks at them in real time. So, for example, say the program defines a bunch of classes, then does a bunch of stuff with those classes, and then defines a bunch more classes. The concrete answer to your question is that in that hypothetical example, if it crashes in the middle part where it interacts with the first set of classes, you won't get to see anything about the second set, because they were never defined. But if you're asking whether you need to exercise the code in those classes in order to get their source, the answer is no: as soon as they're defined, unrubby has looked at them and figured them out, and you will get to look at them.
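The answer above can be seen in a few lines of plain Ruby (not unrubby itself): class bodies are ordinary code that runs top to bottom, so a class defined before a crash exists with all of its methods even though none were ever called, while a class defined after the crash never exists at all. Here eval with TOPLEVEL_BINDING stands in for loading a file named app.rb.

```ruby
# Class bodies are ordinary code executed top to bottom, so a crash partway
# through a file means the definitions after it never happen.
source = <<~RUBY
  class First
    def hello
      "hi"
    end
  end

  raise "crashed while talking to a database that isn't there"

  class Second
    def never_seen; end
  end
RUBY

begin
  eval(source, TOPLEVEL_BINDING, "app.rb")
rescue RuntimeError => e
  warn "load stopped: #{e.message}"
end

p defined?(First)               # => "constant"
p First.instance_methods(false) # => [:hello] (defined, never called)
p defined?(Second)              # => nil
```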

The ideal case is that you can load all of the code without ever actually, in the web application case, running it as a service behind your firewall. If you can load all the code without starting the web server, that's great, because you won't get debug logs in amongst your source, for example. Is that your question? Sweet. I think there's time for that selfie you wanted at the beginning, if you still want it. Yeah, there is; today's a good day. All right, everyone smile, because I'll know if you're not. Rude. All right, thank you all so much. Let's thank Richo for a great talk. [Applause]
