
Automated, Generic System Call Hooking, And Interpretation

BSides Luxembourg · 2018 · 21:17 · 161 views · Published 2018-10
Category: Technical
Style: Talk
About this talk
When doing malware analysis, monitoring application behavior plays an essential role. To do that, one of our most used mechanisms is system call monitoring. In contrast to approaches that e.g. put breakpoints into mapped libraries, we employ a VM monitor approach that catches the SYSCALL instruction itself. When interrupting processes on such instructions, what syscall did they actually attempt to use and what do the parameters mean, as it’s all encoded in CPU registers or the stack? In order to solve that problem in the most elegant way for our users, we needed a library of signatures of all syscalls along with the data structure types and formats they use. We automated the process of harvesting system calls and data structure definitions from multiple public open-source sources and preprocessing them to produce a single data source in machine-readable form. We present the results that we keep OSS to share with the community, and demonstrate how this improves the analyst workflow.
Transcript [en]

Thank you, and thanks to all of you who are sticking around for this last talk. I'll try to make it entertaining and quick. I'll repeat some of that introduction again, but I'll go through it quickly. I'm a software developer with an OS and virtualization background. I studied at the Technical University in Dresden, which is the home of something called the NOVA microhypervisor. Without going into too much detail, a microhypervisor is a very tiny operating system kernel that has been designed with virtualization in mind from the start. It only implements the necessary mechanisms in kernel mode, and everything else is built on top of that in userland. Yeah, as mentioned, I've

worked with Intel and FireEye, where we tried to apply virtual machine technology in security-related areas, for example endpoint protection or malware analysis. After that we decided to found this company, Cyberus Technology, where we wanted to bring something to market that features NOVA at its core. That brings me to the motivation for this talk, which is our Tycho malware analysis platform. Again, I won't go into too much detail, but the platform looks like this: we have the analyst's laptop on the left, which contains your usual work environment like IDA Pro, debuggers, Volatility, static analysis stuff, and then you have a target connected to it that basically just runs a normally installed, unmodified Windows system, and

then we just slip our hypervisor and the virtualization underneath and use it to analyze malware. Now the question is, what do we actually analyze in malware? That's the application behavior. We have this application that usually wants to do something that is useful to it: it accesses files, or the network, or the Windows registry. The way to do that on pretty much every operating system I know is system calls. A system call is just the interface between the operating system kernel and the application that wants something done. The operating system is there to service requests that are signified by system calls, and these include stuff like file access, network

access, you know, all those things that you want to control as an application, including starting processes and threads and modifying your memory layout. For the rest of the talk we'll be using one system call example to see how we interpret this in our analysis platform. Again, you don't have to look too closely, but this system call will haunt you for the rest of the talk. This is what the system call looks like in IDA Pro: you just have something here that's the syscall instruction, and then you have just a bunch of numbers, ones and zeros, and nothing makes any sense. What we actually want is a picture like this,

so we have a printed output of the system call parameters and of what this is actually about. We don't want to look at all those numbers in detail all the time. The way from the previous image to this one is what I'm going to be talking about in this talk. But first we have to feel some of that pain, so we do it manually first and then see how we can automate it. First of all, we have to think about how we interpret a system call on a Windows machine. One of the key things is that we have to find out which system call it actually is, and that is stored in a register called

RAX on 64-bit Windows, so that much we know. We also know that the result of the system call, after it was handled, is stored in that same register. Beyond that, a system call on Windows pretty much follows the Microsoft calling convention, so we have four registers for the first four parameters: RCX, RDX, R8, and R9. However, for a system call that doesn't quite work, because RCX is used internally, so it is swapped with R10. Still, we end up with these four registers for the first four parameters; the rest is just passed on the stack. For the rest of this talk, to keep it simple, we just focus on the register

ones, so it doesn't get too complicated. So what do we need to know in order to interpret the system call? We need to know the Windows version, for example Windows 7 Service Pack 1. Then we need the architectural state, which contains the registers we've been talking about, and we also need memory dumps. We need both of them before and after the system call, so we see what went into the kernel and what came out of it. Now, coming back to the example, we have this register state, now in a slightly more readable form because I've already highlighted the four parameters for you, and also the number. Now we see it's hexadecimal

52, and we have four parameters that still don't really make sense to most of us, I guess. What we also know is that after the system call the result is zero. So that's the first information we understand: it's the error code zero, which means success. But all the rest is still pretty much unclear. We have to figure out which system call that was and what parameters went in. Now how do we find that out? Of course, as usual, the internet knows things. There is, for example, the Microsoft developer documentation. Unfortunately, that's pretty incomplete when it comes to those low-level system calls, and that's partly on purpose, because that's something Microsoft usually wants to hide, because they have

their own system call wrapper libraries like ntdll that contain all the wrapper code, so you as a developer don't have to worry about it: you just call a function and out comes the result. But for us, from the hardware, low-level perspective, we have to deal with this low-level system call, so we have to find out what it's all about. Of course we're not the only ones. There are third-party websites that gathered this information and presented it, and I'll give you two examples. One of them is vexillium.org, which contains the system call list for all the Windows versions that this person found, and the respective numbers for the system calls.
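The register convention described above can be sketched in a few lines of Python. This is a minimal illustration, not the speaker's tooling: the `SyscallEvent` type, the `read_syscall` helper, and the dictionary layout of the register snapshots are assumptions, and all register values except the system call number 0x52 and the result 0 are made up.

```python
from dataclasses import dataclass

# Convention from the talk for 64-bit Windows: RAX holds the system call
# number on entry and the result on exit; R10 (in place of RCX), RDX, R8
# and R9 hold the first four parameters. Further parameters live on the
# stack and are ignored in this sketch.
PARAM_REGISTERS = ("r10", "rdx", "r8", "r9")


@dataclass
class SyscallEvent:
    number: int
    params: tuple
    result: int


def read_syscall(regs_before: dict, regs_after: dict) -> SyscallEvent:
    """Combine the register state before and after the system call."""
    return SyscallEvent(
        number=regs_before["rax"],
        params=tuple(regs_before[name] for name in PARAM_REGISTERS),
        result=regs_after["rax"],
    )


# Hypothetical snapshots: only rax=0x52 (before) and rax=0 (after) come
# from the talk, the parameter values are placeholders.
before = {"rax": 0x52, "r10": 0x1000, "rdx": 0x80100080, "r8": 0x2000, "r9": 0x3000}
after = {"rax": 0x0}

event = read_syscall(before, after)
print(f"syscall {event.number:#x}, result {event.result:#x}")
print("params:", [hex(value) for value in event.params])
```

At this stage the output is still only numbers; turning them into names and types needs the signature and type information discussed next.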

And then there's also something called Undocumented NT Internals, which contains system call signatures for most of the system calls we're looking at. So, coming back to the example, we can do a first assessment with those websites. We have the signature here for NtCreateFile, because this 0x52 that I showed you earlier actually corresponds to NtCreateFile on Windows 7. Now we can make sense of the parameters. The first one points to a file handle that will be the result of the system call. Then we have something called DesiredAccess, which is the access that the application wants, and we have ObjectAttributes, which is again a pointer; we'll come to

that in a bit. And then we have another parameter that's just there for completeness, because it's the fourth one. Now, as I mentioned, there are pointers in there, so to interpret what's going into the kernel we have to follow those pointers, for example this ObjectAttributes one, because we want to make sense of it. So we take this pointer to the data structure and read it from the memory dump, and then we have to make sense of it. But what is this structure's layout? We have no idea, again. So we also need type information in addition to the signature of the system call.
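To illustrate this first assessment, the sketch below looks up the system call name for a number and labels the four register parameters with a signature. The dictionary layout, the `annotate` helper, and the version key are assumptions made for this example; the parameter names and types follow the publicly documented NtCreateFile prototype, and the register values are placeholders.

```python
# Hypothetical knowledge-base fragments; the layout is an assumption.
SYSCALL_NUMBERS = {"windows-7-sp1-x64": {0x52: "NtCreateFile"}}

# First four parameters of NtCreateFile. The remaining parameters are
# passed on the stack and are left out of this sketch.
SIGNATURES = {
    "NtCreateFile": [
        {"dir": "out", "type": "PHANDLE", "name": "FileHandle", "pointer": True},
        {"dir": "in", "type": "ACCESS_MASK", "name": "DesiredAccess", "pointer": False},
        {"dir": "in", "type": "POBJECT_ATTRIBUTES", "name": "ObjectAttributes", "pointer": True},
        {"dir": "out", "type": "PIO_STATUS_BLOCK", "name": "IoStatusBlock", "pointer": True},
    ]
}


def annotate(os_version, number, values):
    """Return the system call name followed by one labelled line per parameter."""
    name = SYSCALL_NUMBERS[os_version][number]
    lines = [name]
    for param, value in zip(SIGNATURES[name], values):
        note = "pointer, needs a memory read" if param["pointer"] else "plain value"
        lines.append(
            f"{param['dir']:>3} {param['type']} {param['name']} = {value:#x} ({note})"
        )
    return lines


# Placeholder values for R10, RDX, R8 and R9; not taken from the slides.
report = annotate("windows-7-sp1-x64", 0x52, (0x1000, 0x80100080, 0x2000, 0x3000))
print("\n".join(report))
```

The pointer parameters are only flagged here; resolving them requires the structure layouts, which the signature alone does not provide.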

But the previous page that we found doesn't have this information, so we're looking somewhere else. And where do we look? Anyone who is familiar with the Windows debugger will know that it has this type information. But where does it get that from? There's something called the symbol files, the program database files, and these contain type information. Luckily we can download them from the Microsoft server, and, again luckily, there are parsing libraries for Python that can just interpret that file for us and give us the output we need. To warn you a bit, there's a lot of text, but I've highlighted some of it again. This is what comes out of the

program database file if we look at this signature and the object attributes structure; that's actually the one at the top. The member I'm particularly interested in today is ObjectName, which is at a given offset within the structure and has a certain type. But this type is again something non-trivial, so we have to consult the PDB file again and find that type inside, and that's the lower part. This is actually the Unicode string type in Windows; that's again a structure, which contains the length and a pointer to the buffer of that string. Now with this knowledge we can actually go into the memory dump and see what's in there, and it looks like this: we have the

pointer to the object attributes, which is this one, and at the beginning we have the length; that's not too interesting. But then at offset 16, so the second line, we have an 8-byte pointer, in this case little-endian, so we have to read it backwards, and we end up with something to 2f for 10. So we go into the memory dump again and read that part, which is this one, the Unicode string. Again, at the given offset we have an 8-byte pointer; read it backwards and we have to d1 a 80. We go into the memory dump again and finally read the string of the object name attribute, which is

some start MoDOT but fire in the administrator home directory. Those of you who already met us at the booth will know this string already; the rest of you will maybe find out later. Now, decoding the memory like this is quite a cumbersome process. We have to find the memory location, we have to decode a structure, so we have to find out what this structure is, what it looks like, what members it has, and then we have to follow pointers again. So we do the whole thing over and over again until we find the information we actually want. And once we've done that, even if we write it in a script

it's going to be pretty tailored to the specific system call we're dealing with, and if we have to decode the next system call we end up doing most of it again, or at least modifying it. To write a generic script we would actually need something machine-readable, like a lookup knowledge database, and this is what we've built. To remind you, what we want is to go from ones and zeros, through one interpreter module, to something like this nice picture that just shows us our information. This is where the main core of this talk comes into play, which is our OS internals scraping toolkit. We've seen all this information on the internet, and what we

want is to take it all in and munch it up, and out comes the knowledge base that we can use in the script. So we take the sources that I've talked about, the internet and the symbol files, and sometimes we also need some custom knowledge to make sense of all of it. We throw it into this scraper and parser for the PDB files, and what comes out is a knowledge database containing the system call definitions, the numbers, and also the types. The current sources we have for this are the aforementioned vexillium.org for the system call lists of numbers, the NT internals website for the signature

definitions, and the symbol files from Microsoft for the type definitions. Now I'll show you what this output knowledge base looks like. First of all, we have the system call list, which is just a normal table mapping the system call name to the respective number on the Windows version we're dealing with. As you can see, sometimes the number is different in different Windows versions, so we have to deal with that. Then, once we find the corresponding name for the number, we still have to look at the system call signatures, which are represented in JSON in our case. That's just a map from the system call name to a list of

parameters that it takes. Every parameter is described by its direction, so whether it's going into the kernel, out of the kernel, or both, and also by its type and its name. The type that is in there we can actually look up in the type definitions. That's the third output artifact we have, which is just a map from the type name, for example object attributes, to the length of the structure and then all the members, which are again described by their name, offset, and type. So we can go recursively and find in the object attributes the Unicode string, and then find the Unicode string in the same file and see

what's in there. Now with all this we can do the painful manual interpretation I showed you earlier in just one script, and the script looks like this. By the way, you can download the entire presentation and the example together with dumps to play with it yourself at this URL; it's also on the summary slide. What you can see if you run that script on the register states and the before and after memory dumps is that it identifies the NtCreateFile system call. It finds the parameter going into the kernel that's called DesiredAccess, which is a bit mask, and then it finds this ObjectAttributes pointer. It will follow that one because it's

a pointer argument. It will decode it by looking it up, and says there's a Length member, a RootDirectory, and then it finds this Unicode string, which is again a pointer, so it follows it and prints the name that we've seen earlier. Then it just goes through the rest, which is not too interesting. But it also finds that the system call has a result of 0, so we're now moving to the interpretation afterwards. One of the output parameters, if you remember, is the file handle, which is again a pointer, so it follows it and says the handle is hex 34, and then just for completeness it also outputs the rest. Yeah, so this

already looks quite nice, but whenever there's some good there's also some bad and ugly involved. As I've mentioned before, there are some missing entries, because the NT internals website only has documentation for most of the system calls we're looking for. For example, I was looking the other day for NtOpenKeyEx, and this one was not included on the website. There are also sometimes inaccuracies on those websites. For example, there is a missing pointer indicator in this system call, where a combined input/output parameter is marked as ULONG when it should instead be PULONG, as a pointer. So we have to kind of deal with this in our

knowledge base, because it would be bad if you looked up the wrong type. That's why we have some workaround infrastructure to deal with these corrections. At least until we've reported it to the website maintainer and it is fixed, we have to deal with it locally. That brings us to the next steps, because of course we want to add more sources to this; we want more coverage and more accuracy. One example that I've left out earlier of where we can find stuff is the source code of Windows-related projects, because those projects already have this kind of information in them. They implement part of this interface, so they have to know it, and that's why we

want to look at stuff like ReactOS, Wine, or Process Hacker and parse their source code, or use some part of their knowledge base to include in ours, or maybe merge them at some point. One other thing: if we build something like a scraper, we don't want to do that all the time, because maybe there is no update, maybe the updates are wrong, maybe the updates are incomplete. So we want some kind of versioning and update checker. What we also want is more operating systems, because currently we only support Windows 7 64-bit, since that's the one we needed. Of course it would be nice to include all the other operating systems as well.
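The generic, knowledge-base-driven decoding described earlier (type definitions with member names, offsets, and types, followed recursively through pointers) can be sketched as follows. This is an illustration under stated assumptions, not the published script: the `TYPES` layout, the `Memory` class, the `decode` function, the addresses, and the file path are all hypothetical. The member offsets are those of the 64-bit `OBJECT_ATTRIBUTES` and `UNICODE_STRING` structures, reduced to the members the talk mentions.

```python
import struct

# Hypothetical type definitions shaped like the talk's third artifact:
# type name -> structure size plus members with name, offset and type.
TYPES = {
    "_OBJECT_ATTRIBUTES": {"size": 48, "members": [
        {"name": "Length", "offset": 0, "type": "ULONG"},
        {"name": "RootDirectory", "offset": 8, "type": "HANDLE"},
        {"name": "ObjectName", "offset": 16, "type": "PUNICODE_STRING"},
    ]},
    "_UNICODE_STRING": {"size": 16, "members": [
        {"name": "Length", "offset": 0, "type": "USHORT"},
        {"name": "MaximumLength", "offset": 2, "type": "USHORT"},
        {"name": "Buffer", "offset": 8, "type": "PWSTR"},
    ]},
}
SCALARS = {"USHORT": "<H", "ULONG": "<I", "HANDLE": "<Q"}  # little-endian formats
POINTERS = {"PUNICODE_STRING": "_UNICODE_STRING"}  # pointer type -> pointee type


class Memory:
    """Sparse memory dump: maps a base address to the bytes stored there."""

    def __init__(self, regions):
        self.regions = regions

    def read(self, addr, size):
        for base, data in self.regions.items():
            if base <= addr and addr + size <= base + len(data):
                return data[addr - base:addr - base + size]
        raise KeyError(hex(addr))


def decode(mem, addr, type_name):
    """Decode a structure at addr, following pointers recursively."""
    out = {}
    for member in TYPES[type_name]["members"]:
        field, kind = addr + member["offset"], member["type"]
        if kind in SCALARS:
            fmt = SCALARS[kind]
            out[member["name"]] = struct.unpack(fmt, mem.read(field, struct.calcsize(fmt)))[0]
        elif kind in POINTERS:
            target = struct.unpack("<Q", mem.read(field, 8))[0]
            out[member["name"]] = decode(mem, target, POINTERS[kind]) if target else None
        elif kind == "PWSTR":
            # Relies on Length (in bytes) having been decoded before Buffer.
            target = struct.unpack("<Q", mem.read(field, 8))[0]
            out[member["name"]] = mem.read(target, out["Length"]).decode("utf-16-le")
    return out


# Hypothetical dump contents: addresses and the path are made up.
OA, US, BUF = 0x1000, 0x2000, 0x3000
name = "\\??\\C:\\Users\\Administrator\\example.bat".encode("utf-16-le")
mem = Memory({
    OA: struct.pack("<I4xQQ", 48, 0, US) + bytes(24),
    US: struct.pack("<HH4xQ", len(name), len(name) + 2, BUF),
    BUF: name,
})

decoded = decode(mem, OA, "_OBJECT_ATTRIBUTES")
print(decoded)
```

Because the decoder only consults the tables, supporting another structure or another system call means adding data rather than code, which is the point of generating the knowledge base once.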

And the result of all that is that we realized there is no such thing as the complete knowledge base on the internet yet. Part of the reason is that everyone just builds the little bit they need, and then they're happy. So we decided to open-source this part, and as we've already announced, this knowledge base generation is available on our public GitHub account. What we're hoping for is to engage with the open source community and make this better over time. There's another thing; we're running a little bit late, so it might not be possible, but thanks to all of you who already participated in our tiny CTF challenge. We had a live setup there

where you could play with this system call interpretation live on a running system. Now, to summarize, and then we can all go into a nice evening and have a beer: we've seen that interpreting low-level system calls is very hard and annoying work, and part of that is that the information is scattered around the internet, difficult to find, and cumbersome to use. But what we've also seen is that once you have the nice knowledge database, scalability really can rock, and you can build something nice that can be applied in a generic way to every system call you find. And we also hopefully established that open source rocks, because we can just all profit

from this effort and hopefully never do this again. As I mentioned, we have this on our public gitlab; the presentation and the example can be downloaded here, and if you need anything more or want to get in touch with us, there are our Twitter handles - clay party and we're Cypress tech. With that I'd like to thank you for your attention, and if there are any questions, fire away. [Applause]

So thank you, Marcus, it was really interesting. Does anyone have a question? Okay, so I do have a question. In the kernel, isn't there a structure that would allow you to know at least the number of parameters of the syscall and some kind of type information

involved in the syscall?

Where exactly would you find that?

I think, I'm not a kernel expert, but I think I've read that there is the syscall table inside the kernel structures, but also an argument list table, which is maintained in the kernel so that when it goes into the syscall it knows what kind of arguments to expect.

No, I have to admit I haven't heard of that, so it's worth looking into. We don't have any data on this, I'm sorry.

Okay, yes, and that was my only question, so I think everyone wants to go home now, maybe. Okay, so thank you.

Thank you. [Applause]