
So I got my start working in disk forensics about four or five years ago, and for the last year and a half I've been working on memory forensics. Malware is kind of a newer field to me; before, I was doing more task-focused work, just figuring out what a given artifact is. So this is my presentation. It's going to be on hunting fileless malware with Tree-sitter. This is my first BSides and my first time talking at a BSides, so let's go.
What I've been working on, kind of as a pet project, is a tool that can both detect and deobfuscate PowerShell scripts that have been put through an obfuscation tool, and I've been using the Tree-sitter library to do that. Is anybody familiar with that? Anybody heard of it? A few. So I'm going to give you all some insight into what it is and why it's so useful.
Tree-sitter was originally developed for the Atom text editor. It takes a grammar for a programming language, which you write in JavaScript, converts it into JSON, and then into a really fast C parser. Once that is done, it gives you an API for accessing elements of the syntax tree and for running queries against the tree. For a while this wasn't something I looked very far into. It was just something that is part of Neovim these days, and I thought, I don't know what this is, I don't know if it's worth the time to learn it, I'm just going to put it to the side.
Then I was on YouTube one day and I watched a video by TJ DeVries, who explained how you can use this tool to do syntax highlighting on SQL embedded in your source code, for instance inside a raw string literal in Rust. I thought, this is really cool, I want to have this as part of my development workflow. So I spent the time figuring out how to use it.
There are many language grammars out there for all kinds of programming languages, and they are pretty well supported. If it's a common programming language, it probably has a really good grammar, and even some of the lesser-known ones have support. Language nerds love writing them, so thank God for them.
The power feature of Tree-sitter is its queries. You write what are called S-expressions. If anybody's ever used Lisp or Scheme, you're probably familiar with those. They let you express what you're looking for as a kind of data structure. This is what I wanted to achieve: my SQL is highlighted as SQL inside of Python, which is highlighted as Python, because that's just really fun. You can also grab that part of the document, shell it out to a formatter, and have your SQL formatted and reinjected into your document. As a developer it's just really handy, and satisfying.
This is what it looks like in Neovim. There's actually a built-in tree inspector, so once you get your language grammar properly set up and installed, you can run the :InspectTree command in command mode, and a little window pops up on the left where you can navigate through your source code as a syntax tree. If you press o from within that panel, it opens a window on the top where you can experiment with queries. In this case the query is looking for an assignment statement where the left identifier is equal to the word foo. The statement has been labeled "statement" and the identifier has been labeled "id". That's what the little at symbols are for; those are called captures in Tree-sitter. On the right side of my document, the word "id" on the far right is actually being produced by the query editor. It's saying, hey, I found your id, it's right there, it's foo.
Here's another one that is a little more complex. It has a couple of requirements: the string on the left of a binary operation is equal to "hello", and the string on the right matches a pattern. That second check actually performs a regular expression match on the content of the node, so you can perform these really nice targeted searches. Those things on the right are called predicates.
So how does this tie into PowerShell obfuscation? Once I had learned how to use this tool and integrated it into my workflows in some pretty cool ways, I was talking to my boss one day about enumerating the commands that are run from bash scripts, and he said that's probably kind of a messy task. I said, well, maybe not. If you have access to the syntax tree for a bash script, then you can just query for commands, and it would be pretty trivial to list all the commands that are run inside a script. And I thought, okay, if there's one for bash, I wonder if there's one for PowerShell.
Microsoft had produced a grammar for PowerShell that they maintained for a while, something like five years back, but it had been neglected and was out of sync with the current Tree-sitter library, so you couldn't actually use the grammar. I started looking through the forks and noticed that the Airbus CERT team had adopted it and was actively maintaining it. I thought, aha, if an incident response team is using this, then we can probably do some pretty interesting things with it. So I'm grateful that they have taken up the mantle of maintaining it.
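To make the capture and predicate idea from the query examples concrete, here is a toy model in Python. It is not the Tree-sitter API: the `Node` class, `walk`, and `find_assignments` are hypothetical helpers, and the query shown in the docstring is only an assumption about the rough shape of the query on the slide.

```python
# Toy model of a syntax-tree query with captures and an equality predicate.
# NOT the Tree-sitter API; Node, walk and find_assignments are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def walk(node):
    """Yield every node in the tree, depth first."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_assignments(root, name):
    """Roughly what a query of this shape expresses (assumed, not from the talk):
        (assignment left: (identifier) @id (#eq? @id "foo")) @statement
    Returns one dict of captures per match."""
    matches = []
    for node in walk(root):
        if node.kind != "assignment" or not node.children:
            continue
        left = node.children[0]
        # The predicate: the captured identifier must equal the given name.
        if left.kind == "identifier" and left.text == name:
            matches.append({"statement": node, "id": left})
    return matches


# A hand-built tree standing in for a parsed two-line script.
tree = Node("module", children=[
    Node("assignment", "foo = 1", [Node("identifier", "foo"), Node("integer", "1")]),
    Node("assignment", "bar = 2", [Node("identifier", "bar"), Node("integer", "2")]),
])
```

Calling `find_assignments(tree, "foo")` returns a single match whose `id` capture points at the `foo` identifier, which mirrors what the query editor highlights.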
Since my pet project is still in the early stages, I've focused on using the Invoke-Obfuscation library as my test data generator. It's a well-known tool for obfuscating PowerShell scripts. It uses several different types of obfuscation. I won't cover every single one of them today, but some are extremely common.
The first one is token obfuscation. There are three options on the menu when you're running Invoke-Obfuscation for this. The first injects backticks into the command. The next uses the command invocation operator, the ampersand, on the left, concatenates the pieces inside of a pipeline, and evaluates it that way. The third uses format strings in a similar fashion with the invoke expression operator. This is a problem if you want to use traditional tools to search for bad things in a PowerShell script, like a call to VirtualAlloc or CreateProcess, or Invoke-WebRequest, or any of the many, many other things that you would generally want to know about.
It's hard for a traditional scanner to make sense of this, but Tree-sitter can pinpoint the exact location in a script where there's a command invocation expression. Here's an example. There's a library on GitHub called tree-grepper that I hacked PowerShell support into, and it lets you grep files with Tree-sitter queries instead of regular expressions or patterns. In this case it's running my split-and-concat query, which I put into a file because it's too big to show right there, but you can see that it spits out the exact part of the document that matches your Tree-sitter query, which is pretty cool.
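As a small illustration of why token obfuscation defeats plain string matching, the sketch below runs a naive keyword scan over one command written four ways. The watch list and the sample strings are made up for this example; they are not taken from the talk or from Invoke-Obfuscation output.

```python
# Hypothetical watch list; a real scanner would carry many more entries.
SUSPICIOUS = ["invoke-expression", "invoke-webrequest", "virtualalloc"]


def naive_scan(script: str) -> list:
    """Return the watch-list terms found by case-insensitive substring search."""
    lowered = script.lower()
    return [term for term in SUSPICIOUS if term in lowered]


# The same command, plain and with the three token obfuscation styles.
plain = "Invoke-Expression $payload"
ticked = "Invo`ke-Ex`pression $payload"                           # backticks injected
concat = "& ('Invoke-Ex' + 'pression') $payload"                  # pieces joined, then invoked
formatted = "& (\"{1}{0}\" -f 'pression','Invoke-Ex') $payload"   # format operator
```

Only `plain` is flagged. The other three never contain the literal text `invoke-expression`, which is the gap a syntax-tree query is meant to close.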
There are also AST-based obfuscations. A lot of the ones I looked at don't actually affect your ability to detect or deobfuscate much. The one notable exception is the Set-Variable trick that you can see here, where instead of a traditional variable assignment the assignment is done dynamically. As you'll see later, you have to be kind of careful with that. But generally, reordering things doesn't matter in Tree-sitter, because the S-expressions are not necessarily sensitive to the order of operations. If you say, find me a command with an argument, and the argument has another argument to it, it doesn't really care that those are in the correct order. It's going to look at all the child nodes and see whether that matches.
This is an example of string obfuscation. It puts the entire command into a string and again uses the invoke expression operator to run that command. This one has a whole bunch of stuff done to it: it's using environment variables, format strings, concatenation, and casting from integers down to characters, so the original script is basically unrecognizable in this instance.
Going back to those environment variables, you can see that they've mixed the case and they're indexing into the variable in order to produce iex, which is an alias for Invoke-Expression. That lets you run PowerShell from a string: basically, take the string and convert it into runnable PowerShell. Tree-sitter again easily detects this because of the invoke expression node type.
Here's another example, those type cast tricks where you make characters from integer literals. It's just another step to slow you down and make you analyze this manually. There on the bottom is an example of how Tree-sitter queries can locate the exact places in the document where a cast expression is taking place. So it's really doing a lot of the hard work for us: figuring out the components of a PowerShell script and producing a useful interface for dealing with them.
There's also compression in the Invoke-Obfuscation library, where it puts everything into a compressed Base64 blob and then runs it through a pipeline that decompresses it from a Base64 stream and then, once again, uses Invoke-Expression on the left, built from the ComSpec environment variable.
So detection is really only a small part of the problem, because analysts have to very carefully deobfuscate these detected scripts, find out what they did and what they might do, and go from there. This can really slow you down if you don't have a good way to do it, and time is precious when you're doing incident response. These obfuscations are often layered on top of one another, so you might have a format string that gets split up into a concatenation operation, and it can get very messy even though it all evaluates down to the same exact PowerShell script. And there's really not great publicly available tooling for this that doesn't require you to run a sandbox, since you don't want to just run random PowerShell when you don't know what it's going to do.
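The compressed Base64 layer described above can be unpacked statically, without running the script. The following is a minimal sketch, assuming the blob is either gzip or raw deflate (the format a .NET DeflateStream emits); `unpack_blob` is a hypothetical helper, not part of the speaker's tool.

```python
import base64
import gzip
import zlib


def unpack_blob(b64_text: str) -> str:
    """Decode a Base64 blob and decompress it without executing anything.

    Tries gzip first, then falls back to raw deflate (no zlib header).
    """
    raw = base64.b64decode(b64_text)
    try:
        data = gzip.decompress(raw)
    except OSError:
        # Not a gzip stream (gzip.BadGzipFile is a subclass of OSError).
        data = zlib.decompress(raw, -15)
    return data.decode("utf-8", errors="replace")
```

Nothing here evaluates the payload; it only turns the blob back into text so the next layer can be inspected.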
Going back to Tree-sitter: it was developed for code editors like Neovim and Atom, and there's an API for actually editing the tree. You can take a node and say, hey Tree-sitter API, I would like to update this tree at this node with this text, and it will reparse the tree, but only the ranges that you changed. So it's extremely fast. That's how it's able to update the syntax tree of a document on every single keystroke: it doesn't need to reparse the entire document, only a single section of it.
What I did for my tool was identify small atomic operations, part of Invoke-Obfuscation, that I thought were reasonable to undo. Casting an integer to a char? Okay, I'm going to edit the syntax tree and convert that back down to a literal string. When I see two string operands on a concatenation operation, concatenate those back together. You just run those in a loop until you stop getting matches on those kinds of reducible operations.
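The "run the reductions in a loop until nothing matches" idea can be sketched as a fixed-point loop. In this sketch regular expressions stand in for Tree-sitter queries and tree edits, which is a simplification: they only handle simple single-quoted literals with no embedded quotes. The function names are hypothetical.

```python
import re

# Regexes stand in for tree queries; they only cover simple single-quoted literals.
CHAR_CAST = re.compile(r"\[char\]\s*(\d+)", re.IGNORECASE)
CONCAT = re.compile(r"'([^']*)'\s*\+\s*'([^']*)'")


def reduce_once(script: str) -> str:
    """Apply each atomic rewrite once: [char]N -> 'c', then 'a' + 'b' -> 'ab'."""
    script = CHAR_CAST.sub(lambda m: "'" + chr(int(m.group(1))) + "'", script)
    script = CONCAT.sub(lambda m: "'" + m.group(1) + m.group(2) + "'", script)
    return script


def deobfuscate(script: str, max_passes: int = 100) -> str:
    """Repeat the rewrites until a pass changes nothing (a fixed point)."""
    for _ in range(max_passes):
        reduced = reduce_once(script)
        if reduced == script:
            break
        script = reduced
    return script
```

For example, `deobfuscate("[char]105 + 'e' + [ChAr]120")` first turns the casts into literals, then needs two passes of concatenation to arrive at `'iex'`. Each pass exposes new matches for the next one, which is why the loop is needed.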
This is a basic example of that. The concatenated pieces can be reduced back to a single string, because we have no reason to keep strings separate and added together like that. You can do the same for type casts and format strings. Backticks can be stripped out, and so can comments; I've seen bad PowerShell scripts that have giant areas of comments just to be annoying and to mess with signature analysis. Once you've done a lot of those things, you can start fixing the case on things if you keep a hash map, or a hash set, of common PowerShell commands.
So this is kind of Microsoft's baseline for detecting bad PowerShell, and I used string obfuscation on it to produce this nasty-looking thing. Here's my tool in action. It goes through it and you end up with pretty much the exact same thing you started with. It actually runs much faster than that; I'm putting the thread to sleep between each individual iteration. At the end it shows you exactly which atomic operations were completed: in this case nine format strings, eleven string literal pipelines, two cast expressions, and, I missed the last one, one string member usage.
This was an even bigger sample, and it was the one I was working against as I was developing this. I thought, if I can get this back to a readable state, I'll be pretty happy with where I'm at.
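The cleanup steps and the per-operation summary mentioned above could look roughly like the sketch below. It is regex-based rather than tree-based, so it is only an approximation: a real tool would strip backticks on command tokens only, and the `KNOWN_COMMANDS` table, the `cleanup` function, and the counter names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical lookup of canonical spellings; extend as needed.
KNOWN_COMMANDS = {name.lower(): name for name in
                  ("Invoke-Expression", "New-Object", "Write-Host")}

# ("{1}{0}" -f 'b','a') with single-quoted arguments only.
FORMAT = re.compile(
    r"""\(\s*"((?:\{\d+\})+)"\s*-f\s*('[^']*'(?:\s*,\s*'[^']*')*)\s*\)""",
    re.IGNORECASE)
# A backtick before a character that does not form a PowerShell escape sequence.
TICK = re.compile(r"`(?=[^0abefnrtuv\s`])")
WORD = re.compile(r"[A-Za-z]+-[A-Za-z]+")


def cleanup(script: str):
    """Return (cleaned_script, Counter of the operations that were applied)."""
    stats = Counter()

    def expand_format(match):
        args = re.findall(r"'([^']*)'", match.group(2))
        order = [int(i) for i in re.findall(r"\{(\d+)\}", match.group(1))]
        if max(order) >= len(args):
            return match.group(0)  # malformed; leave untouched
        stats["format_string"] += 1
        return "'" + "".join(args[i] for i in order) + "'"

    def fix_case(match):
        canonical = KNOWN_COMMANDS.get(match.group(0).lower())
        if canonical and canonical != match.group(0):
            stats["case_fix"] += 1
            return canonical
        return match.group(0)

    script = FORMAT.sub(expand_format, script)
    script, ticks = TICK.subn("", script)
    stats["backtick"] += ticks
    script = WORD.sub(fix_case, script)
    return script, stats
```

The returned counter plays the role of the end-of-run summary: it records how many format strings, backticks, and case fixes were handled.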
And lo and behold, after a lot of fighting with the Rust compiler, writing Tree-sitter queries, debugging them, and all that stuff, I have something that can make this usable. You can see it works from front to back, going through each of those atomic operations: carrying out the edit, reparsing, re-editing.
Another trick I did with this particular script was to handle the random variable names. Almost every sample I've looked at has randomized variables, which are extremely hard to keep in your head, like bz2xqp and another variable named something very similar. It's hard to keep track of mentally when you're analyzing something, even if you have partially deobfuscated it. So I added a layer that goes through the variables and renames them with this Docker-container kind of naming, where, okay, I can see that in a document and I don't have to think about bz2 this and bz2 that.
So this is the end result of the original test script. It still has a couple of things to figure out. One of the big challenges is null variables in PowerShell. You can actually use uninitialized variables in any operation and they will evaluate to null, so when you strip these things out you have to be very careful that you're not deleting something that is actually initialized in a part of the script you haven't deobfuscated yet. My thinking at this point is to wait until you've done all the other operations, then get rid of any variable that looks like junk, and then go again and see if you find anything else.
You can do even more complex things with queries than finding these simple atomic elements. This sample has a kind of script block launcher that uses a StreamReader over a GZipStream over Base64 text, and to make things worse, the Base64 string on the inside has format operations, concatenations, and so on applied to it. So it's not as easy as copying and pasting this and piping it into base64 and then gunzip or something. This is the query I wrote for locating these kinds of things. It looks giant and nasty, but it actually wasn't too hard to develop, because you can cut out parts of the syntax tree and use those as the base of your queries. I found a sample that looked like this, snipped out the part that was the launcher, and then added conditions requiring, for instance, that the first command is equal to New-Object, that the object type is StreamReader, then New-Object GZipStream, New-Object MemoryStream, and so on. At the very bottom (sorry for the tiny text, don't worry about it) is a requirement that the argument is actually a single string literal and not a whole bunch of other stuff glued together. This prevents the tool from attempting to unpack something like this until all the other operations have been carried out. So it goes through the internal part of the Base64, and once it matches, it unpacks the payload, sends it to gzip, and you get to see what the actual payload was.
Another cool use case that I've been thinking about, and this is one I hacked together at the last minute, is extracting binary payloads, especially from PowerShell scripts that might be used to inject shellcode into another process. You can write a query that looks for byte arrays that have a fixed type of value in each element, automatically pull them out, and convert them to a vector. Then I put it through this disassembly library, just as a proof of concept here.
One of the great things about Tree-sitter is error recovery, especially in memory forensics, because we deal with so much data that is in an imperfect state. We deal a lot with smear, and things that come from memory might have been truncated at a page boundary or skewed in unexpected ways. Tree-sitter handles errors extremely well. It replaces the part of the tree that is missing, or that it can't figure out, with an error node, and then it continues to try to parse until it finds something it can make sense of. Here's an example of that. On the tail end we have this error node, you can see there. Trying to run this with a traditional PowerShell interpreter, it complains about a missing delimiter.
But when we send the exact same script through Tree-sitter, or through this tool really, it can still handle it. There's an error node in every single iteration of the syntax tree, and it doesn't care, because it still has access to the rest of the tree, and in the end you still end up with something you can understand.
Another advantage of this is speed and portability. It's written in Rust and C, and it's really fast. It could be a lot faster, because I made a design decision at the beginning that I didn't realize was going to have a performance impact: it essentially costs O(n) for every single kind of query I want to execute, instead of bundling them all together and running them all at once. As it is, it handles 745 clean PowerShell scripts in 42 user seconds, and because it's threaded, that is probably closer to 4 seconds in real time, if I had to guess.
One of the big challenges with the PowerShell grammar in particular is the complexity it produces in some cases. In Python, arrays are very nice: it's an array, and it has a bunch of items that are all children of the array node. But in PowerShell, every array is actually either two unary expressions, like an int and an int, or an array which has a nested array and a unary expression, and so forth. You get this nasty recursive structure that makes it hard to query against. If you want to determine whether every single item in an array is an int, you can't do that with a simple query.
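To make the recursive array shape concrete, here is a toy sketch of walking such a structure to check that every element is an integer. The tuple-based node shapes are hypothetical and only loosely modelled on the nesting described; they are not the PowerShell grammar's actual node types.

```python
# Hypothetical node shapes: ("array", left, right) for nesting,
# ("int", value) or ("string", value) for leaves.

def flatten_array(node):
    """Collect the leaf items of a nested ("array", left, right) structure."""
    if node[0] == "array":
        items = []
        for child in node[1:]:
            items.extend(flatten_array(child))
        return items
    return [node]


def all_ints(node) -> bool:
    """True if every leaf under this array node is an int."""
    return all(kind == "int" for kind, _ in flatten_array(node))


# 1,2,3,4 nested the way a left-recursive grammar rule would build it.
nested = ("array",
          ("array",
           ("array", ("int", 1), ("int", 2)),
           ("int", 3)),
          ("int", 4))
mixed = ("array", ("int", 1), ("string", "a"))
```

Here `all_ints(nested)` is true and `all_ints(mixed)` is false. A single flat pattern cannot express "all children at any depth", so the check has to descend through the nested array nodes.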
What you can do is query for the contents. You can grab a node, like the top-level array, and then run a second query within that single node, so you can iterate on this. Here's an example of the actual grammar itself, and you can see it's a unary expression and an array literal expression, or an array literal and a unary expression. That's why it produces that recursive structure.
And that's it. If anybody has questions or comments, feel free to reach out to me on any of the socials, and if you thought this was cool and have thoughts on it, I'd love to hear them.
Have you made any of that open source on your GitHub? Not yet. There's like zero error handling.
What's that? Since you're going generic and not matching specific obfuscation tools, have you found that successful? I still have to do more research, but I do plan to audit all the major obfuscation tools, and I would be willing to bet that most of them do these string tricks and stuff like that.