Threat hunting dotNET malware with YARA

BSides NYC · 2018 · 37:41 · Published 2023-04
About this talk
A Kaspersky researcher demonstrates hunting .NET malware using YARA rules, focusing on the .NET module for detecting obfuscated and packed samples. The talk covers metadata analysis, assembly inspection, and resource sections as high-signal detection vectors, with practical examples including the Ploutus ATM malware family.
Transcript

[I was] happy when I got accepted for this event. I wouldn't say it was a dream, but it was among my top goals to be here at BSides New York. I'm part of Kaspersky Lab's Global Research and Analysis Team. We wanted to talk a little bit about YARA: how we are using it to find new samples and, during this year, also to work some incident response cases. Before joining Kaspersky Lab I used to work as a .NET developer. That's how everything started; that's where my life went sideways. I started with .NET 1.0. I was building custom systems, working with some friends, trying to make money to pay for my university.

And I thought: well, okay, either I go the Java way or .NET. I remember that day very clearly, because I tried to use Java for the first time. I downloaded NetBeans and tried to use, I remember, a GridBagLayout, just to put a button on the screen. It took me like 30 minutes, and I said: no, I'm going with .NET, this is my destiny. So why speak differently about .NET, and why think about it in a different way when it comes to malware? Well, Microsoft started to think about this framework as a default component of Windows systems. Since the year 2002 it has been included by default in all Windows operating systems.

So as a developer this is extremely useful for me, but also for malware writers. When you know you can run your code on every Windows system, and you can access Exchange credentials, Active Directory, and SQL Server straight from .NET, it becomes a superb lateral movement tool. There was another presentation about lateral movement, and that's why we are seeing an uptick in .NET samples: the bad guys know about this, of course, and they are trying to use it. So why speak about .NET? In 2015 I did some research with a colleague of mine, who also works for the research team, and we saw that every year we were

getting more and more samples. If you look at 2017, for example, that number is for detections, the total detections that we are seeing in the lab, and it's around 300 million. So you can see it's a pretty big number, and that's why we're talking about YARA today. When you have 300 million detections, and you have as many if not more samples, you need to find that needle in the haystack. It's very difficult to find those threats that are worth researching, those threats that make it worthwhile for customers to pay for your threat intelligence feed. You need to find something special, and that's where YARA comes in. So when it comes to .NET we see some

four categories, the top four categories. For example, we have malware at 13 [percent], then we have adware, then we have riskware, and, I don't know if you're able to read that one, it's pornware with zero percent, but that zero percent is four million detections. So some people are being naughty out there. Okay. Everything went sideways when we started to also see ransomware created in .NET: the number of samples just skyrocketed. In 2013 we saw CryptoLocker, and since then there have been many more ransomware samples, not only in .NET but in everything. In this case we are seeing Hidden Tear, which was an educational malware, and we have seen some ransomware in the wild made with Hidden Tear

and .NET. This is one of them; if I remember correctly it was Crypto 888, but it was a spin-off of a Brazilian ransomware that was actually making good money in Brazil. So for me this is unbelievable. I mean, people are using open source code to create malware. We're getting new detections, we're getting new samples, we are overwhelmed by the amount of information that we have. So we need a way to find those samples that we want to study, that we want to investigate further, and this is where YARA comes in. So this is a darkseal sample, and this is part of another piece of research I did with Bart Parys. It was about Steam

stealers. Steam stealers try to hijack your Steam platform credentials or your account; they try to steal your inventory items, your video games. It sounds silly, but it was a multi-billion dollar business. So YARA is as simple or as complex as you want it to be. To write a YARA rule, and I don't know how many of you know about YARA or have used YARA... awesome. How many of you have used the .NET module? Two, three. Yeah, honest enough. So to write a YARA rule you basically need just two sections in your syntax: you have of course the rule name, and then you have a condition, which could be as simple as saying "condition: false"

and that's it. But it gets more interesting when you start using the modules, because YARA has many more features than just string pattern matching, regular expressions, or checking for a certain byte at a given position. You can go much further than that, and this is why I started to think about using the .NET module to write rules specifically for .NET malware. This module was added in version 3.6.0, and in the beginning I was extremely happy and eager to try it. So I downloaded the latest YARA, I installed everything, I wrote a couple of rules, I ran them, and it didn't work. It didn't work the way I expected.
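A minimal sketch of the rule anatomy described so far, using only core YARA syntax. The first rule is the smallest possible one (a name plus a condition); the second shows the three non-module primitives mentioned: a plain string, a regular expression, and a byte check at a fixed position. The rule names, the string, and the regular expression are placeholders invented for illustration.

```yara
// Smallest valid rule: a name plus a condition. It never matches.
rule minimal_never_matches
{
    condition:
        false
}

// Hypothetical example of the non-module primitives.
rule primitives_example
{
    strings:
        $text = "example marker string"          // plain string (placeholder)
        $re   = /user[0-9]{2,4}@example\.com/    // regular expression (placeholder)
    condition:
        uint16(0) == 0x5A4D and                  // bytes "MZ" at offset 0
        ($text or $re)
}
```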

And that was the beginning of this presentation, because I wrote an email to Victor Alvarez, who is the developer of YARA, and I told him: hey Victor, I found this bug, this bug, this bug. He told me: okay, talk to Wesley Shields. Wesley was the maintainer and developer of this .NET module, and he told me: okay, this is a bug, this is a bug, this you got wrong, let me teach you, this is a bug. Okay, and let me explain a little bit more about how you can use YARA more efficiently. So he gave me a lot of tips and tricks that I'm going to share with you and, more importantly, he was willing to

share that information. That's why we are here at a BSides event; that's why it's important to keep sharing this knowledge. So why think about .NET malware differently? When you are dealing with traditional samples, you go right away to static analysis: you try to see strings, you try to see file attributes, compilation date, things like that. But with .NET, not many people realize that you have a lot of metadata embedded in the binary. For example, this is from a ransomware that we analyzed in 2015; it was called CoinVault, and we found a lot of information. I didn't write a YARA rule back then because the .NET module didn't

exist, but I remember we worked closely with the information we found in this .NET binary. We worked with the Dutch police and they apprehended two guys in the Netherlands, and it was the beginning of another project called No More Ransom; you can access that website at nomoreransom.org. We began to understand how big the problem was with .NET, because you have all this information. For example, when it comes to ransomware, if you are using the .NET libraries correctly, it's impossible to get the decryption key. This is not roll-your-own encryption: you're using Microsoft's version of the encryption, and I guarantee it's much better than anything I can write. So we started working with the police, we

started working in a different way. Then you have other samples; for example, that one is another from the Steam stealer campaign. You can see some anomalies in the metadata, like the file name: they're trying to pose as Malwarebytes, but of course incorrectly spelled. And then when you analyze the hexadecimal representation of a .NET binary, you see of course it's a standard PE, portable executable, file. You see the MZ header. Does anyone know why it's called MZ? Mark Zbikowski, correct. But within this file you also have another header; it's the header for the .NET binary. So does anyone know why it's called BSJB? They tried to replicate Mark Zbikowski: those are the initials of all the people that

worked on this header. They started working on this in 1998, and they thought: okay, if we're going to put in this much work, then Mark Zbikowski is not going to be the only one to get credit for this.
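The two signatures just mentioned can be checked with core YARA alone. This is only a sketch: the rule name is made up, and the four bytes "BSJB" can also occur by chance in files that are not .NET binaries, so on its own this is a weak indicator.

```yara
rule pe_with_dotnet_metadata_signature
{
    strings:
        $bsjb = "BSJB"              // .NET metadata header signature (42 53 4A 42)
    condition:
        uint16(0) == 0x5A4D and     // "MZ" at the start of the file
        $bsjb
}
```

Running a rule like this with YARA's -s option prints the file offset of each string match, which is one way to see where the metadata header starts.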

Okay, perfect. So within a portable executable file you have three traditional sections: you have the text, you have the relocation, and you have the unmanaged resources. Within the text section you have the metadata that belongs to the .NET binary and also the Intermediate Language code. As you know, .NET is not compiled to native code; it's compiled to Intermediate Language that the runtime executes, so you have all the Intermediate Language bytecode embedded in there. And I'm mentioning this because writing YARA rules for .NET without knowing what you are doing is, for me, a tricky business: every once in a while you find a sample that doesn't fit how you learned to use the tool, and you are done.

Okay, so what you can see here, I hope you can see it, is that I'm using dnSpy. As a previous presenter mentioned, it's a really powerful tool. It's still open source, it's still free; I don't know how, but he is not making money out of this. And if you are writing YARA rules, for me it's one of the most important tools you can use, because it will give you pretty much all the information that you need to write your rules, straight from here.

Okay, so as far as the metadata goes, you traditionally have five streams in a .NET binary. I'm not going to go into much detail on all of them; I'll just mention that the most important one is this one with the weird little sign, the tilde (#~). It contains the metadata tables. Within the metadata tables you have around 41 properties defined today, where you can find information about the assembly, about the module, about everything that is used by this binary at runtime. But I didn't want to go into much detail on the framework, so that we'd have time to have some fun with YARA. This type of content is much better suited for a

workshop or maybe a training, because the idea is that you get hands-on practice with YARA. If you see it on a screen, it's like that phrase: I see, I forget. But when you're at home, listening to music, on Facebook, you can download YARA and practice all of this. You only need YARA, of course; I'm going to use dnSpy to get the information, and then to visualize the rules or to write them I'm going to use Visual Studio Code. For me, if you are using Windows, it's a very powerful editor; it recognizes all of the current YARA syntax and it works really well. So one of the first mistakes I made that

Wesley corrected was how I identified globally unique identifiers. For example, if you look at the metadata of a .NET assembly, you see that the identifier, the GUID, is actually not that GUID; it's called a typelib, but somehow the developers of dnSpy are evil enough that they mix the concepts. So what you need to know is the difference between an MVID and a typelib. When you are speaking about a typelib, you are referring to that specific project. For example, if you're working on a malware, the typelib would encompass everything in a single unit; but if you are seeing new builds of that binary, what you are trying to find is the new unique

identifier. So that's a big decision to make when you're trying to find either malware of the same family or new malware. Then you can inspect the assembly references. We're not talking about the references of the portable executable file, but the references that the assembly is making: for example, whether it's using Windows Forms, Windows Presentation Foundation, or any other assembly that is part of the .NET family. Then you have the signature, the version, the build version, everything you can find in a .NET binary, and here are the references to the assemblies I mentioned. My idea was to do this in a way that you could see where to get the information and how to

write a rule with that information. In this case I'm using a sample called Ploutus. Ploutus is an ATM malware that was striking Mexico really hard, and we've been tracking the Ploutus family since 2013. They are still active and they have gone through different iterations, and lately we found one of the latest modules of their malware by using YARA. So it's something that we are doing every day in the lab, and we find it extremely useful for finding new malware that we can write about and understand before anyone else. So for the syntax, besides the rule name, you can of course use another section called meta. In there you can use any fields that you define. I

typically add the date I wrote the rule, what it's about, and what I'm trying to detect; maybe it doesn't detect that, but that's my intention. And then you have the conditions. Like I mentioned, you can be as simple as using "condition: false", or you can be pretty complex using these modules. In this case I'm using only the .NET module. You have other modules that are extremely important, but since we are trying to track just this family of malware, and I know it's made with .NET, we're using just this module. So you can use the identifier (the MVID), the typelib, you can use the assembly references, and, for example, each assembly has a public key or token. So if

you are seeing assemblies that are pretty common, for example mscorlib or any other .NET assembly that is quite common, I wouldn't include them in the rule, but I would include any assembly that is custom made, or made by the malware writer just to be used in that specific threat.
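The pieces mentioned above (a meta section, the typelib, the MVID, and assembly references) could be combined along these lines. This is a sketch, not the rule shown in the talk: every GUID and the assembly name are placeholders, and the three checks are joined with "or" only to show each field side by side. The fields themselves (typelib, guids, assembly_refs with name and public_key_or_token) are part of YARA's dotnet module, which reports GUIDs as lowercase strings.

```yara
import "dotnet"

rule dotnet_identity_sketch
{
    meta:
        description = "Illustrative only: identify a .NET project by its metadata"
        author      = "placeholder"
        date        = "placeholder"
    condition:
        // Typelib GUID: normally stays with the project across builds,
        // so it helps follow a family.
        dotnet.typelib == "00000000-0000-0000-0000-000000000000" or
        // MVID: normally regenerated on each build, so it pins one build.
        dotnet.guids[0] == "11111111-1111-1111-1111-111111111111" or
        // Reference to a custom (non-framework) assembly; name is hypothetical.
        (
            for any i in (0..dotnet.number_of_assembly_refs - 1) :
                ( dotnet.assembly_refs[i].name == "Hypothetical.Custom.Library" )
        )
}
```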

So you can also get module information. In this case we are not talking about the assemblies, but about what the file is using at runtime. For example, it's using GDI for presentation, it's using kernel32 of course, then shell32, you know, all these modules that are part of the API calls being made by this particular malware. And then, for me, one of the most interesting parts of writing a rule is the resources section. When you are dealing with portable executables you already know that there is a resources section, but the .NET resources are something completely different, something apart. In this case this malware was protected

with a commercial protector called .NET Reactor, and using the typical static analysis tools you wouldn't find any useful strings. If you try to use IDA Pro or OllyDbg, you would need to debug a lot to find something useful, and of course you are finding that information dynamically, at runtime. So if you want to write a YARA rule, yes, you can use that information, but you need other tools to analyze at runtime. For example, you can use YARA with Volatility: you can take a memory dump, check all the processes, and see if any of those strings match the live processes. But my idea, and what we are trying to do, is

to find new malware within our malware collection. Imagine you have 300 million samples: you need something that works fast, and you need to analyze it statically. So for the resources you have the size, the offset, and the name. Everything is relative to the metadata header I spoke about earlier, and YARA already calculates everything for you. For example, you have a relative offset and an absolute offset; when you are using YARA you need to use the absolute offset, so you need to find where the .NET metadata header begins. In this case you can use, for example, FAR Manager: just press F7, search for BSJB, and you will find

the beginning of the metadata.
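A sketch of how module references and managed resources might appear in a condition, assuming the dotnet module's modulerefs and resources fields. The resource name, length, and offset are placeholders; the module name is compared exactly as it is stored in the binary, so the expected spelling (with or without ".dll") is an assumption to verify against the sample.

```yara
import "dotnet"

rule dotnet_modulerefs_and_resources_sketch
{
    condition:
        (
            // Unmanaged module referenced by the assembly.
            for any i in (0..dotnet.number_of_modulerefs - 1) :
                ( dotnet.modulerefs[i] == "kernel32.dll" )
        )
        and
        (
            // Managed resource matched by name, length, and absolute file offset.
            for any j in (0..dotnet.number_of_resources - 1) :
                (
                    dotnet.resources[j].name   == "Hypothetical.Resources.resources" and
                    dotnet.resources[j].length == 0x1234 and   // placeholder
                    dotnet.resources[j].offset == 0x5678       // placeholder, absolute
                )
        )
}
```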

Well, then comes writing the rule. Don't worry about writing anything down; my idea was to share this presentation with you afterwards, because I know that one of the trickiest parts of writing code in any language is finding the correct syntax, and that's what I struggled with the most. You already know what you want to write, you already know the rule, you know what you are looking for, but YARA says: no, that's not the correct way, but I'm not going to tell you what the correct way is.

So in dnSpy we can see that information too. We can see the beginning of the metadata header; in this case dnSpy is using a relative address, so that's not the real address you want to use. And, for example, I told you about the streams, that we have five streams, the most important one being the metadata tables. So with YARA, if we are trying to match a specific stream, again you need to find the absolute position of that stream. If you're trying to do that, in dnSpy you can browse all the streams and you will see the offset. So it's very simple: you see the offset, then you go to FAR Manager again, the

best tool, and free, for all malware analysts, and then you just need the most useful utility in the malware analysis world: Windows Calculator. You just need to add the position of the metadata header to the offset. You can use hexadecimal, but you can also use a decimal representation, and write your rule with that. This was one of my first mistakes when I was trying to write YARA rules for .NET, and Wesley told me: no, no, YARA is doing everything for you, you just need to give it the right information, you need to use the absolute offset. Then, I mentioned we have five streams; in this case this Ploutus sample had

seven streams, and if you check dnSpy, I don't know if you can see the colors there, but you see two highlighted in red, I'm guessing it's red. One of those two streams is called something like GUID, but not GUID, and the other is called Blob, but not Blob. So you see something is off with this sample, and when you are trying to find malware, when you are trying to hunt for new threats, what you're looking for is anomalies. Of course this caught my attention enough that it was worth including in this rule.
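The offset arithmetic and the stream-count anomaly described above can be written directly in a condition. The numbers here are hypothetical: 0x2A50 stands in for the file position where BSJB was found and 0x6C for the relative stream offset shown by dnSpy; their sum, 0x2ABC, is the absolute offset that the dotnet module reports.

```yara
import "dotnet"

rule dotnet_stream_layout_sketch
{
    condition:
        dotnet.number_of_streams == 7 and   // two more than the usual five
        for any i in (0..dotnet.number_of_streams - 1) :
            (
                dotnet.streams[i].name == "#~" and
                // absolute offset = metadata header position + relative offset
                dotnet.streams[i].offset == 0x2A50 + 0x6C
            )
}
```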

Okay, so when you are writing the rule for the streams, you can use the number of streams; again, as with resources, you can use the offset, the name of the stream, and the size. In this case the size was meaningless because it was just one byte, which makes no sense, but it is something .NET Reactor adds to any sample it protects. So it's not a way to find new samples of Ploutus, of this specific ATM malware, but it was a good way to find new .NET Reactor protected samples. So if we wanted to write our final YARA rule, and, you know,

you saw a lot of code that you're not going to remember, but I'm going to share the presentation with you, that's the idea. You wouldn't use every piece of syntax available in YARA. You need to take into consideration that when you are writing a YARA rule, most likely it's going to be run against a database of malware, at least in our case, so you need to think about performance; you need to think about what you are running against that many samples. When it comes to performance, let's say the fastest way is using bytes at an offset; a little slower is string matching; and the slowest is regular expressions. If you

are using modules, this is sometimes even slower, because you can also use, for example, iteration expressions; you can use a lot of properties that YARA needs to convert and then find the other value to match, so it takes a lot of computing cycles. So how would the final rule for Ploutus look? I would use the typelib. In this case I only use the typelib because I'm trying to find just new Ploutus malware, which means I'm not trying to find a new build of malware I already know about; I'm trying to find new malware of the same family. I'm using some information from the assembly, such as Diebold; Diebold is one

of the many ATM manufacturers. I'm using the streams I mentioned. So you have a lot of information at your disposal, but as an analyst it's your call to decide what to use and what to leave aside.
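Put together, a family-hunting rule of the kind described might look like the sketch below. It is not the rule from the slides: the typelib value is a placeholder, and checking the assembly name for "Diebold" is an assumption about where that string lives in the sample (it could equally be an assembly reference).

```yara
import "dotnet"

rule ploutus_like_dotnet_sketch
{
    meta:
        description = "Sketch of a family-hunting rule built from .NET metadata"
        note        = "All values are placeholders, not real Ploutus indicators"
    condition:
        uint16(0) == 0x5A4D and                                      // "MZ" at offset 0
        dotnet.typelib == "00000000-0000-0000-0000-000000000000" and // project-level GUID
        dotnet.assembly.name contains "Diebold" and                  // assumed location
        dotnet.number_of_streams == 7                                // stream anomaly
}
```

Leaving out the one-byte stream size check follows the reasoning in the talk: that trait points to .NET Reactor rather than to this specific family.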

And there is a trick that not many people who use YARA know about, and I only learned about it when I sent a message to Wesley and he told me: Santi, why are you using dnSpy? You know you can use yara -D, and the -D flag will tell you everything about what YARA is doing behind the scenes. For example, you can see all the offsets, all the names, the positions, everything. So you just call YARA with your rule, the file you're scanning, and -D, and YARA will tell you. For example, you can just write a rule that says "condition: false", with an import of the .NET module at the beginning, just "dotnet", and run it, and then YARA will tell you:

this is what I was trying to match your rule against. So you get all this information that you can use to write your own rule. Of course you can use dnSpy, you can use FAR Manager, you can use Windows Calculator, but if you're in a hurry this is great. And for me it was: Wesley, why didn't you include it in the documentation? You know, it was supposed to be added, but it was left out. So for me it was one of the most important tricks that I found, and I'm really thankful for learning about it.
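The dump trick just described needs nothing more than a rule like the one below. The rule name is made up, and so are the file names in the example invocation: saved as dump.yar, it could be run as "yara -D dump.yar sample.exe", where -D is the short form of --print-module-data.

```yara
import "dotnet"

// Never matches. It exists only so that YARA loads the dotnet module,
// whose parsed data is then printed when YARA is run with -D.
rule dump_dotnet_module_data
{
    condition:
        false
}
```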

So the idea of this presentation wasn't just to give you the syntax. I was trying to share why using the .NET module is so important when you are dealing with different types of malware, and why you need to think about .NET malware differently. In the previous presentation about medical devices, I remember one of the bullet points said: .NET malware, easy to reverse. In most cases that's the reality: you just use ILSpy or dnSpy, you open it, and you have all the source code available. But sometimes, when you have .NET Reactor, SmartAssembly, or some custom version of ConfuserEx, you're done. You need to find something else; you're going to have to open

up IDA or OllyDbg. There is no way around that. So you still need to hunt for those samples, and you need to do it statically, so you need to find some information that is going to help you find new malware without opening IDA or OllyDbg. If you cannot use the strings, you need to find something else, and that something else is all the metadata that you have in a .NET binary. When you're writing rules you need to think about which stage of your threat hunting process you are in. So what are you trying to do? You know nothing about this malware, but you have a rumor, maybe in some group, or

someone told you there is this malware, for example in Mexico, attacking ATMs. So what you know and what you don't know are just as important when writing a new rule. Are you just trying to find new malware? Are you doing, for example, incident response, in which you need to go to a bank, say, run your rule on all the systems, and tell the bank: okay, this is infected, this is not; we know it's Ploutus, or it's another ATM malware?

When you're thinking about YARA and its modules, you need to think of them as a help in writing that specific rule you want to finally deploy at the customer or use to hunt new malware. But it's not like you are going to need just this module, or for example just the PE module or the math module. Each rule is different, and for me using the .NET module is a good way to reduce false positives. So in this case, as a training example, I used only this module, but in reality I would of course use strings if available, the file size, the header, any information I have to reduce the number of false positives, because if I'm trying to run this rule

against millions and millions of samples, I need to make sure it's not going to detect Internet Explorer or something like that. You know, maybe this rule is going to be deployed on your customers' systems; you don't want a call on Friday at midnight: hey, this is detecting Chrome. So by using the .NET metadata you can be sure that you are finding the correct sample, or at least narrow it down enough that you find only what you're looking for. I added some additional resources. One of the best resources I found, besides the YARA documentation and the source code, when dealing with identifiers, was a previous presentation by a researcher from Cylance; he's called Brian Wallace,

and this research helped me a lot when we were dealing with the Steam stealers campaign, because we began that campaign by analyzing over a thousand samples. Of course we couldn't do that manually, so we needed a way to see which samples were the most important ones and which samples we wanted to write rules for to hunt for more. What Brian Wallace spoke about is using the MVIDs and the typelibs to aggregate that information and to visualize it. For example, I remember he was using a Python script, so you would see all the dots on a screen, grouped by the typelib or the module ID.

So you can see which families are getting stronger, which families are the anomaly, which families are somehow related, and then you can begin to write your rule. I was also working on something, and I finished it just before getting on the plane: I used dnlib to write a tool that gets all this metadata information, the same that you would see with yara -D. My idea was to build a tool responsible for getting all the metadata information, but from a large set of files, because if you are calling YARA and scanning a single file, that's perfect, but when you're scanning an entire folder and YARA is just spitting

out all this information on the screen, you need to parse it somehow and use it to get statistical information. So I did this pretty simple tool. I'm going to talk you through it while we watch the video. The wallpaper was my creation; it's copyrighted, but...

So when you're using FAR Manager you can easily see the PE header; you can just press F7, look for the .NET metadata header, and see the offset position. From there you have your baseline to work from. So once you know where all the information is stored,

you can check the file and see what it has. In this case I'm just checking whether it's packed or protected somehow. I'm using DiE, which is a simple tool; you can check manually if you want. And then with all the information you can just write everything into your rule using Visual Studio Code. You can see it takes just a couple of seconds; that's the magic of Hollywood. And then you can write your rule and run it against all the samples that you have. In this case I had just a folder called BSidesNYC, and I would run this rule and test if it was giving me some

positives or not. Then I was using the -D flag to get all the information from that sample, and that's actually how I was able to test all the values I was getting from dnSpy and see if they were what I was looking for. Then I was using the library that dnSpy uses for all the processing, which is called dnlib. It's also open source and free, so you can download dnlib and modify it to your liking. In this case I wanted to get all the information needed for YARA, and only that information, so I used dnlib and just wrote some simple C# code to get all the information and export it to an XML file. So what's

the idea behind that? I was thinking about something like when you use ExifTool or any other metadata tool: you get all the information, you export it to a CSV file, and you get some quick statistical information about the relations between the samples: which is the most important information, how many of the fields are shared between samples. So you can write a YARA rule that would yield much better results than just blindly writing one. So this is just the XML file. It's basically yara -D but in the most complex way possible, writing your own code in C#, but it was fun and it was a nice way to learn more about the

.NET binary format.

And then the easy way, with dnSpy. So I wanted to leave some time for questions, if you want. I'm trying to finish. Thank you, Mr. Gates and Mr. Ballmer. Do you remember this clip from Saturday Night Live, A Night at the Roxbury? So if you have any questions, or you want to contact me, maybe on Twitter, or send me an email to get the code of the tool, that's perfect. I'm available. Yes?

Thank you. I guess this may come out wrong, but .NET assemblies are a lot easier to disassemble than, you know, the standard portable executable. So I was wondering, because I'm familiar but not too familiar with YARA: could you set YARA to look for Intermediate Language code, with things like wildcard patterns and regular expressions?

Yeah, you can, but not by using the dotnet module; you can do it by using the bytes. Yeah, you need the byte representation. Yeah.

Okay, perfect. Okay, thank you everyone for attending. [Applause]