BSides Bucharest Online Meetup

Name: BSides Bucharest Online Meetup
Uploaded: 2022-05-30
Duration: 1 h 8 min 11 s
Description: Adobe and CrowdStrike security researchers present two technical talks: a machine-learning approach to detecting 'living off the land' attacks that abuse legitimate system tools, and an analysis of file infector malware behavior, limitations, and detection techniques.

BSides Bucharest · 2022 · 1:08:11 · 40 views · Published 2022-05

Tags

Category: Research, Technical

Topic: Detection Engineering, Malware Analysis

Research: Case Studies and Incidents, Analysis, Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

CFF Explorer, PEStudio, strings, Wireshark, YARA

Concepts

hex dump

Transcript [en]

So, hello and welcome to the BSides Romania meetup. My name is Alexander Callister, I'll be your host tonight, and I'm a security manager on an Adobe security team. I'm very happy to revive the BSides meetup after a two-year break, an unfortunate break due to some events, as you all know. Our team is committed to sharing the information, knowledge, and tooling that we have gained over the years, and to sharing it in the open with everyone who is interested in learning about our work. So I am happy to introduce our specialists in cyber security, and I will ask them to share their knowledge. Tonight we have, from Adobe, Tiberiu and Andrei.

They will present Living off the Land. Also, from CrowdStrike, we have Ciprian Bejean, who will share his lessons learned when playing with file infectors. So without further ado, I'm going to let these guys share their knowledge with us. TB and Andrei, please. Thank you. Okay, thank you, Alex. Hello everybody, and thank you very much for joining this talk. My name is Andrei Cotaie, as Alex already mentioned, and together with my awesome colleague, Boros the Great, we're going to have a short discussion today about a project we open-sourced sometime last year, that we're really proud of, and that we have been using for a while now. So, with no further introduction: a

supervised machine learning approach to living off the land. Next slide, TB.

It's hard to be the presenter, isn't it? Okay, so a couple of words about us. We are part of the Adobe Security Coordination Center. We are basically a data science research team made up of members from across the entire organization. As for the Security Coordination Center itself, its main task is monitoring: it handles the monitoring and alerting part of Adobe, but there is also the response part. In other words, the SCC does reactive security. The purpose of our team is to do data science research in the security field, so we mostly focus on reactive security, as the SCC does. That basically means

identifying threats that cannot be detected in conventional ways. In other words, we use the logs and the security data that we collect from Adobe assets, and we try to find anomalies. We really like sharing the work we're doing, which is why we're giving this presentation. We have a couple of open source projects, blog posts, patents, webcasts, and so on, and we're going to touch on those a little later. Next slide, TB. Thank you, señor. So what is living off the land? Simply put, living off the land means doing your job with what you have at your disposal, and

the honest truth is this: we're not in the Middle Ages anymore, and what we have at our disposal is more than enough. Living off the land is not a brand-new concept. The first time somebody used this particular concept was at the beginning of the century, so everybody in this meeting should feel old, except Dragoș, whose true age I know. Things like attacks, persistence, and lateral movement using the living-off-the-land concept have been recorded and reported for a while. Now all the big kids are doing it. What's interesting about living-off-the-land attacks is that they leave a really reduced footprint

behind. You don't have new binaries, so you blend in, and you already have everything you need there. Let's be honest: if you have bash, Python, PowerShell, curl, certutil, wget, and so on, what more do you need? Plus, I feel like netcat has become a default admin tool in most Linux distros by now. So again, living-off-the-land attacks refers to multiple attack segments that are delivered with whatever you already have at your disposal on the compromised box. That means no external malware being dropped and no specially crafted tools. Next slide, TB.

Cool. Okay, so why living off the land? Because the concept has been around for so long, some people have already done awesome research in this field. I posted a couple of links on the slide: the living-off-the-land binary lists for Windows and for Linux. For example, if you want to download something and you don't know how to do it, go to one of the links and you're going to find a couple of examples. You even have a button there: you say "I want to download stuff" and it shows which binaries are able to

do that particular task, and you can see what you have at your disposal. You want to use a legit binary to execute some code? You're going to find examples there. You want to escape a text editor, a CTF kind of approach? Again, you're going to find examples there. The honest truth, though, is that things aren't always that straightforward. It doesn't mean that everything you find there is going to work for what you're doing, but most of what you need is already there, and that's a great starting point. So what is the living off the land classifier? It's a project that we developed over a

couple of months and that we found really useful in our day-to-day alerting. As a reactive security expert, I have had quite a few security incident investigations, and every time I learn something; that's the fun of the job. Sometimes I learn about something that evaded the measures in place, and that's to be expected when you're doing incident response: something went wrong and evaded the things we have there. But something might have evaded my security mindset as well. I, for one, might not have been prepared to find that thing in the wild. Plus, if you think of the size of the

Adobe data, we're working with petabytes of data, definitely more than what a couple of dozen humans are able to parse per day. So how can I make sure that everything I'm detecting there is accurate? How can I be sure that nothing is bypassing my security mindset, or my security stack for that matter? The honest truth is I cannot, but I can give it my best, and that's what this project is about: thinking outside of the box. So what did we do? We took thousands of examples of malicious commands, we did some magic around them, and we built a classifier whose entire purpose

is to tell me whether a particular string, just a string, looks like a potential threat, and why. As you can see in the short video we have on screen, the tool is pretty easy to use: you take a string and you put it in, and the classifier generates some tags and tells you whether it is a good, a bad, or a neutral kind of statement. Regarding the target environment, the classifier comes with two pre-trained models, one for Linux and one for Windows. For the Windows one, we're just getting a fresh batch out of the oven, hopefully sometime tonight;

we'll test it, evaluate it, and make it available after that. For the Linux one, we already have a pretty mature model, really similar to the one we are using right now inside Adobe for some of our detections. Okay, with that in mind, what it does and how it does it, let's take a look at the magic behind the detection part. TB, can you go to the next slide? Awesome. So I already mentioned tags, or labels, and the entire magic of the project is in the tags. Don't get me wrong, the training data is crucial, but the magic is in the tags, and my

classifier is going to be as good as my tags are, and of course as my data set is, but Tiberiu is going to give more information about the data set and the model a couple of slides later. So what kind of tags am I generating? Mostly anything that crosses my mind. I care about the binaries that are executed. I care about the paths they're executed from, or the paths that are being accessed. I care about how it communicates: do I have network information there? I care about keywords. I really, really do care about those little dash parameters that appear with my commands. And I especially care about patterns, because patterns are a game

changer. TB. So I'm going to go really fast through all the tag types I have mentioned, and I'm going to try to highlight how they function, how simple they are, and why they are the subject matter expertise of the project itself. So, command or binary tags: what do they mean? The concept is pretty simple. I'm checking whether a particular string contains the name of a particular command. If it does, I add a pretty simple tag to it, for example "command python". In our example here I have this; it's a standard

Python interactive shell; you see it all over the place. What I do is break it into words and make a list of them, and then I ask: is python3 in my newly created tokens? If it is, put a tag there, "command python3". The code behind it is slightly different, but the idea is exactly the same. The second thing I'm using is keywords. What are keywords? For me, keywords are a list of strings or parameters that are interesting from a security perspective and that I want to highlight as being present in the command. So basically it's what I consider to have a

security implication, or what might have a major whitelisting implication, as in: if I see this particular keyword, this command is actually good. We created the list based on experience, on what we saw as being part of the commands, and also during the process of fine-tuning the model. And we do exactly the same check as before: I'm looking whether pty is present in my tokens, and if so I say "keyword pty". Simple as pie. Of course, this doesn't mean that the solution isn't prone to false positives. A command might be extracted wrongly, and maybe that command is not really present as a binary, or you may have

parameters that are associated with different binaries and don't have the same security implication. That's not a problem, because we generate a lot of tags. That's the whole magic of the project: checking for as many tags as possible. So for any false positive tag we generate, we have a lot of other tags that can compensate for it. Interestingly, all the static information I'm presenting here lives in a file in the project called constants.py. That's half of the magic of the project; it's basically the subject matter expertise that we added to the project itself. You can modify the contents of that file at any moment: you can add

stuff, modify stuff, and so on, and the results are directly observable in the tags: the tags change dynamically when you run it. The only downside is that the model is not going to take the new tags into consideration, and it might react negatively to tags whose content or context you changed. So you might need to retrain the classifier with the new tags. TB, next slide. Cool. The same story goes for paths. I made a list of interesting paths based on experience, on observed commands, and on false positives, and I'm doing exactly

the same check that I did last time. It can be interesting, because you can have a binary that's running from a weird location, or a binary that's accessing an interesting, sensitive path. I'm trying to have the machine learning model inherit this mentality and automatically highlight such things, even for new observations. If it sees patterns that you know are bad, but in a new kind of data, it might react proactively and say: okay, this looks bad based on what I know from the past. Same thing here: all the paths are available in the constants file, and they

can be changed or updated anytime. Next slide, TB. Cool. Now, what I haven't told you is that this project was built modularly. The first time around we said: okay, we have this beautiful data set, we're going to extract the binaries and the keywords and create the model. We tested it and it was awful; we weren't able to hit anything. Then we said, okay, let's add paths. We added paths, extracted them, and trained a model. It behaved better, but it was still disappointing. So the question was what more we could do, and we came up with an idea: networking is pretty interesting, and what we do with it is pretty simple.
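As a rough illustration of the binary, keyword, and path checks described so far, together with the network tags the speaker introduces next, here is a minimal sketch. It is not the project's code: the lists, the tag names, and the tokenizer are hypothetical stand-ins for what the talk says lives in the constants file.

```python
import ipaddress
import re

# Hypothetical stand-ins for the lists the talk says live in the constants file.
COMMANDS = {"python3", "ssh", "nc", "curl", "wget"}
KEYWORDS = {"pty", "-R", "-e"}
PATHS = ("/tmp", "/etc/shadow", "/dev/tcp")

# Assumed tokenizer: keep runs of word characters plus . / - together.
TOKEN_RE = re.compile(r"[A-Za-z0-9_./-]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def ip_tag(candidate):
    """Return a coarse network tag for an IPv4 string, or None if it is not valid."""
    try:
        ip = ipaddress.ip_address(candidate)
    except ValueError:
        return None
    if ip.is_loopback:  # checked first: loopback also counts as private
        return "IP_LOOPBACK"
    if ip.is_private:
        return "IP_PRIVATE"
    return "IP_PUBLIC"


def extract_tags(command):
    """Generate binary, keyword, path, and network tags for one command line."""
    tokens = set(TOKEN_RE.findall(command))
    tags = set()
    tags.update(f"COMMAND_{t.upper()}" for t in tokens & COMMANDS)
    tags.update(f"KEYWORD_{t.upper()}" for t in tokens & KEYWORDS)
    tags.update(f"PATH_{p.upper()}" for p in PATHS if p in command)
    for candidate in IPV4_RE.findall(command):
        tag = ip_tag(candidate)
        if tag:
            tags.add(tag)
    return sorted(tags)


print(extract_tags("python3 -c 'import pty; pty.spawn(\"/bin/bash\")'"))
print(extract_tags("ssh -R 8080:127.0.0.1:80 user@8.8.8.8"))
```

Each check is deliberately naive. As the talk argues, an individual tag may be wrong, and the classifier is expected to weigh the whole set.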

We're looking in the data for an IP kind of object, and for that IP we ask: is it private, is it public, is it localhost, whatever class it is. That's really interesting. For example, consider that you might have two SSH reverse tunnels, and you see that one is going towards an internal IP and one towards an external IP. If you're the analyst, which one are you going to investigate first? It's the same mindset that we're trying to apply to the model itself. The problem is that you don't always have IPs in your data; sometimes you have

domains. What I'm basically doing, if the event I'm extracting from has the IP information for that domain in a different field, is appending that IP information to the command itself, somewhere at the end, so the specific tags can be extracted for it. This particular feature had a significant impact on the overall accuracy of the project, but it wasn't enough yet. So, going on to the next one, my favorite. If you guys don't know me: I love regex. I was made for loving regex, and regex was made for loving me. Now, so we

have binary tags, keywords, paths, and network information, and we have a bad classifier. That took us to the next step: what more information can I generate for my model so it can understand the data as well as possible? The simple answer was regex. Historically I've been working on content creation, and I know there are two problems with regexes: they can be too strict and ignore interesting stuff, or they can be too broad and create a large number of false positives. So, as a good engineer, I went with the second approach: broad regexes that create a lot of false positives.
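To make the "broad regex" trade-off concrete, here is a small sketch with a few deliberately loose patterns. The pattern names and expressions are hypothetical, written for illustration only; the project's real patterns are not shown in the talk.

```python
import re

# Hypothetical, deliberately broad patterns; each one yields a tag when it matches.
PATTERNS = {
    "SSH_R": re.compile(r"\bssh\b.{0,40}\s-R\b"),
    "NC_EXEC": re.compile(r"\b(?:nc|ncat|netcat)\b.*\s-e\b"),
    "PTY_SPAWN": re.compile(r"pty\.spawn"),
}


def pattern_tags(command):
    """Return the names of all patterns that match the command line."""
    return sorted(name for name, rx in PATTERNS.items() if rx.search(command))


# A real reverse tunnel and a harmless documentation lookup both trigger SSH_R.
print(pattern_tags("ssh -R 8080:127.0.0.1:80 user@8.8.8.8"))
print(pattern_tags("man ssh | grep -- -R"))
```

The second command is a false positive for the pattern on its own, which is exactly the kind of noise the speaker accepts in exchange for coverage.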

But the trick was to write a lot of regexes, many of them, and I'm going back to the same concept as before: even if I have a tag that is a false positive for my finding, I'm going to have other tags that compensate for it and tell the model what to classify. So the idea was to catch as many features as possible: pty spawn, netcat, nmap, known commands, commands with parameters, commands with IPs, commands with whatever we have there. And it works really, really well. In the end we take all the

generated data, put it through the classifier, and let the classifier decide what is valuable and what is not. So even if we have a lot of low-confidence tags because of the regexes, and I told you we have many of them, having a lot of context around the information helps the classifier make the best decision. Now, regarding regexes, we started to have a problem: because we had so many of them, things became really slow. Not the classifier itself, but the data processing phase, the part where I was trying to extract the regex matches. So we had to do some engineering there. It was the moment

when I discovered the Python bindings for RE2 from Google, which is awesome. We switched from re to RE2, and we grouped our regexes into multiple parallel groups. We have eight parallel groups right now, all executing at once, and we distribute the regexes uniformly so the groups do not overlap. That way we were able to reduce the time by about eight times, so it runs much faster than it did initially. One more thing, same as before: if you need to modify, delete, or add a regex, you do it in the constants.py file, and it's going to go really fast.
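The grouping idea can be sketched as follows: patterns are dealt round-robin into a fixed number of non-overlapping groups, and each group is matched by its own worker. This is only an assumption about how the distribution might look. It uses the standard `re` module and a thread pool as stand-ins, and with those the matching is unlikely to run truly in parallel because of the interpreter lock, so the speedup quoted in the talk should not be expected from this sketch.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical patterns, standing in for the project's much larger list.
PATTERNS = {
    "SSH_R": re.compile(r"\bssh\b.{0,40}\s-R\b"),
    "NC_EXEC": re.compile(r"\b(?:nc|ncat|netcat)\b.*\s-e\b"),
    "PTY_SPAWN": re.compile(r"pty\.spawn"),
}


def split_round_robin(items, n_groups):
    """Distribute items evenly over n_groups; no item lands in two groups."""
    groups = [[] for _ in range(n_groups)]
    for index, item in enumerate(items):
        groups[index % n_groups].append(item)
    return groups


def match_group(group, command):
    """Return the tags of one group's patterns that match the command."""
    return [name for name, rx in group if rx.search(command)]


def parallel_pattern_tags(patterns, command, n_groups=8):
    """Match every pattern group in its own worker and merge the tags."""
    groups = split_round_robin(list(patterns.items()), n_groups)
    with ThreadPoolExecutor(max_workers=n_groups) as pool:
        parts = list(pool.map(match_group, groups, [command] * n_groups))
    return sorted(tag for part in parts for tag in part)


print(parallel_pattern_tags(PATTERNS, "ssh -R 8080:127.0.0.1:80 user@8.8.8.8"))
```

Swapping the thread pool for a process pool, or the regex engine for one that releases the interpreter lock, would be the natural next step if real parallelism is needed.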

But any change you make is going to be ignored by the model. That's okay: I know a lot of people who are using this project not for the classifier but for the tagging process. If you just use it for tagging, for creating data sets, UEBA implementation, analysis, and so on, it's enough to change the constants file and get the tags that you need. But if you want the changes to have an impact on the model, you have to retrain the model. Okay, so I'm going to pass it now to Tiberiu, and he's going to

have some fun as well. Thank you. Thank you, Andrei, and hello everybody. Thank you for the introduction. One of the most important things to mention about our approach is that we are using supervised machine learning algorithms. This class of algorithms requires labeled data in order to build the models, and the problem is that a large-scale data set of living-off-the-land command lines is really hard to come by. We struggled with this, and we had to build our own data set from scratch in order to build our system. The living off the land classifier API comes with pre-trained models, and we are releasing them for free, but we are

unable to share the data set with anyone else, because it contains some intellectual property, some cleartext passwords, some tokens, and so on, and that would not be very good for us. But if you want to build your own data set, you can use the same recipe that we did. It's going to require some effort and some work, but it's going to be worth it, because you'll be building a living off the land classifier that is specifically targeted at your own infrastructure, and you are not going to generate too many false positives, because the model is going to learn how your systems are

behaving. We started with about 1,600 examples of malicious command lines that were collected from the open data sources Andrei mentioned earlier, and to those we added 7.9 million examples of benign command lines that we obtained by sampling our infrastructure logs over the last couple of months. The reason for the sampling is that, by doing so, we removed, or at least had a lower chance of including, stuff that was operational inside the data set. As you can imagine, the operational stuff really looks like living-off-the-land commands, and we don't want any of that in the training data marked as benign.

So there's a ratio of roughly 0.02 percent between the malicious and the benign command lines, but that's okay, because we are maximizing the F-score, and this ratio makes sure that the classifier does not produce many false positives and is really trustworthy when it labels something as malicious. It also roughly follows the data distribution you would see in real life. So we took this data set, 7.9 million examples in total, and we enriched it as Andrei presented earlier, adding tags for binaries, parameters, networking information, and patterns. The total number of tags we generated was 7.5 million labels, with 500 of them being unique.
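Using the counts quoted in the talk (about 1,600 malicious and 7.9 million benign examples), the class ratio works out to about 0.0002, that is, roughly 0.02 percent. The sketch below checks that arithmetic and shows why the F-score is a more informative target than plain accuracy on such skewed data. The `f_score` helper is a textbook F-beta formula written here for illustration, not code from the project.

```python
malicious = 1_600
benign = 7_900_000

ratio = malicious / benign
print(f"malicious/benign ratio: {ratio:.6f} ({ratio * 100:.3f}%)")


def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positives, false positives, and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# A classifier that labels everything benign looks excellent on accuracy
# but scores zero on F1, because it never finds a malicious command.
always_benign_accuracy = benign / (benign + malicious)
always_benign_f1 = f_score(tp=0, fp=0, fn=malicious)
print(f"always-benign accuracy: {always_benign_accuracy:.4%}")
print(f"always-benign F1: {always_benign_f1}")
```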

Now you can imagine there's a gap between the 7.5 million labels and the total number of examples. Some of the examples generated three, four, five, six, maybe eight tags, and others generated no tags at all. Why? Because they were not important from a security standpoint, and we are not handling those super simple command lines that are on every Linux box. Also, the fact that we have a really small number of unique tags means that the classifier is not going to overfit the data; it's going to be robust, and this again is really good. So, to validate our data set and our

approach, we evaluated three classifiers using five-fold cross-validation. For those of you who are not familiar with this type of validation, the strategy is pretty simple. You take your entire data set and split it into five equal-sized folds. You try to keep the ratio between malicious and benign command lines as close as possible to the original data distribution of the entire data set, so roughly 0.02 percent. You train each classifier five times, once on every possible combination of four folds, and you test on the fold that is kept aside. Then you can report the accuracy of the classifier using this strategy. Based on this we computed the mean F-score and the standard deviation

for random forest, SVM, and logistic regression. The lowest training time was for the random forest classifier, which is also the only classifier that was able to run with a sparse input feature set. That means we didn't have to convert the input data into a dense matrix, which reduced the memory footprint required by the classifier by a lot. Also, the time reported for the SVM classifier was computed after applying the Intel patch for scikit-learn. This patch makes the classifier run in parallel, and without it we were honestly unable to evaluate the classifier, because it took a huge amount of time to train and validate.
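The five-fold strategy described above can be sketched without any machine learning library. The code below only produces the stratified train and test index splits; plugging in a classifier and an F-score is left out. Function names are hypothetical, and a library routine such as scikit-learn's stratified k-fold splitter would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split example indices into k folds, keeping each class spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


def cross_validation_splits(labels, k=5, seed=0):
    """Yield (train_indices, test_indices), holding out each fold once."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, folds[held_out]


# Toy data: 10 malicious (1) and 990 benign (0) examples.
labels = [1] * 10 + [0] * 990
for train, test in cross_validation_splits(labels):
    print(len(train), len(test), sum(labels[i] for i in test))
```

Every held-out fold keeps the same malicious share as the full toy data set, which is the property the speaker asks for.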

Now, finally, we have one more thing to mention. At run time we do rely on the classifier to predict whether something that was not previously seen is malicious or not. But for known examples, we felt it was not good to miss something that we know for sure is bad. So we added a special tag, called "looks like known LOL". This tag works by overriding the decision of the classifier for examples that were labeled as benign but that resemble something malicious inside our data set. In order to do this, we take each example and compare it with

everything that's malicious in our data set using a fuzzy comparison method. While the obvious choice would be Levenshtein distance, it is a computationally expensive algorithm and cannot run in production. So instead of Levenshtein distance we switched to something called the BLEU score, which has previously been used in machine translation to assess whether an automatically translated sentence is good compared to the gold-standard data. It's a score that starts at zero, when in our case two command lines have nothing in common, and goes all the way up to one if the two command lines are identical.
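A simplified BLEU-style similarity between two command lines can be sketched as below, together with the threshold check the speaker describes next (0.7). This is an illustration, not the project's implementation: it splits on whitespace, uses unsmoothed n-gram precision up to 4-grams, and treats "at least 0.7" as a match, all of which are assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu_like(candidate, reference, max_n=4):
    """Simplified BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    max_n = min(max_n, len(cand), len(ref))
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((cand_counts & ref_counts).values())
        if overlap == 0:  # no smoothing: one empty level zeroes the score
            return 0.0
        log_precisions.append(math.log(overlap / sum(cand_counts.values())))
    brevity = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity * math.exp(sum(log_precisions) / len(log_precisions))


def looks_like_known(command, known_malicious, threshold=0.7):
    """True if the command is close enough to any known malicious example."""
    return any(bleu_like(command, known) >= threshold for known in known_malicious)


known = ["nc -e /bin/sh 10.0.0.1 4444"]
print(bleu_like("nc -e /bin/sh 10.0.0.1 4444 &", known[0]))
print(looks_like_known("nc -e /bin/sh 10.0.0.1 4444 &", known))
print(looks_like_known("ls -la /home", known))
```

Without smoothing, a single changed token in a short command can drop the score to zero, so a production version would likely need a more forgiving variant.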

We set a threshold of 0.7, and if a new command line scores higher than 0.7 against something malicious in our data set, we assign it the "looks like known LOL" tag and mark it as malicious. This step was not used in the evaluation of the classifiers; it's something that we added at run time, so we don't have to worry about that. Finally, how do you install the living off the land classifier? There are two major options. The first and easiest one is to use the pip package: you just run the pip installation, in a virtual environment or on the whole system, and you have everything up and

running. The other option is the GitHub installation. You have to clone the repository, create a virtual environment, preferably, so you don't mess up your global Python installation, activate the environment, and install the requirements. This option is a bit harder, but it's the only way you can train your own model, so if you are planning to create a custom model for your own infrastructure, you have to go with the GitHub installation. Once you have a working installation or copy of living off the land in your notebook environment, just import LOLC, which is the

classifier, and PlatformType from the main API, and create a new instance specifying the platform type, which can be Linux or Windows; we do not support OS X yet. Then pass it a list of commands. It can be a single command or a list of multiple commands, and in fact, if you are doing massive data processing, it is recommended that you call the API with a large number of commands, because we batch them and process them in parallel. That is going to be a lot faster than calling the living off the land classifier one command at a time. Regardless of how

you run it, you are going to get a couple of results. It's not binary, not just good or bad, but I'm going to get back to this once we get to the live demo. You also get the tags for each command that you apply living off the land to, and you can use those tags for filtering, for searching, or maybe to build your own analytics and statistics, and so on. So keep in mind that you don't have to rely on the classification itself. I'm going to try to share my screen to do a quick live demo. One second.
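As an illustration of using the tags on their own, the sketch below filters and counts tags over a small hand-written result set. The data is made up, and the result shape (one label and one tag list per command) is an assumption rather than the documented output of the classifier.

```python
from collections import Counter

# Hypothetical results: one label and one tag list per command.
commands = [
    "ssh -R 8080:127.0.0.1:80 user@8.8.8.8",
    "apt-get install python3",
    "nc -e /bin/sh 10.0.0.1 4444",
]
labels = ["BAD", "NEUTRAL", "BAD"]
tags = [
    ["COMMAND_SSH", "KEYWORD_-R", "IP_PUBLIC", "IP_LOOPBACK", "SSH_R"],
    [],
    ["COMMAND_NC", "KEYWORD_-E", "IP_PRIVATE"],
]


def filter_by_tag(commands, tags, wanted):
    """Return the commands whose tag list contains the wanted tag."""
    return [cmd for cmd, cmd_tags in zip(commands, tags) if wanted in cmd_tags]


def tag_counts(tags):
    """Count how often each tag occurs across all commands."""
    return Counter(tag for cmd_tags in tags for tag in cmd_tags)


print(filter_by_tag(commands, tags, "IP_PRIVATE"))
print(tag_counts(tags).most_common(3))
print(Counter(labels))
```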

Okay, do you see my screen? Can anybody confirm that? Yes, that's perfect. So I have a virtual environment that I created, and I just installed Jupyter Notebook and started it. I'm going to use the pip package installation. It's going to run really fast on my computer because it's already cached. I'm going to import LOLC and PlatformType and create a new classifier for Linux. Then I'm going to run the living off the land classifier on the commands over here, and I'm going to do it one by one, because I want to be able to easily read the output and interpret it. I'm going to go over them really fast. What you can see here

is the living off the land classifier trying to describe what is happening inside the data. If you take this command line, for instance, you can see that it saw the command ssh, which is this one, and the keyword -R, which is the -R over here. It saw a public IP address, which is this one, and it also saw the loopback IP, which is this one over here. And it also triggered the pattern that Andrei added to the classifier, which is called SSH_R: basically ssh followed by a -R in close proximity. And it was labeled

bad, or malicious. If you look at the other command lines, some of them have the "looks like known LOL" tag, which means that the decision of the classifier has been overridden by this tag. There are quite a few such examples. Some other examples don't have this tag active, which means that all the other tags were used by the classifier and the example was labeled as malicious based only on this kind of behavioral analysis of what the command is doing. There's one more important example here, this one: apt-get install python3. It was labeled as neutral because the system did not generate

enough tags for the classifier to come to a conclusion on whether the example is malicious or not. This is normal for standard operational stuff such as this one, just installing a Python 3 environment. It's definitely not living off the land, so it doesn't have any really important tags from a security point of view. But if you have a really long command line and it's getting labeled as neutral, that means you might be dealing with something obfuscated, because that's why we are not able to spot patterns and assign labels. So basically this is the whole presentation. I'm going to stop sharing my screen right now, or

maybe get back to the open presentation. I want to thank you for being with us and ask whether you have any questions regarding the living off the land classifier.

Are there any questions?

Are you using the classifier for purposes other than detections?

I'm still trying to unmute. Sure. I mean, we are using it for data enrichment, like for the tagging component. We have another project, called OSAS, which has been open source for a while now and is now the core utility for our new UEBA project that we're doing internally. We are using living off the land for data enrichment, data generation, and as a tagging system, but in the end it still feeds into a detection. So yes and no, somewhere in between.

One other thing that Tiberiu mentioned is that by default living off the land classifies a single command, but if you want to do host or user behavior analysis, you can do that with living off the land and the OSAS project that was on the other slides. Okay, were you able to switch the host option? Oh yes, I was. Amazing.

So Andrei, do you need to be the host? No, no. Yep, it seems that I'm host. Okay then, moving on. Let's see if I can share it. Yeah, we've passed the question part. Do you guys hear me? Do you guys see this? Yeah, we can see it, but it's not in presenter mode. Okay, this is... yeah, it's okay, so you've already got a sneak peek of what's coming. Okay, so the presentation is open. Okay, man, I have no idea why I'm so nervous. I've done this several times before, but it's just me. So hello everyone, and welcome to the guys that joined us during the presentation from

Andrei and TB. My name is Ciprian Bejean, I am a security researcher at CrowdStrike, and today I'm going to tell you about a file infector and some lessons that I've learned when playing with it, which was a bit of a painful job. First things first, what is a file infector; that's going to be a quick agenda. Then, how do we get infected? Because all of us have at least one friend who always says that he cannot get infected, that he's not the guy who needs to buy an AV product, because he will never get malware on his computer, because he knows what he's doing.

Moving on: after the machine gets infected, we're going to see how the infection is actually performed and how to remove it, based on what the analysis results were. First things first, what is a file infector? It should not be mistaken for a worm; they're kind of similar, but a bit different. A file infector is basically malicious software, aka malware, capable of infecting files in order to spread across systems: across the entire operating system, on removable devices, mounted devices, and even the folders that are shared across the local network. But in order for an infector to successfully perform its infection routines, it needs to seek out and copy its virus component,

known as its malicious code, to certain files. It cannot infect all your files, so those files are required to be executables, libraries (known as DLLs), drivers (aka any .sys file), and even HTML if your file infector is Ramnit, and it works with JavaScript, but there are many more. Most infectors now do not use a single type of infection. It's not just going to pop up a message box and that's it, "hey, you got infected" or "this message comes from your infector known as yada yada". They need to combine multiple techniques in order to add complexity to the file when it comes to the reverse engineering

steps. That's why I've said that the file infector is basically the malware's marketing team that's trying to represent every file on your system, because once you execute one infected file, that virus component is going to be spread across your entire operating system, and you're not going to be able to notice it in, like, 10 seconds. All you'll be able to see is going to be the same files that you already had, but their content is a bit altered, or now they have something different. And that something different might be something prepended, like header modifications; something appended, like an overlay, that's the keyword that most of us are familiar with;

or even abusing a trampoline or a code cave. When there are empty spaces, empty sections, or null bytes in your file, a file infector might see them as an opportunity to copy the virus component into those sections. And yeah, a trampoline might sound like something super complex, but it's just a jump at the beginning of the file that tells the application to start executing from another memory address rather than the original entry point. After that execution is performed, it will return and start the usual entry point execution, tricking the user into believing that nothing happened. They might have features, not only the infection steps, that they might do:

popping up message boxes, downloading files, sending funny messages, creating emails, sending emails, playing with permissions on files, playing with users' permissions, all that stuff, you name it. If you're the one writing it, the sky's the limit, because when you're done with the infection routines, after you wrote them and tested them, you can do whatever you want with the rest of the file. Now, how do you get infected? Well, in the majority of cases it's via phishing, because if the email is crafted to look professional, it tricks the user into believing it's the real deal, and most of them have attachments. It's hard nowadays to send an executable as an attachment in an email,

as most of them are verified before sending, and you need to be creative. So the most used scenario now is a document with macros enabled. And if you don't have macros enabled on your machine, there's no problem, because they already thought about it in advance: when you open the attachment, you'll be presented with a pop-up, which is actually an image, saying that you need to enable editing for the file and enable the macros. And when you enable the macros, that macro is some kind of downloader, which gets the actual infector, an executable, from whatever malicious URL was embedded in that macro, and

all your files are going to be, after directory traversal, infected one by one. The computer will run like nothing happened, and you won't see it coming until, like, the next day, when you see that everything is moving slowly and some of your applications are not working anymore. How is the infection performed? Well, you have the infector that you got from God knows where. From the directory traversal it found, let's say, one file, because you only have one file on your machine; it was a clean install of Windows. It got your file, and that file was an executable. As I said, it could be an executable, DLL, HTML, or SYS;

in this case it was an executable. It performs that encryption routine and those extra steps, those features, and your file has been altered, aka infected. But what has to be mentioned here, for the particular infector that I've been looking at, is that your infected file has not only become infected, it's now an infector itself. So you started by having one downloaded infector and ended up having one infector and one infected-slash-infector, which is one of your own files. And now, to reveal the name: this infector is known as Neshta. I know it's old, but it's still in the wild, it's still seen almost daily. It successfully infects

basically everything, but it has its limitations, as we'll see. The infection overview goes like this: we have the header, we have the body; that's the original file that's about to be infected. The infector opens that file with read and write permissions. It keeps the body untouched, but it will skip the header, so to that body it will attach its own header, which is always going to be the Delphi header that's hard-coded inside the virus component. And that's a great indicator, because it's always the same: being hard-coded and not polymorphic, it's pretty easy to find. But what happens then in memory? When the file is opened, the infector gets the original

header from that original file, which is not yet infected. It takes the header, performs the encryption routine, that extra feature that I've been mentioning, and what you will see as an overlay on your infected file is the actual original header, but encrypted. So now your file not only has an overlay (and it's obvious when you look at the file the next day: you'd see data that was not supposed to be there, and that also makes no sense because it's encrypted), but you'll also notice that the header is different. During the analysis, because that looked a bit fishy, we managed to find some of its behavior, and from the static analysis perspective there are some

limitations, but then, using dynamic analysis, and I mean the debugger, we managed to get out the behavior of the infector, to see exactly how it performs the infection and what it does. So that Delphi header is always the same; it's going to be a carbon copy in all infected files. What you can do with that information is just take any of the infected files, take the header, hash it with, I don't know, MD5 or SHA-256, and use it as a signature to easily find the already infected files on your machine, or on other machines in case you have access to them. When doing static analysis, you're using

the Sysinternals Suite, Autoruns, strings, and all the other PE parser tools that make your job easier. And it has some obvious strings; we're going to see them in a minute. Those strings are not necessarily a good indicator, because if I am the attacker and I run the infector on your machine, I can modify the virus component to not leave those strings in those specific locations in that header. They do not affect the virus component's behavior at all; they're just strings that are not even printed out anywhere, they're just there in the file. If I modify the strings and replace them with null bytes,

the virus is going to do the same steps, the infector is going to perform the same infection routine, but those strings are going to be missing. And if you have, let's say, a YARA rule that's looking for those specific strings, it will fail, but the file is still going to be malicious and you'll probably miss it. The virus component has the capability of performing a self-decryption routine on the overlay. So if you're looking at the file from a static analysis perspective, you'd see that the overlay is a bunch of nonsense, just random bytes doing nothing. But if you're trying to see what the infector does with that overlay, either when it's creating

it on a new file in order to infect it, or what it does with the existing overlay on an infected file, you'd notice that it starts to perform some kind of routine that's going to reveal the actual values of that overlay, that nonsense, and after two or three iterations you'll realize that that nonsense is the actual original header of the file. So, in order to make sure that you are on the right track: if the malware is performing the decryption routine and the first two bytes are not MZ, that means the file is not a PE file to be infected, and something went wrong. How is the decryption routine going to be performed?

Well, it's a decryption, so it needs a key. That key is a specific double word (DWORD), a sequence of bytes extracted from the overlay. It's always going to be found in the overlay, and each file has its own byte sequence that serves as the decryption key. The decryption routine with that key is going to recreate the original header based on the encrypted overlay, and having that decrypted, it's going to recreate your file, because the body was never touched by the infector. And now that it has the original header, decrypted, in memory, it's going to rebuild the file and drop it into a specific location, which I've already mentioned: it's in AppData\Local\Temp, in a very particular folder.

And at that point, when you are executing the infector, it will perform the infection routines: directory traversal, looking for files to infect, starting to infect files. But that file that's been dropped into the temp location is going to be executed within a remote thread, and the problem is that the user is going to believe that everything went well. Let's say it's a video game: if the infected executable of the video game is executed, the clean version of it is going to be dropped into the temp location, and after that a new thread will start the original file from the temp location, not from the desktop where the user originally double

clicked it. Now, how many of us check where something actually runs from when we double-click it, whether that shortcut is indeed where it was supposed to be, like in Games, or in AppData\Local\Temp? And yeah, that was one of the tricks that the infector performs as a feature. But I also found that it might not infect certain files, and it might not infect files from specific locations. So if you're performing the research on the infector and look at how many folders on your clean machine have been traversed and how many files have been infected, you'd realize that something stopped the infector from going into those folders and infecting files.
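
The before-and-after comparison described above can be sketched in a few lines of Python. This is only an illustration for a disposable analysis machine, not the tooling used in the talk: the function names `snapshot`, `changed_files`, and `untouched_folders` are hypothetical, and the idea is simply to hash every file before and after running the sample and see which top-level folders were left alone.

```python
import hashlib
from pathlib import Path


def snapshot(root):
    """Map each file under root (as a relative POSIX path) to its SHA-256."""
    result = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            result[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return result


def changed_files(before, after):
    """Files present in both snapshots whose content hash differs."""
    return sorted(p for p in before if p in after and before[p] != after[p])


def untouched_folders(before, after):
    """Top-level folders in which no file changed between the snapshots."""
    touched = {p.split("/")[0] for p in changed_files(before, after)}
    folders = {p.split("/")[0] for p in before if "/" in p}
    return sorted(folders - touched)
```

Taking one snapshot on the clean machine and one after the sample has run, `untouched_folders(before, after)` gives a first list of locations worth checking for an exclusion in the traversal logic.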

And it does not start infecting on its own; the infector works only if you double-click on something, and that's literal: if you double-click on your browser to open it, the infected file, which is not your browser, is going to execute alongside your browser, and in case your browser is not already infected, it will be as soon as you close it. And these are the obvious strings that I've been telling you about. Those are just strings, not even used in prints; they are just there for someone to find. So at this point we realized that this file infector, even though it's pretty complex for 2005, looks more like a joke between two guys,

like one programmer tried to troll the other one. But yeah, it's still pretty solid for a file infector. It's not trying to download anything, not trying to connect to the internet, not trying to remove anything from your machine, but it's infecting your files by appending and prepending data to them, rendering your machine and some of your applications unable to run. There are limitations; file infectors also have limitations. One of the features is that the header, that Delphi header that's always the same, is 41 kilobytes in size, and the problem is that it cannot replace the existing header of a file to be infected if that file is not at least 41

kilobytes in size. So if you have the smallest executable ever, like a "print hello world" but without importing all the libraries, and the size of the file is like 39 kilobytes, the infector will skip it, because there are not enough bytes to work with in terms of the decryption routine. And also, if you have files that are bigger than 4 gigabytes, the infector will skip them, because there's too much data to work with. Well, it being from 2005, we concluded that the infector was built around FAT32, with its 4 GB file size limit, rather than NTFS. So those are the features, those are the limitations. The problem is that the infector itself also does some damage to your file. It will not judge your file on whether it has or

if it doesn't have specific sections. If your original file has a debug section, like if it's your own application that you have written in Visual Studio, you'll always have the debug section, which is basically the file path with the .pdb extension inside your file. If you do a hex dump of your file, you will see it. But if you open the infected file in any PE parser, CFF Explorer, PEStudio, or whatever, you'll see that the pointer to that section is missing: it's either unavailable or it's going to say that it's invalid, even though in the hex dump you'll still see the information, you'll see your PDB file path.
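
The mismatch just described, a PDB path still visible in the raw bytes while the header no longer points at a debug directory, can be checked without a GUI parser. The sketch below reads the debug entry (index 6) of the PE data directory table using only the standard library. The function names are hypothetical, the check is a heuristic rather than the method used in the talk, and it only looks at the header fields, not at whether the RVA resolves to real data.

```python
import struct

DEBUG_DIR_INDEX = 6  # IMAGE_DIRECTORY_ENTRY_DEBUG in the PE format


def debug_directory(data):
    """Return (rva, size) of the debug data directory, or None if absent."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        return None
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\x00\x00":
        return None
    opt = e_lfanew + 24  # 4-byte signature + 20-byte COFF header
    if len(data) < opt + 2:
        return None
    (magic,) = struct.unpack_from("<H", data, opt)
    if magic == 0x10B:    # PE32
        count_off, dirs_off = opt + 92, opt + 96
    elif magic == 0x20B:  # PE32+
        count_off, dirs_off = opt + 108, opt + 112
    else:
        return None
    entry = dirs_off + 8 * DEBUG_DIR_INDEX
    if len(data) < entry + 8:
        return None
    (count,) = struct.unpack_from("<I", data, count_off)
    if count <= DEBUG_DIR_INDEX:
        return None
    rva, size = struct.unpack_from("<II", data, entry)
    return (rva, size) if rva and size else None


def has_orphaned_pdb_path(data):
    """True when a .pdb path is in the raw bytes but no debug directory exists."""
    return b".pdb" in data and debug_directory(data) is None
```

On an infected sample, `has_orphaned_pdb_path(open(path, "rb").read())` returning True would match what the parsers show: the string is there, the pointer is not.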

The same goes for certificates and signatures. The problem is with resources, as they are part of the header, and the header pointer will always point to those resources, so every parser that tries to understand them ends up claiming that they are in Russian. But in fact the malware is from Belarus, so they're pretty close. Now, that decryption routine: after the overlay is decrypted and the original file is rebuilt and dropped into the temp location, the malware needs to know which files have been decrypted. And that's one of the features: it will not infect the file that it dropped itself. So if the infector dropped its own clean version of the file inside

Temp, or AppData\Local\Temp, the infector needs to know that whatever you have in that location, besides the file that's just been dropped, needs to be infected. So in order to separate the file that's been dropped from the rest, it's going to create a new folder. That folder is not there by default, and you won't see it in strings because it's encrypted in the virus component. The problem was that it was encrypted, and the question was: okay, it's encrypted, but how does it decrypt? And the same decryption algorithm used for the overlay is used for the folder name. So what appears to be, as the first string decrypted,

a subtraction between two numbers is actually a string, and that's the actual name of the folder where all the files that have been infected drop their original version. It's also decrypting the *.exe wildcard and tmp5023.tmp, but that's not being used. I think this version of Neshta was somehow still in development and the guys just stopped developing it. So what we know now is that we have a safe house: if you want your files to not be infected, just paste them into that specific location. Just look for this folder; it is always going to be the same. Even though it appears to be randomly generated numbers, it's always

going to be the same. What does it look like? Well, I've been playing guitar lately, so having an infected piece of software that's related to my guitar feels like it fits the scenario. I double-click on that application on the desktop, and in AppData\Local\Temp, in that specific folder, you'll find the clean version of the same file, not infected. So the infected one runs from the desktop and starts to infect everything on my computer, but it will also start executing the clean version on a separate thread, so my application will start normally and I would notice nothing different about it. I won't be able to see that anything happened, and that was a problem. And when it

performs the directory traversal, it will not go into this specific folder, so if you have a file that you do not want to be infected, just paste it here and you're safe. What other limitations does this have? Well, it has features, not necessarily limitations. If you do not have that checkbox set in Windows, in the folder view properties, to display extensions for known file types, you won't be able to see C:\Windows\svchost.com; you'll only be able to see svchost. But if you check the file, it's not going to be a COM file; it's going to be the actual executable that started to infect your machine. So whenever you have C:\Windows, or when it

can be in D:\Windows, you can install it wherever you want, whenever you see C:\Windows\svchost.com: svchost.com does not exist, that's the virus component, that's the actual infector that started the mayhem on your machine. It's an executable. It has to remember where it left off in case someone stops the infector from infecting all of your files. Like, if you are running it in the debugger and stop it after the fifth file that got successfully infected, it needs to know what the last successfully infected file was, and that is saved into directx.sys. And again, that's not a driver file, that's a text file; it's just plain text holding the file path of the file that

was successfully infected the last time the infector ran. It's persistent, but when you say persistence, you expect something like CurrentVersion\Run or CurrentVersion\RunOnce, Startup, Autoruns, scheduled tasks. Well, this thing is a bit different, and I was kind of impressed when I saw this. This is the reason why, whenever you execute something with a double-click, from wherever you are on your machine, the infector will always start and look for files to infect, in case you downloaded some new files in the meantime. And you see the link between the virus that resides in your Windows directory and the one from the persistence: the

registry key, it will always execute svchost.com. So if you just remove this svchost.com from your C:\Windows, you are safe; it's not going to execute as a persistence mechanism. But if you mistakenly double-click on another infected-slash-infector file on your machine, everything is going to be rebuilt from zero, and yeah, svchost.com is still going to be there and the persistence mechanism is still going to be there. But what happens if you run multiple infected files without noticing it? Let's say you are opening Slack, Discord, and a browser; that takes just, like, 10 seconds nowadays. So you are executing three infected-slash-infector files at the same time.
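
A quick triage check for the two artifacts named so far, `svchost.com` and `directx.sys` in the Windows directory, might look like the sketch below. The file names come from the talk; the function names and the `windows_dir` parameter are hypothetical, and the text encoding of `directx.sys` is an assumption (the talk only says it is plain text holding a file path). The registry side of the persistence is not covered, since the key itself is not named in the talk.

```python
from pathlib import Path

# File names as given in the talk; both are dropped directly in the Windows directory.
DROPPED_FILES = ("svchost.com", "directx.sys")


def find_dropped_files(windows_dir):
    """Return the names of the dropped artifacts that exist under windows_dir."""
    base = Path(windows_dir)
    return [name for name in DROPPED_FILES if (base / name).is_file()]


def last_infected_path(windows_dir):
    """Read directx.sys, described in the talk as plain text with the last infected path."""
    marker = Path(windows_dir) / "directx.sys"
    if not marker.is_file():
        return None
    # Assumption: a single-byte encoding is good enough to display the stored path.
    text = marker.read_bytes().decode("latin-1").strip()
    return text or None
```

Calling `find_dropped_files(r"C:\Windows")` on a suspect machine returns an empty list when neither file is present. As the talk notes, a hit only shows that the infector has run; deleting the files does not clean the already infected executables.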

The problem is that they might overlap at some point: all of them will try to infect the same file, and things might go wrong. Therefore the developer was careful enough to add a mutex, and that mutex has a single job: to make sure that only one instance is executing at once. So if you run three infectors, only one of them is going to be the one to actually start and perform the directory traversal on your machine. Now we've seen what it does, we saw how it does it; we need to get rid of it. Well, at first we might look on the internet for software to help us remove

things like that, but because we researched it and we know what it performs and what it looks like inside, we can also do it on our own. Because we know it performs a decryption routine, we can just track the decryption key to see exactly where that byte sequence resides, apply the same decryption routine manually somewhere outside the file in order to decrypt the byte sequences ourselves, rebuild the file ourselves, and remove whatever is remaining, like the infector's header and that overlay that's encrypted and that we do not need anymore. So what we have to do is remove the header, because we do not want that Delphi header anymore.

We keep the body, and then we just apply the decryption key. But for the overlay, here's the trick: the overlay is not fully encrypted. Only the first 1000 bytes were encrypted during the infection; the remaining 40,000 bytes are untouched, because they look like nonsense anyway; they are part of the header, and about the last 200 of them were parts of the file. How to rebuild the file? Well, using a hex editor or xxd, it's your choice here. So, simply explained: remove the Delphi header, focus on the overlay, and split the overlay into the first 1000 bytes and the remaining 40,000, which you leave untouched. By decrypting the first 1000 you obtain the original clean header of

the file, as if it was never infected, and having the body that was never infected, there you go, you have your original file. And now, yeah, we are getting close to the end of this presentation. What I've learned: a file infector does not need to be noisy. It can be quiet, it can be noisy, it can display messages, it can have logs, it can do whatever the developer wanted it to do. But it can infect starting from any file; there's no law that says it must have a specific file and always start the infection from that file. Any file on your machine represents a potential starting point for an infector. Even if it's damaged, it can still cause havoc,

because, as I said, okay, you can remove the strings, you can remove the overlay, and it's going to hit a couple of exceptions, but until it reaches those exceptions it will try to infect your files. And if it opens a file, tries to overwrite parts of it, and then fails to successfully infect it, that opened file, without a closed handle, is already damaged by not being successfully overwritten. So instead of an infected file, you now have a damaged file, and you have a damaged machine with an operating system unable to run properly. And yeah, you will see that the machine gets lazy: the more applications you open, you'll see that it takes

more and more time to realize that you opened them, because it requires multiple threads, and mostly your Windows will not be able to keep them up and running at the same time. Persistence: when you check for persistence, as I mentioned, do not expect everything to be in Autoruns or scheduled tasks, in CurrentVersion\Run or RunOnce. Look for everything. Look inside the file when you debug it, make sure that you and the breakpoints are best friends, check for mutexes and semaphores, and yeah, check the locations that will always give permissions to any file, like the temp location, AppData\Local\Temp, and I guess C:\Users\Public is another one.
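
The header-as-signature idea from earlier in the talk (hash the fixed Delphi header of one known infected file, then look for the same hash elsewhere) can be sketched as follows. The 41 KB default is taken from the talk's "41 kilobytes" and is an assumption about the exact byte count, so it is left as a parameter; `header_digest` and `find_matching_headers` are hypothetical names.

```python
import hashlib
from pathlib import Path

# Assumption: the talk gives the fixed header size as "41 kilobytes";
# adjust to the exact byte count observed in a real sample.
HEADER_SIZE = 41 * 1024


def header_digest(path, header_size=HEADER_SIZE):
    """SHA-256 of the first header_size bytes, or None if the file is shorter."""
    with open(path, "rb") as fh:
        head = fh.read(header_size)
    if len(head) < header_size:
        return None
    return hashlib.sha256(head).hexdigest()


def find_matching_headers(root, known_digest, header_size=HEADER_SIZE):
    """List files under root whose leading bytes hash to known_digest."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and header_digest(path, header_size) == known_digest:
            hits.append(path)
    return hits
```

Computing `header_digest` once on a confirmed infected file and passing the result to `find_matching_headers` flags every other file that carries the same prepended header, independent of the strings an attacker could blank out.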

Verify everything you know as a signature, because if you are building your own database of signatures, your own list of signatures, it's going to make your job easier and you'll recognize stuff faster than before. You don't need to research and recreate the steps again and again in order to prove that the infector is indeed the one that you are suspecting. Having a signature is a good approach, and good research results in a strong signature. So don't use only YARA with strings; you can also use byte sequences. Do not rely only on strings, I guess that's what I'm trying to say

here. And I've tried that Chuck Norris stuff with staring at the file, and it will not confess anything. And one piece of advice: do not use all the available tools on the internet on the same file, because it's not going to work. At some point maybe some of them will provide you some information, but you need to be able to translate that information into what you are looking for. When you start to research a file, make sure that you have all the context that you can gather, and also make sure that you take notes. So bring yourself a notebook, a pen, and paper, that's the way. And if you have a deadline and you have to

perform the research, it's going to be a long night, so have your debugger of choice, combine it with IDA, and that's it; you will start the research on your own file. And that research revealed two things. Even a file infector has some sanity checks: it asks all the files on your machine if they are already infected. How? By checking the first 1000 bytes: if those 1000 bytes match the ones from its header, it means that the virus component is already there, the file is infected, and there's no need for the infector to perform the same operations. And the temporary location and the Windows directory are never going to be

touched by the file infector. So nothing from C:\Windows will ever be, oh, and I guess Program Files is another one. So C:\Windows, C:\Program Files, and C:\Users\<your username>\AppData\Local\Temp are never going to be parsed for infection; they are only going to be checked to see if they exist. In Windows you'll be presented with two dropped files: svchost.com, which is the virus component that serves as the persistence mechanism, and directx.sys, which is basically a plain text file. Now, remember what I said about the friends that don't need any AV? Because, well, when you download something from a legitimate website, when you have everything up and running, when you have

everything in place, you have a license, you don't click on ads and stuff like that, yeah, the answer is yes, you can still get infected. The infector does not judge you or your machine; as soon as it's there and you just double-click on it, it gets the job done. And that was it. One thing to mention here: I know I've mentioned the decryption routine, but I did not show it, because I hate explaining mathematical operations, and that was an algorithm with big numbers. Also, if you need it, it's inside the file, it's inside any of the infected files. So just ping the BSides Bucharest staff and I'll be happy to provide you

the actual decryption routine and how to find the decryption key, because that was the main trick here: where it is, how to find it, how to apply it.
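
The manual repair steps from the talk (drop the prepended header, keep the body, decrypt the first part of the overlay, leave the rest of the overlay as it is) can be put together as a sketch. The decryption routine itself is not shown in the talk, so it is passed in here as a caller-supplied `decrypt` function. The layout is also an assumption drawn from the description: the overlay is taken to be the same size as the infector's header and to sit at the very end of the file. `rebuild_original` and its parameters are hypothetical names.

```python
from typing import Callable


def rebuild_original(infected: bytes, header_size: int, encrypted_len: int,
                     decrypt: Callable[[bytes], bytes]) -> bytes:
    """Rebuild the clean file from [infector header][body][overlay].

    decrypt is supplied by the caller; the real routine is not given in the talk.
    """
    if not 0 < encrypted_len <= header_size:
        raise ValueError("encrypted_len must be between 1 and header_size")
    if len(infected) < 2 * header_size:
        raise ValueError("file too small to hold the header and the overlay")
    split = len(infected) - header_size
    body = infected[header_size:split]   # never modified by the infector
    overlay = infected[split:]           # original header, partly encrypted
    head = decrypt(overlay[:encrypted_len]) + overlay[encrypted_len:]
    if head[:2] != b"MZ":
        # Sanity check from the talk: a recovered PE header must start with MZ.
        raise ValueError("decrypted header does not start with MZ")
    return head + body
```

With the figures mentioned in the talk, `encrypted_len` would be 1000 and `header_size` roughly 41 KB. The MZ check is a cheap guard against a wrong key or a wrong offset; it does not prove the rest of the header was recovered correctly, so the output should still be opened in a PE parser before being trusted.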

Thank you so much, Ciprian. Thank you so much, everyone that's been part of this meetup. I think probably we also need to finish with the CTF.

No, that is running until midnight. Oh, okay, we've changed that. Okay, probably you can stop the recording now, and
