
Mic on? Hello. Yeah, it is good. First of all, thank you for coming to my talk. I know it's tough after lunch; I usually take a siesta after lunch, so thank you very much for coming here instead of taking a nap. Today's presentation is about malware automation.

So who am I? My name is Christopher Elisan. I'm the principal malware scientist at RSA NetWitness. When I first began analyzing malware I had long hair; my hair was up to here, and now almost all of it is gone. I'm also the author of Malware, Rootkits & Botnets: A Beginner's Guide, so if you have a budget for a book, please get a
copy. All proceeds go to my kids' 529 college fund, so please get a copy on Amazon. I think it's $28; if you buy it from a bookstore it's $40. It was just published in August of last year. Before joining RSA, I was with Damballa for three years; I was their resident malware reverser and researcher. Before that I was with F-Secure, where I actually built their Kuala Lumpur research and development center, and before that I was with Trend Micro. It was at Trend Micro that I first started analyzing malware. They hired me straight out of college, and at that time they didn't send me an email; instead they sent me a telegram. So I was
telling the story to my kid, and he doesn't even know what a telegram is. I should have saved it; it would have been cool to have a screenshot here, but I think my mom threw it away or something. You can follow me on Twitter and send me questions there. If you need any help with malware or anything security related, I would be happy to help out; just send me a message on Twitter. And this is a link to my blogs on the RSA blog site.

Our team is called RSA FirstWatch. We're a threat research team, so we specialize in new, emerging, and unknown
threats. We work with our customers and also with law enforcement, trying to track down the actors behind attack campaigns. You can also follow us at RSA FirstWatch, that's the Twitter handle, and we blog on the blogs at rsa.com.

All right, so now the exciting part: what's the purpose of the talk? The purpose of the talk is to understand the tools and methodologies behind the staggering number of malware samples discovered on a periodic basis. To achieve that purpose, I will first discuss the current state of malware, then the attacker's arsenal: what are the tools the attackers use to
achieve the end goal of having an army of armored malware. After that I will describe how they actually create the army of armored malware, then I will discuss the advantages of automation when it comes to conducting an attack, and then I will give a live demo, and hopefully it works.

So this is the current state. If you have security vendors, almost yearly you get some sort of report that I usually call a horror story. It's supposed to horrify you so you buy their security products, but it only ever tells you about the number of malware samples. What they're saying is true:
every year there are millions and millions of new and unique malware samples found. This one was last updated on June 6th of this year, and there is already a difference of roughly 30 million from last year. The year has not yet ended, but there are already 30 million more samples that have been discovered. The keyword there is "discovered": this doesn't account for the malware samples that haven't been discovered yet.

Before, when an attack was being conducted, it usually involved only one type of malware. When you captured that malware, you created a solution for it, you deployed that solution,
and you could actually stop the whole outbreak, the whole attack. But right now, when an attacker targets an enterprise, or targets everybody that it can get information from, the attacker uses an army of armored malware. It's not just one single piece of unique malware or one malware coming from a single family; they're actually using different malware families in conducting an attack. Later in the presentation I'll explain in detail the advantage of using multiple malware families when conducting an attack.

But first let's go to the tools that the attackers use: the DIY kits; the armoring tools such as packers, encryptors, joiners, and binders; and they also use AV scanners for
quality assurance purposes.

Going to DIY kits: in 1992 this one came out. It was created by a 15-year-old kid from Chicago. When this came out it was really very cool, because you could create malware just by mastering the command-line commands of this tool. You didn't even need assembly language skills to create your own malware. Of course this is not as sophisticated as the ones in use today; it was more of a hobby for this guy. And as you can see here, unlike now, they put their name there, or their code name, when they actually released a tool like this. Not to be
outdone, another hacker group, or I would say malware-writing group, Phalcon/Skism, also created their own kit, the PS-MPC. They call it the Mass-Produced Code generator. It's the same concept as VCL: you can create malware samples using just command lines; you don't need assembly language skills.

Now fast forward 15 or 20 years and we get kits such as SpyEye, which is much more complicated. One of the main differences between the kits we see today and the kits we saw before is that the older kits could produce only a finite number of malware samples. So if you can generate
all of those and create signatures for all of those, you have that kit covered. But most of the kits today use time and date as a seed, so as long as time exists they can create an infinite number of malware samples.

Another one is Zeus. Like SpyEye, it targets financial credentials, so these two are actually competitors. As you can see here, it's very easy to create whatever you like: you create a config file, putting in all of the network resources that the malware needs and also the targets that the malware should go after. So even though it's much more complicated than the kits
before, it's still easy to create malware using DIY kits. Another difference from the older kits is that the malware samples produced are encrypted, so how they appear on disk is different for each generation, and how they appear in memory is also different. The attackers could actually rely on the protective mechanisms provided by the kits, but of course when you're conducting an attack you want to be sure, so they also use other tools: armoring tools such as the UPX packer. UPX is very common already; you can unpack a binary that's been packed with UPX in
just seconds. You just download UPX, run that tool, and then you can unpack it; or if you're a hobbyist you can use OllyDbg and unpack it in five minutes. But still, it's being used today. Before, the most popular UPX was the command-line one. Now you can buy something like this. A script kiddie who has no idea that they could download this for free might actually buy it from people in hacker forums for something like $10, $20, or $30. One thing you can see here: even though it says UPX, these tools seem to be different from UPX, and they have
other options. But when you look at the code, some of the modules, especially the one that's responsible for packing the malware sample, are totally the same. So my educated guess is that the people creating these tools just take those modules, repackage them, and then sell them for something like $10 or $20.

The same thing here. This is a compressor with additional options: it has anti-debugger tricks, and it can compress resources. This one is much more expensive; you can buy it for $50. But the catch is that when you're actually analyzing tools like this, some of the options, some of them
work and some of them really don't. They put something there, you can tick a box, but then it wouldn't do anything to the executable; sometimes it would just add garbage code that doesn't do anything. I know this because of the way I analyze tools like this: I have what I call a pawn.exe, an executable that I'm really familiar with. Whatever changes are made to that executable, I just isolate those changes and then reverse them. So some of these options don't really work.

Another armoring tool: this one is a cloud service. Not all tools are ones you
can install on your system or use on your system; this one is available in the cloud. When I was doing my experiment before, they were offering this for free, but now when you go to IND detect. net, if it's still alive today, they ask you for an access code, and you have to pay for that access code. The advantage for whoever owns this service is that whatever you upload here, they get a copy of it, and then they can sell it. It's easy to sell a malware sample in different hacker forums or underground forums; you can sell them for something like one cent or 10 cents. So it doesn't
seem like much; 10 cents is a small amount of money, right? But if you sell 1 million samples for 10 cents each, that's a lot of money. I will show later how easy it is for them to generate an army of armored malware.

Another tool is a file joiner. During the computer virus days, the infection routine was built into the malware, so once it found a host file it would infect that file. Right now you really don't need that infection routine, because infecting a file is already lame. Most of the
deployment methodologies used by malware are network related. But of course it doesn't hurt an attacker to have a malware file that can also infect other files, especially if you're targeting an OS that you don't want your malware to be extracted from. So this tool actually turned the infection routine into a tool. The main idea here is that you have a malware sample with all the features that you need for your attack, and now you want to deploy it. To deploy it effectively, you can attach it to different popular files: a GIF file, an MP3 file, a document file,
or an executable file that's very popular, and then you can just share it online: drop it on any software download site, spam it, or deploy it however you like. Put it on USB sticks, bring those USB sticks to a conference like this, and leave them on tables and chairs; once somebody plugs one into their PC, it will launch the malware. So those are different deployment technologies that you can use.

This is the same thing, an EXE bundle file joiner. These two tools, even though they look different and they're packaged differently, most of the modules that
I found in them are very, very similar. So again, they're either borrowing source code from each other or stealing each other's source code.

So, AV scanners. Of course if you want to attack something, you want your malware to be undetected. I'm just showing an example of AV scanners here, but most attackers, especially those sponsored by states or competing entities, have the budget to actually own appliances from very popular security products, so they can test their creation against them. Especially those that use network deployment technologies want to make sure that whatever network protection the enterprise they're
targeting is using, they can bypass those protections. But this is one example of a multi-AV scanner. The main idea here is that you can run this tool and it checks against all the AVs listed here whether your creation is detected or not.

[Audience question] Is it a cloud service, or are the scanners included in the product already? Or do they use VirusTotal to do the scan?

Oh, I'll go to VirusTotal later. Okay, yeah, I'll go to VirusTotal later.

[Audience question] So those scan engines are integrated in that software already?

No. The question is whether the scan engines are integrated into this tool already. So
this tool actually downloads those scan engines. The scan engines are usually the DLL files or the executable files used by the security vendors to scan your system. What this tool does is download them and crack those engines so that they all work on the same box, and then it can run them. The disadvantage of doing that is that sometimes they kill the dynamic scanning feature of these AV products, so they're left with only the static scanning features. So there's still a disadvantage, but of course there's a workaround: they could just create a VM, install one
of the AV products they want to bypass, and subject their malware creation to it.

If you don't have that tool, you can use VirusTotal or NoVirusThanks. But nobody uses VirusTotal, especially if you're going to conduct an attack, because every time you submit a sample to VirusTotal they submit it to the different security providers. Then again, the reality is that AV vendors receive hundreds of thousands of malware samples every day, and all of those are prioritized: whoever is the highest-paying customer has the highest priority for their samples to be processed. So most of the samples that
come from, let's say, regular folks in the industry, or from submissions like VirusTotal, go to the low-priority queue, and sometimes they don't get processed for months, even years, especially if the AV company is really busy solving high-priority stuff. But here in NoVirusThanks they have the option that says "do not distribute the sample". Of course we're not sure whether they're really distributing it or not; you can just tick here, but nothing's preventing them from actually distributing those samples.

All right, so putting all the tools together. You have the DIY kit malware. The DIY kit will produce samples; as I said, it could virtually
create an infinite number of malware samples because it uses time as a seed, so as long as time exists it will create a unique malware sample. Then you can subject those to different armoring tools: crypters, real-time packers, and other armoring tools. Right now they can afford to do this. During the DOS days they could not, because as the malware file grows and the more evasion techniques you put in it, it takes up much more memory and much more computing power, so every time you ran it on a system it would slow down the system. But since computers are now fast, memory is cheap, and hard disk is cheap, they can afford
to do this. And the good thing here, for them, is that they can interchange all of these tools and the result looks different: not only on disk but also in memory, and the code itself looks different. So if somebody is not really familiar with the tools or the DIY kit that has been used here, they might label it differently. This is what's killing the automated systems of most security companies. As you can see from the figure earlier, there are millions and millions of samples seen every year. Nobody can process those; even if one company employed all of the AV
researchers in the world, nobody could process those samples by hand. So everything is processed in an automated way.

[Audience question] For the millions of samples, do you have a number for how many of them are generated by automation and how many are just written?

Yeah, we will go there later.

So once they're done armoring their malware, they go through their quality assurance. They can use the in-the-cloud AV services, they can use their own on-premise AV, or they can use the appliance that they have access to, to make sure that their creation can bypass all of those security solutions. Now here, sometimes, if it's a targeted
attack... remember, there are two types of attack: opportunistic attacks and targeted attacks. Opportunistic attacks are what we see in public; usually they're the ones that make a lot of noise and infect millions and millions of people. Targeted attacks usually target a single entity or a single company, and sometimes the malware used for that attack is designed to infect only a handful of computers, so there won't be much noise while they're doing whatever they're supposed to do. So let's say they're targeting company A, and company A uses security product B. Even though in their testing the
sample that they're going to use to attack that company is detected by every other security product, if it's not detected by security product B they will still deploy it, especially if it's a hit-and-run attack. When I say hit and run: they infect the system, get what they want in probably three minutes, and the malware destroys itself. They don't need a long time to do what they're supposed to do inside the system.

So how do they get that information? It's actually very easy. When most companies are hiring people, all of the information you need is on the job sites. So let's say: "We're
hiring a Windows 2000 server admin who's well versed in Active Directory and well versed in managing security product B, willing to learn this and that." So you already have that information. Now if the company is security savvy, they wouldn't post that information. But sometimes the people who are already in the company use technical websites, especially if they have a problem with whatever they're doing inside their company. Probably they're having problems with configuration, so they go to Experts Exchange, post their problem, and use their real email there. So that's also a telltale sign, I mean, another source of
information to tie that certain information to a company. Some people use a fake email address, but then when they copy and paste an email from the person asking the question, sometimes they fail to delete the signature of the original person, so that information is still there. Or you could just call the company, do some social engineering, and ask for whatever you need; just tell them you're doing a survey. When I was doing a BS 7799 audit, which was like a security audit, for some people you just wear a polo shirt, get a clipboard, and you can actually go inside the company and nobody will
ask any questions. It has been proven time and time again that it usually works.

So let's say the malware passes its AV testing; then the attackers have access to an army of armored malware. But of course you may not be able to afford the DIY kit. If you're familiar with Citadel, which just leaked out: in the underground it was selling for 40 grand a pop, but ever since it leaked it's worth nothing. So unless you have access to a leaked DIY kit that you can modify or just use as is, sometimes attackers just use old malware: malware that has all of the
features that you need. Run it through all of these armoring tools and it becomes new again; it becomes undetected again. I actually had a talk about this; I call it green malware, made from 100% recycled malware. And it's very easy to find old malware: just get yourself infected. Probably most of you have grandmas and aunts who are not really computer savvy. They call you in the middle of the night telling you they cannot access the internet, or something's popping up like a porn picture, and you remotely connect to that PC and you can find a wealth of malware samples on their system. So if all of us have access to
those, the attackers also have access to those, because most attackers share information better than us professionals.

So what are the advantages? First, the exponential effect: the more samples you have, the better it is for you. If you don't want to use them in an attack, you can sell them for one cent or 10 cents a sample. Some people actually buy.
[Audience question] I'm wondering who's buying those if they're for sale for one cent. Is that a sort of licensing thing, where if you want to infect 10,000 systems you have to buy this thing 10,000 times?

You buy it per piece. When they sell you something, they won't sell that sample ever again, or so that's what they say.

[Audience question] Oh, I see. So it's globally unique?

Yep. By the way, when I say unique here, it's not the traditional "unique" as described by most security professionals, because you can just get a file, add one byte at the end of that file, take a hash of it, and then
it's totally unique. Most of the time when you see the stats, the horror story stats from the beginning, that's how they define unique. When I say unique here, it's like metamorphic malware: unique on disk and also unique in memory, so totally unique from each other.

Another advantage is easy malware update, and of course knocking AV out. This is old news, but I keep bringing it up because even though the AV industry knows there's a problem here, the way they're tackling it is really slow, probably because there's no budget, or because other marketing needs take precedence over real research and finding the solution.
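The remark about hash-based uniqueness can be made concrete with a minimal Python sketch. It uses only the standard library; the placeholder bytes, the function names, and the trailing-zero normalization are illustrative assumptions, not anything shown in the talk. It only demonstrates that appending bytes to the same content gives a new hash each time, which inflates a purely hash-based count of unique samples.

```python
import hashlib
from typing import Iterable


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def count_unique_by_hash(samples: Iterable[bytes]) -> int:
    """Count samples the way a purely hash-based statistic would."""
    return len({sha256_hex(s) for s in samples})


def count_unique_ignoring_padding(samples: Iterable[bytes]) -> int:
    """Count samples after stripping trailing zero bytes.

    Stripping zero padding is a deliberately crude normalization,
    chosen for this illustration only.
    """
    return len({sha256_hex(s.rstrip(b"\x00")) for s in samples})


# Harmless placeholder content standing in for one file.
base = b"the same file contents every time"

# Five copies that differ only in how many zero bytes are appended.
samples = [base + b"\x00" * n for n in range(5)]

print(count_unique_by_hash(samples))           # 5
print(count_unique_ignoring_padding(samples))  # 1
```

Under these assumptions, one underlying file is reported as five unique samples by hash, which is the weaker sense of unique that the speaker sets aside.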
So, the exponential effect. You can get a malware DIY kit and create an infinite number of malware samples. Take one of those samples, run it through an armoring tool, and based on just that one sample you can create an infinite number of malware samples again. You can do that because some armoring tools also use time and date as their seed. They have encryption built in; they use keys found in the main malware or they use random keys, and they also add time as a seed, just to make it much more difficult for reversers. So usually if a tool uses time as a seed, it could
virtually create an infinite number of unique malware samples.

[Audience question] This process you're showing, it's I guess kind of an encryption thing. Have any standards emerged, or common patterns? I'm thinking this looks like it would eventually evolve into something a little bit like the feedback modes used in traditional encryption. So I'm wondering, is this always just random, like the guy who's running it just makes up a pattern?

It depends on the tool. The tool I'm going to demonstrate randomizes the characters that it uses as a key. It actually goes back to history, but I'll explain why later. So you can run it again through another armoring
tool, and so on and so forth, and you create all of those unique malware samples.

[Audience question] How unique are the samples coming directly out of the malware DIY kits? I mean, are they just adding a few...?

Oh, they're totally unique. I think they have their own metamorphic engine, and their metamorphic engine is not as limited as the traditional metamorphic engine. There's actually a guy who was able to statically decrypt Zeus samples. When he showed it to me, he took about 100 Zeus samples and ran his tool, and when all of the encryption had been removed, and all of the
metamorphic effects from the metamorphic engine had been removed, you could compare all of the files and see they're totally the same. So he was able to extract the main germ of that malware sample. By the way, if you hear the term "germ", it's a term used to describe malware in its purest form: malware that hasn't been armored in any way. The main idea is that if you get a germ and open it in IDA Pro, whatever code you see there is the main code of the malware; there are no tricks to hide anything.

[Audience question] You're saying these things are unique in
memory, so could you dump the process memory, and they wouldn't be the same anymore?

Yep. Unlike polymorphic malware, where the samples are different on disk but, once running in memory, the same. Polymorphic malware still has three components: the encryption/decryption routine, the key or a location of the key, and the malware code. For the malware code to run, the malware has to decrypt it, and of course in memory everything has to be decrypted for anything to run or for any data to be processed. So once it's decrypted you can just grab that memory image and create a signature for it.
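The three-part polymorphic layout just described (a decryption routine, a key, and an encoded body) can be sketched in Python with harmless placeholder bytes. The XOR encoding, the toy on-disk layout, and every function name here are assumptions made for illustration; real polymorphic engines are far more involved. The sketch only shows why a signature taken from the decoded in-memory image matches every generation even though the on-disk bytes differ.

```python
import hashlib


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR data with a repeating key (the same call encodes and decodes)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def make_generation(body: bytes, key: bytes) -> bytes:
    """Toy on-disk layout: 1-byte key length, the key, then the encoded body."""
    return bytes([len(key)]) + key + xor_bytes(body, key)


def load_into_memory(blob: bytes) -> bytes:
    """Stand-in for the decryption routine: recover the plain body."""
    key_len = blob[0]
    key = blob[1:1 + key_len]
    return xor_bytes(blob[1 + key_len:], key)


# Harmless placeholder standing in for the constant code body.
BODY = b"placeholder bytes that never change between generations"

# Two generations of the same body, encoded with different keys.
gen_a = make_generation(BODY, b"\x11\x22\x33\x44")
gen_b = make_generation(BODY, b"\xa5\x5a\xc3\x3c")

disk_hashes = {hashlib.sha256(g).hexdigest() for g in (gen_a, gen_b)}
memory_hashes = {
    hashlib.sha256(load_into_memory(g)).hexdigest() for g in (gen_a, gen_b)
}

print(len(disk_hashes))    # 2: each generation looks different on disk
print(len(memory_hashes))  # 1: the decoded image is identical
```

In this toy model a single hash of the decoded image covers both generations, which is the weakness of the polymorphic design that the speaker is pointing at.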
And whatever antivirus product you use that has dynamic scanning capability can already detect that polymorphic malware. But here it's totally different. It doesn't use those three components; it changed the game completely. Instead of using those three components, it mutates the code every generation. Now, how is that mutation done? The key there is reversing the kit itself. In the industry most people still approach malware problems by reversing the actual malware sample seen in the wild, when in fact the best way is to reverse the kit itself. When you reverse the kit,
it also has different protections that make it very hard to reverse. Now, I can stand here and tell you to reverse this and reverse that, but in reality reversing takes a lot of time. During the DOS days I could reverse a malware in minutes, but now it's much more complicated: sometimes it takes me days, weeks, even months, and sometimes I cannot do it alone; I need the help of other reversers. That's why, when somebody releases a report about a certain attack or a certain threat, sometimes it takes them months to actually release that report, because
that's the time it took them to find out everything about that certain malware or that certain attack. That's why every time a DIY kit is leaked and the source code is available, it makes it easier for us to understand what that kit is doing.

Going back to your question about the randomization of the keys in the tools: there's actually a history behind that. Before, when an infecting computer virus tried to infect a file and wanted each infection to be totally different, it had a fixed location in the malware code that said: when you take a host, if
you want to infect a host file, go to this location, get the first 35 bytes that you see, and use those 35 bytes to encrypt the malware, or to do whatever you want to hide the real malware code, and then infect that file. That is very easy to beat, because you can just have hundreds of pawn .exe files, determine where that location is, and then get that key when you're reversing. So you have the malware sample, which has the encryption/decryption routine, and you have the key, because you ran it against a file that you're really very familiar with. So you have the key, you have
the encryption/decryption routine, and that's the malware code; now you can reverse it and actually create a solution for it. But the tools don't use that technique of getting bytes from the host file anymore; they randomize the key on their own. Now, it's not really random, because there's still mathematics going on in the code that generates those keys. So if you're able to reverse one of those tools, you can generate all of those "random" characters that it's creating. It's like DGA malware, if you're familiar with the domain generation algorithm, like Conficker. When you
look at the different domains it connects to every day, I think 5,000 or 50,000, when you look at them with your naked eye you think they're random, but actually they're not. There's mathematics involved in generating those domains; I shouldn't say random. The same concept is being used here.

Now, easy malware update. You can have unique samples for each malware deployment technology. So what's a malware deployment technology? It's a technology that you use to deliver your malware from the attacker to the target. It doesn't need to execute the malware; it just needs to
touch the target. What are the very popular deployment technologies? You have drive-by download sites, USB sticks, and emails, among others. So let's say I'm going to create a very sophisticated attack, and I would like to spam 10,000 of my samples. Instead of using the same sample for all 10,000 emails, for each of those emails I would attach one unique malware sample: 10,000 spams, 10,000 unique malware samples. Now the trick here is this. Let's say I'm company A, I get one of those emails, and I send it to my AV vendor. They solve that dropper, say, that malware that's being
deployed, and the AV vendor says: just deploy the solution and it will be stopped. But the next day, when the company looks at its logs, they say, "Why is information still going out of my company? I already deployed the solution." The reality is that whatever they submitted to their AV vendor won't be seen anymore; it is only ever seen once. That's why I really don't believe in describing, let's say when we're creating a report about a certain attack, "these are the hashes of the malware." Those hashes are only the ones that you've seen. You probably wouldn't see them anymore; probably you'll only see
them once, and only in that certain deployment technology. So your solution is only as good as the number of malware samples that you have, even if you're creating heuristic signatures. If you can stop that one, what happens to the other 9,999? If one gets in, and somebody in your organization who's not tech savvy enough happens to click that sample, then you're infected already.

Also, malware-serving domains can rotate malware in minutes. When it comes to a botnet attack, the bot agent uses different network resources. They use one as a C&C, their command and control, and they also use one as a drop zone: if they're
information-stealing malware, they can use those domains to drop information. They can also use one as their malware-serving domain. A malware-serving domain is a domain responsible for updating the components or the configuration files of the malware inside your system. Unlike malware before, where everything needed for an attack was packaged in one file, in one malware, most attacks right now are composed of different components. You have the attack component, which could be the information-stealing component, the DoS component, or the Trojan component. You have the bot agent, the one that's responsible for communicating with the network resources.
and you have the configuration files that tell the malware how it will do its job or how it will communicate with the attacker. Now, the component that's my favorite is what I call the Horcrux component, if you guys are familiar with Harry Potter. The movie came out when I was in kindergarten. So the Horcrux component actually hides itself. It's inactive, and then from time to time it checks whether the malware is still there or not. If it's not there, what it will do is initiate the bot agent, if that's still there, to communicate
with the malware-serving domain to download all of the components that are missing. That's why some people would say, hey, I just cleaned my system, how come it's infected again? Now, if the attacker has access to an army of armored malware, he could actually rotate it and not use a Horcrux component. He could rotate it, let's say, every three hours, or once every 24 hours, or every two days. The main idea there is that if they ever capture a malware in a compromised system, that malware sample is sent to somebody who would analyze it and provide a solution for it. And speaking from experience, the fastest
solution cycle time when I was at Trend Micro was two hours. This doesn't include clean; it's only scan. If we cannot provide the detection capability to our product, we have to pay those customers. Now, for this one, if it takes them two hours and the attacker wants to update the malware every hour, then by the time they deploy that solution, the malware is long gone. They would say, oh, all of the solutions have been deployed, we should be safe. But then again, since the attacker already updated the components with a totally different malware but with the same directive, it actually makes all of
those solutions useless. So that's how they knock AV out. See, Chuck Norris is very powerful. I think this is from Walker, Texas Ranger. The young folks probably don't have any idea what I'm talking about, but it was a very popular show, and if you watched Conan O'Brien, he always made fun of Chuck Norris when he was still at NBC. So here, when it comes to malware creation time versus solution cycle time, you can create a new malware in seconds. A solution cycle time is the time from when a security vendor gets the malware to when it pushes out a solution for it. As I said, the fastest
is two hours, but if the malware is very, very complicated, it takes longer than that. And as I said, that's only the detection mechanism. It doesn't include the clean capability, because most malware nowadays is so complicated and embeds itself so deeply in the operating system that removing it would actually kill the operating system. I always compare it to an alien that has attacked the human body: removing that alien also kills the host. It's almost the same thing, unless you have the capability they have in Falling Skies, where they could remove that alien and the boy still lives. I have a lot of free time; I watch TV all the
time. And AV evasion is effectively utilized. If you're a group of attackers contracted by a rogue state, a rogue entity, or a company, it used to be that you could only have evasion technology available to your malware tools or malware components when you had the guy who actually invented that technology. But nowadays you can just buy the tool and have that capability. Whatever AV evasion technologies are available out there, most of them have already been made into a tool. You can just buy those tools and use them. The tools became popular because, before, when the bad guys communicate with each other, they actually
communicate with the person directly. They don't know the person; they use email addresses or whatever form of communication, and of course they're not using their real names. But when crap hits the fan and investigators are able to trace those people, they can actually arrest them. So what they did, instead of communicating directly with whoever wants to conduct an attack, is just sell them the tools: this tool is capable of this; if you want more capability, I will release a tool on this website, and you can use this license key to unlock the feature that you need. So it's
like a buffer between them and the people who are actually sponsoring the attack. And as I said earlier, since the samples that AV vendors or security vendors receive are in the hundreds of thousands, of course there's no way to process them manually. All of them go through an automated system where the malware is replicated. Even the signature creation is now automated. That's why the naming convention in AV is messed up: AV vendor A copies the naming convention from AV vendor B, then C copies it from A, D copies it from C, and it just goes around. It's like that Hyundai commercial, where Toyota says,
oh, Hyundai came up with this, and then BMW would copy that, and they're all just copying each other. It's the same concept. And if you have a million samples in your arsenal, chances are they would create one million signatures for them, and of course this does not scale. If you have a signature file, or any blacklist for that matter, that's more than five megabytes, then there's something wrong with it. Five or six years ago there was an initiative in the AV industry, which we call a patterns limit, in which all of the signatures that were not needed anymore were removed so that vendors could send a very small file to customers. But then when they
send the same thing to testers to test the detection rate of their AV, they send them a totally different signature file. So that's how it works. And of the malware used in attacks, some are actually simple. Companies just say they're complicated because they've been breached. Every time a company is breached, they always say, I've been hacked, they used something very complicated. But then when you look at the malware, it's not really complicated. Sometimes you'll see it's a 10-year-old malware. That's why I came up with the term green malware, because once you remove all of the protective mechanisms, you'll find out that it's a very old
malware. So, as I said here, understanding the malware requires reversing, not just analysis. And as I said before, reversing is easier said than done: it's very easy to say, but it's very hard to do. I remember one time someone from the press came to our company to see what we were doing, and he asked me, can you show me the most exciting day you have in your work? So what I did is I turned on my PC, fired up IDA Pro, and showed him the code of one malware. I sat down, put my finger there, and told him, this is the most exciting day
in my work. After that, he didn't write anything about what I do. It's probably not exciting to his readers. All right, let's go to the demo. We have 14 minutes left, so I'll try to make it quick. In this demo we will use
Zeus. Here, yep? So manual reversing takes a lot of time, and there are many, many samples, so it's not scalable. How do you guys address that? How do you scale the reversing? So the shortcut to that is that you prioritize. Usually most AV vendors prioritize based on paying customers, or you can prioritize based on the noteworthiness of the malware. But still, if you have 10 or 30 guys and millions of samples, that's still not at the same level. Yeah, so automation is still key, but you have to make your system smarter. Are you guys familiar with
machine learning? Instead of treating each malware sample as a malware sample, you treat them as data. All of the things a sample produces in a sandbox, each line of the sandbox report, is a data feature. That helps you cluster, or I should say classify, those malware samples. The easiest is actually just classification: answering whether it's malicious or not, so it's just a yes-or-no answer. But the binary is totally different, the strings, everything, it's a totally different binary. Can machine learning help with that? Yeah, because even though it's
really, really protected, the machine would actually tell you whether it's malicious or not based on the data that the machine already knows. For you to be able to come up with that, if you have a massive malware collection, you can get all of the data features from that collection, and the machine, or the learning algorithm, would treat them as something that's malicious. It's like teaching the machine to say, if you see something like this or anything similar to this, it's malicious. So the machine learning you mean is totally static based, right? Static-based features, not just
Oh, no, you can feed in features there. You can extract the features from static analysis, and you can extract them from dynamic analysis. So what's your practice? Do you add some of the dynamic analysis features to the system? Oh, yeah, yeah. This is like a complete paradigm shift. Right now, when people say malware analysis, they ask, do you do static analysis? Do you do dynamic analysis? This totally different paradigm is that... oh, by the way, while I'm answering this question, this is actually producing malware, and that's the number of samples being produced as I'm talking. So
when you conduct static analysis or dynamic analysis, it will have output, right? Those outputs are considered features of the file that you submitted to your static analysis box or your dynamic analysis box. Now you take all of that data from known malware samples and feed it into a learning algorithm, and the learning algorithm uses that data when you feed something in again. Let's say you use one million malware samples. You feed in all of that data, and then you feed in one malware sample. The simplest problem is classification, so just
yes or no: yes for malware, no for a benign file. If, based on that data, you're able to identify the file that you input as malware, then your learning algorithm already works. But of course there are different learning algorithms out there, so you need to experiment all the time. Sometimes the accuracy is 65%, sometimes the accuracy is 95%. What you do is, out of the features you've put into that learning algorithm, you remove some of them until you come up with a feature set that is much more effective and the accuracy increases. Let's say one feature is
the compiler. Let's say that, based on your experiment, you find, oh, I don't need to know whether it's Borland or Microsoft. You can take out that feature, and then the accuracy would, what do you call this, shoot up to 95%. So you can experiment on those. For people who are interested in learning more about this, there's a very good book, Data Mining. If you guys want to check it out, it's really very helpful. So basically, sorry, so basically after you find the features, you do the feature selection, right, and you do the machine learning. So what you're going to do is generate your score, and based on the score you generate your
priority list, and then you choose the top of the priority list for reverse engineering, right? Actually, in this approach that I'm describing, you don't need to reverse engineer anything. It's an automated system, because if you reverse engineer one, you're already defeating the automated system. You're trying to make it that much more intelligent so that you won't spend time reversing the malware. You only spend time reversing a malware if it's totally an outlier, where it's like, why is this not working? And usually the system would actually tell you that. It would tell you
these are samples that are outliers, that really have nothing to do with what it's doing, but you know they're malicious, so probably there's something going on there. Those are the things that would be for your eyes only, I mean, for your review. So basically, you have some where you know your detection system doesn't work, and those you do? Yeah. That's why, when you're doing that, you actually need two components, two skill sets. You need a mathematician, so that's where the PhDs come in, and you need someone with domain knowledge, so that's where the malware experts
come in. You marry those two, and you come up with this very intelligent system where you don't need any signatures. The machine would actually tell you whether it's malicious or not. But then again, since it's supervised learning, whatever data you put there would be the basis of it. If you put crap in, you get crap out. So is it just static characteristics that it uses, or is it dynamic? It also uses dynamic. Unpacking too, and stuff like that? Yep, yep. One example, at least, is sandbox output. Each line of that output is considered one feature, so that's one piece of data that you can feed into the system.
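To make the idea of treating each sandbox report line as one feature concrete, here is a minimal sketch of a supervised yes-or-no classifier. It is not the speaker's system: the learning algorithm (a small Bernoulli naive Bayes), the report lines, and the names `train` and `classify` are all assumptions chosen for illustration, and the tiny training set only shows the mechanics.

```python
import math
from collections import Counter

# Hypothetical sandbox report lines; each distinct line is one binary feature.
MALICIOUS_REPORTS = [
    {"writes_to:autorun_registry_key", "connects_to:unknown_domain", "injects_into:explorer.exe"},
    {"writes_to:autorun_registry_key", "connects_to:unknown_domain", "drops:temp_executable"},
    {"injects_into:explorer.exe", "drops:temp_executable", "connects_to:unknown_domain"},
]
BENIGN_REPORTS = [
    {"reads:user_documents", "creates_window:main"},
    {"creates_window:main", "writes_to:app_settings"},
    {"reads:user_documents", "writes_to:app_settings", "connects_to:update_server"},
]


def train(malicious, benign):
    """Estimate, per class, how likely each report line is to appear."""
    vocabulary = set().union(*malicious, *benign)
    total = len(malicious) + len(benign)
    model = {}
    for label, reports in (("malicious", malicious), ("benign", benign)):
        counts = Counter(line for report in reports for line in report)
        model[label] = {
            "prior": len(reports) / total,
            # Laplace smoothing keeps every probability strictly between 0 and 1.
            "p_line": {
                line: (counts[line] + 1) / (len(reports) + 2) for line in vocabulary
            },
        }
    return model


def classify(model, report):
    """Return the label with the higher log-probability for this report."""
    scores = {}
    for label, params in model.items():
        score = math.log(params["prior"])
        for line, p in params["p_line"].items():
            # A known line contributes whether it is present or absent;
            # lines never seen in training are ignored.
            score += math.log(p if line in report else 1.0 - p)
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train(MALICIOUS_REPORTS, BENIGN_REPORTS)
    unknown = {"connects_to:unknown_domain", "drops:temp_executable", "creates_window:main"}
    print(classify(model, unknown))
```

Because the learning is supervised, the answer is only as good as the labeled reports that go in, which is the "crap in, crap out" point above. Dropping a line from the vocabulary and re-measuring accuracy on held-out samples would be one way to run the feature-removal experiments described earlier.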
Okay. All right, so while I was talking, we were already able to produce 92 samples. So if you have a kid, you can go out to his baseball game, come back, and you'll have this number of malware samples that you can sell or use on your own. All right, we have five more minutes, and now we'll go to the other tool. We'll do a quick demo of that
tool. You know what's funny? When you have five minutes left, it seems like you don't have enough time, but then when you look at Jack Bauer, the things he can do in a 24-hour period are really very impressive.
So here I would use a saw. This one just takes one sample from the ones we just created, and it uses randomized characters; this is not using the time and date stamp. Same concept: you will see here the number of samples that it creates based on that one sample. This is also where the green malware comes in. You just put an old malware here, and it becomes undetected again. But then again, this is just one tool; you could add more tools to this. So we were able to create two, three, and it's like, what, one sample every three seconds,
or not even three seconds. All right, so to be quick: I was planning to create one million samples, but we don't have enough time. So let's put it all together. What does a typical malware automation system look like? You can get a bare-metal system, a very cheap bare-metal system. You could use virtualization software to host your kit and to host your arming tools. You could have a shared folder where all of the created malware samples are dropped, and you can have an agent running on your bare-metal system that, once it sees a sample in that folder, submits it to the other VM
machine, and then everything that is created is put in an armored sample folder, and that's where you actually get all of your samples. Oh, not you, I mean the attackers. That's where they get them. And again, you can just sit back, relax, line up for Man of Steel, watch the three-hour movie with a long fight scene, and when you come back you'll have this. You can just relax and be happy about it. You can sell them, or rather the attacker can sell them and do whatever he likes with those samples, and life is good. So that's it. If you guys have questions, again, at tops, you can follow me on
Facebook or LinkedIn. If you guys want to link up, let me know. And also, please don't forget to buy my book; it's for my kid's college fund. And again, questions, violent reactions? Question? All right, so about the machine learning one: I know the industry is really sensitive about false positives. Even if the machine learning is 80% sure it's malware, you still cannot release a signature for that. Do you know how they get from the 80% to the final signature? Yeah, there's actually no silver bullet to remove false positives. Even Conficker,
the 18 million domains it generates a year, it's bound to hit two, three, or four legitimate websites. So there's really no silver bullet for false positives. So do they just release a signature when they're 80% sure it's malware, or...? Oh, this one doesn't use signatures. You use a learning algorithm. When you settle on the learning algorithm, you can come up with an equation, and that equation is what's being used to determine whether something is malicious or not. Now, I say that equation is what's used. It's very easy to write that equation, but coding that equation in C or whatever language takes lots and lots and lots of
lines of code. But then again, it might be 95% accurate, 96%. I haven't seen anything that's 100% accurate. Could they use this thing initially to have a good set of known malware, so they know what the machine learning is looking for? Yeah, this is very good when you're doing experiments, because once you have access to a kit, you can generate the samples, and then you have that data to feed into your system. And after classification, you could go to the next step, which is clustering. Clustering is where you're not only saying whether it's malicious or not; you're actually saying that this is malicious and this is
Zeus; this is malicious and this is SpyEye. And you can go a step further. You could say, this is malicious, this is Zeus, this was created by kit version two; this is malicious, this is Zeus, this was created by kit version one. It just depends on how detailed you want it to be. Oh, we have one minute left, so I think I still have time for one more question. Who's this guy? I think he's the guy from Duck Dynasty. All right, I have to stop. Thank you again, guys, for coming. I hope you enjoy the