
At the Mountains of Malware

BSides Charlotte · 2016 · 58:17 · 81 views · Published 2016-05
Category: Technical
About this talk
BSidesCLT 2016: At the Mountains of Malware. Presented by: Wes Widner (@kai5263499). At the Mountains of Malware is a "how to" for setting up a malware pipeline of your own. This talk includes pointers on obtaining a steady stream of malware, extracting features from malware, and finally how to go about generating actionable threat intelligence from that malware. This talk will include hands-on demonstrations of each component of the malware pipeline.

My name is Wes Widner. I'm from McAfee Global Threat Intelligence and, most recently, Norse. If you're looking for some good lols, go Google Norse; lots of fun stories there. While at Norse I was tasked with setting up a malware pipeline, and that's where all of this talk comes from. It's basically a presentation I wish I had given to myself before I started on this project. There were a lot of lessons learned and a lot of wrong avenues to go down. So first off, we need to talk about what a pipeline is. Some of you may be familiar with pipelining instructions through a CPU. Simply put, a pipeline is just an

operation on a set of objects in order to produce some sort of outcome. But what's a malware pipeline? A malware pipeline takes malware, as you might imagine, as the raw feed coming in and comes up with intelligence based off of that malware. We'll get into what that really means later, what type of intelligence to bring out. But another key thing to keep in mind is that the malware doesn't want to be analyzed. Most people don't realize it, but malware actually has a natural enemy: the automated researcher who will go through and just download a buttload of malware, try to pull it apart, and then pick out interesting bits from it. So malware

routinely employs what would be its version of an antivirus or anti-malware, anti-hunting stuff, and we'll talk about the unique challenges that poses at every step of the pipeline. So at McAfee, Norse, and now at CrowdStrike, a common theme in my career is that I've worked at the intersection of big data and threat intelligence, malicious stuff. Before I keep going, I want to point out that pretty much everything I'm going to reference here is malicious, so buyer beware. The intersection of big data and malware means that I'm trying to use tools to automatically pull in this data and run analysis on it. But it's not just big data in the

typical sense. This is data that fights back at every step of the way. So another key point: why would you want to set up a malware pipeline? It's not just for security companies anymore; malware pipelines are used all over the place. Here are some of the products that you can pull out of a malware pipeline. First, seed data for malware researchers, like the gentleman who was just here before me and, I think, the one who's going to come after me. Both are malware researchers, and malware pipelines help augment their efforts. Second, presenting relationships between threats. We're talking about different classes of threats, from file threats to IPs to domains, and all of that ends up building what's called a threat map

or a threat graph, so all of that stuff is related to each other. Just to give you an idea of what's changed in the industry: previously at McAfee, in the Global Threat Intelligence department, they would track each threat as a separate category, so malicious domains over here, malicious files over here, malicious IPs over here. The problem is that a threat is really a combination of these different utilities, so it would be helpful to show people the connection. That's actually one of the things Norse was known for: the map, where we combined some of that data. So the third point here, or third product of a malware pipeline, is providing data to train

predictive models. There is an enormous amount of malware out there, but there's not a whole lot of new malware. How many of you have ever played around with a malware-building application like Meterpreter? There used to be something, way back when I was in middle school, a malware lab where you could just point and click, "I want this capability and this capability," and it would actually go and build the .com file or the .exe. Those types of tools make it easy to generate a brand new piece of malware, brand new meaning that it has a new hash. So we need a better way to detect malware, not just hash-based. In fact, many

people realized several years ago that hash-based analysis is pretty much broken. It's dying and it's never going to come back. So, predictive models. Along with that, generating rules for automated protection systems, like YARA rules or Snort rules. Someone asked in the last session about generating YARA rules to find ransomware. There are different ways to generate YARA rules. The most basic would be: if you see this hash, then it's bad. That's pretty much it. It would be one of the fastest, but also not one of the most expansive. So generating better rules is what we're looking for with a malware pipeline. What we'd like to do is say, here's a bunch of different

characteristics that these files display, and if they show these characteristics, block. We can only do that with a large sample set. Another product of a malware pipeline is to build better malware. Some clients really don't want to just blue team or block malware; they would like to peek over the fence and see what others have been doing, how they've been hiding from different detection systems, and, yeah, use that to build better malware. And the final one is just for lulz and profit. It's actually kind of fun to go through and play with malware. We saw a video presentation of malware taking over a system; that stuff is kind of fun, and I'll tell

you about some of my exploits on that in just a little bit. Oh, and this is a smaller group of people, so if you have any questions, feel free to throw them out at me. I don't want it to just be an information dump, but if there are no questions then I'm just going to keep on cruising. So who uses malware pipelines? I was asking earlier about the security community around here and what it looks like, because most communities have a different flavor. For example, Huntsville tends to be more aerospace, and Augusta is kind of where the cyber security headquarters of the nation is moving. Malware pipelines, though, are used all over the place: in,

well, of course, by infosec companies, but also by big box retail giants, banks, airlines, and manufacturers. Everybody has their own malware pipeline these days, and for a good reason too. If you really want to protect your users, one of the best things you can do as a security team is to stand up one or more computers that no one is actually going to use for work. They're basically honeypots in your network that mimic what a user would have. So if you're getting a whole bunch of emails coming in that look suspicious, have a clean system set up that you can open those on and see if they are malicious. And I

guess I'm supposed to add the obligatory: firewall that system off, and all that other good stuff, otherwise you could shoot yourself in the foot. Yes, exactly. So this talk used to have three different sections: sources, mining, and... I would go into the data mining part. But I realized that the data mining part stretched it way too far, so I replaced that with more hands-on, how-to-do-this material. So now we're just going to do sources and feature generation out of those sources. If you want to know about the data mining portion, email me afterwards and I'll give you some more information on that. So to start with sources. I guess my

daughter's pretty much learned that if she comes upon a Minecraft mod and it looks suspicious, and I've drilled this into her, she takes her computer to Dad. Dad will be ecstatically happy that there's more malware for me. Whenever I get something like this, an email, which I haven't looked at... oh, maybe I don't have it on this computer. Nope. So anyway, whenever I get a malicious Word document or PDF or something like that, I'll stash it to the side and look at it later. But even though we stumble upon malware, setting up a pipeline is actually not that easy. You can find one or two pieces of malware pretty simply, but what we're

looking for here is a flood of malware. There are basically two general, broad areas of where to find malware. The most important one, the one that everybody wants if you're in the threat intelligence business, is organic malware. These are live malicious links; this is malware in the wild, actively malicious. The problem with that is malware has a half-life of less than 24 hours, especially URLs, because sites like AWS and Rackspace and all these hosting sites are very quick to take these links down. Because of that, malware has gotten highly automated. One of the automated things that malware uses a lot is called domain generation algorithms.
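To make the idea of a domain generation algorithm concrete, here is a toy sketch. It is not taken from any real malware family: the seed string, the use of SHA-256, the date-based input, and the reserved `.example` suffix are all assumptions chosen for illustration. The point is only that both the malware and its author can compute the same list of candidate hostnames without ever shipping that list.

```python
import hashlib
from datetime import date


def candidate_domains(seed, day, count=5, tld=".example"):
    """Toy DGA: derive deterministic, pseudo-random hostnames from a seed and a date."""
    domains = []
    for i in range(count):
        material = f"{seed}|{day.isoformat()}|{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map each of the first 12 hex digits onto the letters a-p
        # so the label is purely alphabetic.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


if __name__ == "__main__":
    # Hypothetical seed; anyone who knows it can regenerate the same list for a given day.
    for name in candidate_domains("demo-seed", date(2016, 5, 1)):
        print(name)
```

For someone collecting malware, the practical consequence is that a static blocklist goes stale quickly: the candidate set changes whenever the date (or whatever input the algorithm uses) changes.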

A malware author will register a hundred or so domain names, and then their malware will be set up in such a way that it generates one of those domain names to reach out to. So they basically have their own ad hoc network that they've set up, and they count on being taken down very quickly. As someone who wants to go find that, I have to keep that in mind. What that looks like is, if I'm running an automated script to go pull malware, first I'm going to have to find lists of malware to go find. I actually have a handout that accompanies this talk that has several different free lists that you

can use if you want. But one thing about these lists is that you're going to have a very low yield of actual malware that you're able to download. Number one, just because the hosts get taken down so fast. The second thing is that a lot of these links are set up to not actually give you the malware payload unless you prove that you're a user, a gullible user. Because of that, we need to set up what are called either honeypots or honey clients. A honeypot is just, think of it as, a computer that signals that it's really vulnerable and you could come and do bad things to it, and a

honey client is the same thing but with a browser. There are two free utilities I want to point out here along those lines. I don't know if the talk description pointed out that I was going to use some Docker examples. What's that? It's in the description? It doesn't matter. Yeah, I think I pointed that out in the description. Anyway, one of the cool things about this, and I have a slide on it a little bit later, is that a lot of these security tools are freely available from Docker as a full, complete system. So the first one that I'll point out is Maltrieve.

So Maltrieve is a utility that has all kinds of free malware sources, and these are organic. For those of you taking notes, I'll have a handout that I'll send you in a little bit. I guess the network here is not going to allow me to pull down malware. That's kind of sad. At any rate, this utility will go out, pull from these lists of free malware sources, and then try to download those pieces of malware. I think I'm competing with others for wireless space too, but thankfully I travel with an entire hard drive of malware, just because it makes me feel all warm and fuzzy. So Maltrieve will

go out and pull down malware, usually in... well, look at that. We'll get into that in just a second. The other utility is called Thug. Thug is a honey client; it mimics vulnerable browsers. There was the Dyre malware, that's D-Y-R-E. How do you pronounce that? I've only seen it in text. "Dyer"? Yeah. With that flavor of malware, there was at least one strain that wouldn't download unless it detected several features in the browser. One very obvious one: the browser had to have a view window greater than zero by zero pixels, because that's what headless browsers report, and that makes it pretty easy for the malware author to

say, "It's probably a spider, and I'm not going to give you the payload." Other places to look for organic sources: PCAPs and, I didn't put it on here, spam. If you're a network admin, you can probably pull these off of your network pretty easily. A word on honey clients. Like I said, malware is big business, hundreds of millions of dollars for ransomware and other types of malware, and zero-day exploits are really expensive to get off of the darknet. Because of that, they're not going to give these up easily. That's why we have honey clients, and there are actually several different flavors of honey clients, and that's a

hot area of research. Exploits cost money, especially zero-day exploits, and malware plays hard to get. Not only do they want to know that you're a viable user, but part of being a viable user is having the right operating system and the right behavior. A lot of malware now will actually have you go through several steps to show that you're interacting with the website or with the malware in some way. The second type of source is a synthetic source. Synthetic is a feed of malware that some vendor has put together, and they will give you what they've seen. It's usually because they've put together their own honey clients or honeypots or

some other honey network. There are a lot of them that are free. In fact, if you ask a lot of these companies really nicely, they'll give you a researcher feed, especially if you promise and pinky swear that you're not going to resell it to someone else. There have also been initiatives from the White House to put together an official malware feed that any researcher can look at. One of the best, though, is VirusShare.com. It's a Georgia Tech initiative. They offer

VirusShare.com... you have to ask them really nicely. Yay, I did save that password. All right, if you ask them really nicely, they'll give you a login to it. They don't charge for it, but you see the size of some of these files. These are huge roll-ups, 33, 37 gigs. I think it's basically 7 terabytes of malware here. Yeah, these are huge. Lately, in the last few months, they've started doing multiple drops per month to break up that huge file size. So this is honestly the bread and butter of where I pull most of my malware samples. Well, that and a certain vendor that has an affinity for

me. If you're just looking for a specific piece of malware, if you're looking for Dyre or ransomware or something like that, you can pull down their MD5 list, search through that, and find which dump those pieces of malware are in. That way you're not just blindly pulling down a huge roll-up of malware. One thing about synthetic is that you're going to get a lot of malware, and usually vendors aren't particular about selling you, you know, "this is just Windows malware" or "this is just ransomware." It's usually just a feed of malware, and then you have to filter it yourself. The two points to

note on synthetic malware feeds: you need to pay attention to quality and overlap. What we did was pull down VirusShare and then measure every other vendor against VirusShare to see how much of a delta they have against it. Another one to use is VirusTotal. VirusTotal is an online service to look up hashes and information about malware. It's run by Microsoft, I believe. That's where Windows Defender dumps all of its hashes, and a lot of other places dump into there as well. We'll get into that in just a second, the information to pull off of malware. Yes? Google? OK, I thought they were owned by Microsoft.

Huh, I guess that would explain why they're on Google App Engine. At any rate, synthetic feeds are also usually enriched with data that the vendor has provided about the malware, usually things like what they think the malware family is. Some of the better ones will try to tell you what cluster they've put it in. Like I said earlier, there's a lot of similar malware, so fingerprinting the malware, which we're getting into in just a second, is a hot area right now. The last part about synthetics is that they're usually really smooth. You're not going to have a whole lot of hiccups downloading from these vendors, as opposed to having to beat up the organic sources

just to get them to give you one out of a thousand pieces of malware that hasn't been shut down. So with that, quality control. For organic, you want to know what the yield is, the yield being the percentage of URLs in the raw feed that actually give you valid binaries. You also want to control for duplicates, because some URL feed vendors, who either aren't paying attention or just want to inflate their numbers, will give you a huge feed but it turns out that most of the URLs all point to the same piece of malware. One guy actually yelled at us while I was at Norse that his site ended up in a hundred different places because

he had one compromised link on his WordPress site and every link pointed back to that. So it came up in the reports that this guy had a huge treasure trove of malware on his site. It was funny, because we weren't the only spider. There are actually some friends of mine who did a research project, with a BSides DC talk on it, where they crafted a piece of malware and put it out on VirusTotal just to see how many other security vendors would go and get it, kind of like a cuckoo for security vendors. What they found out is that, like I said earlier, we're not the only ones running a malware

pipeline. There's a ton of people running malware pipelines. So, valid binaries, and then the diversity of domains; you find a lot of overlap there. For synthetic: what does their volume look like, and is it reliable? There are a lot of one-off shops that show up and say, "We've got this feed of malware," and they really just have one or two dumps of malware and that's it. It's not a feed, it's just a dump. Now, a lot of the security vendors are good enough to say, "This is just a roll-up, it's not a feed that you can look at forevermore," and others kind of go stale. Another part of synthetic, like I said: do they

provide additional intelligence, like the malware family name? The key is whether or not they try to fingerprint the file. That's it for sources. Does anybody have any questions about sources, where to go get malware, lots of malware? We were just using the free sources, and we did have some vendors that fed us; that was what accounted for the terabytes of malware that we were able to amass. Fun stuff. So, refining the data. Getting all of that data in is one thing, but refining it is what we're all after. That's the intelligence in the threat intelligence business. This is where I slipped the Docker stuff in. How many of you have ever worked with Docker before? Yeah,

awesome. So why did I put Docker in here? Was it just to add another buzzword to my presentation? Yes, but that's not all. It's also in there because Docker helps make things nicer. If you have ever tried to set up a project called Cuckoo, you will appreciate that it's all wrapped up and cleanly available in a Docker package. That's important because setting up Cuckoo by itself is a pain in the ass, and there's a lot of other good stuff to spend your time on besides setting up Cuckoo. The same goes for a lot of these security tools. They're really great, and the security researcher who put it together, bless their heart, it works with the specific

libraries that they put it together with. Security researchers usually aren't the most diligent about good software building techniques. So anyway, a lot of these security tools are available as Docker images. That also has the side benefit of making them easily deployable in any sort of pipeline scenario you want to put together, which means that your pipeline, if you build it off of these Docker tools, is horizontally scalable. That's great, especially since we've got tons of malware to analyze. When we go to analyze malware, there are roughly three different categories from an automation perspective. The first one is the very quick surface analysis. We were talking about YARA rules earlier, about matching hashes. That would be what this

is: just a quick hash lookup, or bytes in a certain order, specifically the magic bytes. When you run file over a file, it's looking at the magic bytes. So here I'm running file over all the files that I've downloaded. This is from a previous download attempt, so it's a little bit old by now, but you can see that there's a mix of PE32 and HTML files in here. Does anybody know why there are so many HTML files in this malware feed? Yeah. Yeah, emails, malicious links. Yeah. Sorry? Yeah. When we set up the malware pipeline initially, the question we had to answer was: are we going to log

error pages from the server? They could just be valid 404 pages, like you're saying. They could also be the malware author's attempt to trick the firewall into not caching that page and regenerating the page every time. So your mileage may vary; we chose to capture everything. We also get a bunch of things like this. Anybody know why this XML file is in here? Macro? Internet Explorer, well, older versions still have this bug in them, had a bug where the XML parser would allow a buffer overflow, and you could exploit it that way. So what you're getting from organic sources is also going to be mostly exploits. The gentleman who came up here previously talked about the first stage,

second stage, and third stage of malware infection. The first stage is usually the exploit. It's finding a hole in some application, something they can get through. That would be PDF or Word documents, or something in the browser itself, just some way that they can execute their own code, what's called shellcode. The shellcode is a very small, tailored piece of code; it'll reach out and grab what's known as a dropper. The dropper's job is to go and get a bigger payload and put it in place. That bigger payload is usually the advanced persistent threat, the full payload of the malicious file. So usually, well, if you're doing an

automated pipeline, you kind of want to break down what you're getting into those three categories as well. In that respect, most of these HTML files and XML files are going to be the first stage, the dropper, and the PE32s are probably going to be the later-stage malware. I also want to point out that the PE32s are not all created equal. You have all kinds of different variations within the PE32s, and we'll get into that in just a second. Another one that you run into is archives, the zip files that we saw, and then there's Java. As far as the counts of malware go, PE32 reigns supreme; that is the Windows

executable. Packed executables are right next to it, and way down the ladder is Java executable or JAR files, and then way down from that would be exotic things like Linux or Mac. I think out of our entire library we had three malicious iOS files, mostly because it's so hard to get that to run. So the vast majority of your pipeline is probably going to be working with PE32. I started to stand up a pipeline just for Mac malware, and the biggest thing is filtering out all of the PE32. What you're left with is maybe 300 a year or something. This year it's supposed to double or quadruple, which would be a

grand, you know, two thousand or so. Not a whole lot. So, surface scan: very quick. The idea here is an order-one operation. For those of you who are familiar with big O notation, this is very important in big data, because in our series of analyses we need to go from the very fast to the slower. With our surface analysis we're looking at bit locations, SHA-1 values, things like that. It's in this surface analysis that we'll also probably send these hashes out to something like VirusTotal and ask them, "Have you seen this file?" I think VirusTotal is rate limited for the average user to, what, 10,000 a day or something?
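As a rough illustration of this surface pass, the sketch below computes a few hashes and guesses a coarse file type from leading magic bytes, using only the Python standard library so the step stays dependency-free. The function name `surface_scan`, the returned field names, and the short list of magic prefixes are assumptions made for the example; the real `file` utility knows far more signatures, and no hash lookup service is called here.

```python
import hashlib

# Leading "magic" bytes for a few formats named in the talk (illustrative subset only).
MAGIC_PREFIXES = [
    (b"MZ", "pe"),           # DOS/PE executables
    (b"PK\x03\x04", "zip"),  # ZIP archives, which also covers JAR files
    (b"%PDF", "pdf"),
    (b"<?xml", "xml"),
]


def surface_scan(data):
    """Cheap first-pass facts about a sample: hashes, size, and a coarse type guess."""
    kind = "unknown"
    for prefix, name in MAGIC_PREFIXES:
        if data.startswith(prefix):
            kind = name
            break
    else:
        # No binary signature matched; check for an HTML page (for example an error page).
        head = data[:256].lstrip().lower()
        if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
            kind = "html"
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size": len(data),
        "kind": kind,
    }


if __name__ == "__main__":
    print(surface_scan(b"MZ\x90\x00 not a real executable"))
```

Keeping a local record of hashes that have already been looked up is one way to avoid asking an external service the same question repeatedly.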

For what? Just for... oh yeah, yes. Yeah, if you pay them a good chunk of money then they love it, but if you're asking them the same questions over and over again then they send you a nasty email and you have to fix that, all that kind of stuff. Another key here is no dependencies; this is not the place to do debugging or anything else. What we're trying to do is filter: if we can say with confidence that this is something everybody's seen, then we don't need to go into it. Now, malware stays out in the wild for a really, really long time, long past

the active campaign. You'll see echoes of campaigns that are just shared around. It's kind of like the celebrities that keep dying every year because they haven't been heard from. A similar thing happens with malware: people share it around all the time, even if the command and control servers that service that malware are no longer in existence. The second step, once you've found something that looks unknown or reasonably interesting, would be a static analysis. A lot of malware is packed or compressed; self-extracting executables are also packed. Malware tries to encrypt itself and hide itself. Like I said, it fights you at every step of the way, and this

is the next real place where it fights you. There are a lot of malware authors who will intentionally add in tricks to try to throw a debugger off. If you really want to get into reverse engineering malware at the byte level, opensecurity.com is a great place to start. They have really good courses on malware analysis. One of the things they'll point out is that tools like Immunity Debugger or IDA Pro try to map out the sections of code for you automatically. Well, malware authors know what you're after, so they're going to put bytes, they're going to put code in there to try to trick the debugger and try to break the

debugger, so there's a wonderful cat and mouse game. I also noticed, and I think they're going to give out some of these goodies down here at the bottom, so I'll go ahead and point this out, a really great book on that: Black Hat Python by Justin Seitz. That's building a debugger in Python and running that, so programmatic debugging, which is great when you're setting up a pipeline. You can set up breakpoints, and you can do a lot of the stuff that you would do in manual analysis. There are also a lot of companies that specialize in unpackers. There was one recent malware campaign that was using the Nullsoft installation package. Does

anybody know what Nullsoft is also used for, any other application? It is an installer, but do you know what software package it's used to install? No? Winamp. Winamp is one of the ones that started that. It's also used, I believe, for VLC as well. It's an open source installation utility. With installers, usually you'll see MSI packages; they have a bunch of other files packed into them, and pulling those out is what an unpacker is for. So malware authors will sometimes use something like a Nullsoft installer, but more often than not they'll make their own packing system, and it's not because theirs is more compact or anything else. It's because, again, they're

trying to hide from automated detection. So there's a whole range of vendors who will sell just unpacking utilities, and they pride themselves on the number of unpackers that they can support. There are also some free ones, but that's where you get the difference between free and professional: the number of options the professional ones support. Some other static operations: checksums. There are a couple of reasons why the checksums are important. One is that if the checksum in the PE, that is, in the header of the PE, is invalid, then it will give an invalid in-memory layout for the file

and that would make it really easy for the malware author to write something to jump around in the file. So that's one reason checksums could be invalid. Another reason is that a lot of the big malware feeds pull not just from malware that's on disk but from malware that's in memory. If the malware is in memory as it's being run, and something like an antivirus program grabs a snapshot of it and pulls it up to the cloud, then the memory locations are not going to match up with what the file has reported, especially if it's doing process hollowing, which is where a good process starts up and then the

malware just takes over that process. Then an antivirus program or something like that, and I know that Windows Defender does this to some degree, takes a snapshot of it and puts it up online. Now the header will be wrong, and that's important because if you're doing automated analysis and the header is invalid, you can't assume that you can just walk the file like you would a normal file to debug. You have to guard against that; it has programmatic debugging implications. The last two: one here is generating an abstract syntax tree, an AST. If you've ever used a debugging program like OllyDbg or Immunity or IDA Pro, when they first load

the file you have this wonderful tree of execution. That tree of execution is the abstract syntax tree. A lot of malware will try to short-circuit itself if it sees anything that's out of the ordinary. Out of the ordinary used to just mean running in a virtual machine. There was a time when malware just would not run on a virtual machine. There was a well-known trick called the Red Pill that malware authors would incorporate. It's just a few bytes, and all it does is check to see if it's running on VMware, or really just VMware; I mean, they had some checks for VirtualBox as well, but that wasn't as

prolific. Companies like McAfee would run a whole bank of systems in VMware, and just like this they would run their pipeline. So what the malware authors would do, what it would look like in the abstract syntax tree, is one block: check to see if it's a virtual machine, and then quit execution. Well, in a pipeline what you probably want to do is this other point up here: patch the execution flow. It's a type of fuzzing. To get into fuzzing a little bit: usually when you think of fuzzing, you think of finding holes in software that you could exploit, that you could use from a red team perspective and inject code into.

From a malware pipeline perspective, I fuzz the malware because I want it to run to completion; I want to see everything that it does. So from a static analysis perspective, I want to see all the way down through reaching out to the command and control servers. If you gamify the whole process of malware analysis, finding the C2 servers is the jackpot, because if we can find those then we can shut down the central points that control the malware, and the rest of it kind of dries up. There are other things that are interesting to mine out of malware as well, but that's one of the biggest ones.
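The idea of patching the execution flow around an environment check can be sketched at the byte level. This is a minimal illustration, not the tooling used in the talk: it assumes 32-bit or 64-bit x86 code, assumes a disassembler has already located the offset of a two-byte short conditional jump (opcodes 0x70 through 0x7F) that guards the "quit" path, and uses a hypothetical helper name, `patch_short_jump`.

```python
def patch_short_jump(code, offset, always_take):
    """Rewrite a two-byte x86 short conditional jump so the branch is forced one way.

    always_take=True  -> replace the opcode with JMP rel8 (0xEB), keeping the displacement.
    always_take=False -> replace both bytes with NOPs (0x90) so execution falls through.
    """
    if offset < 0 or offset + 1 >= len(code):
        raise ValueError("offset must leave room for a two-byte short jump")
    if not 0x70 <= code[offset] <= 0x7F:
        raise ValueError("no short conditional jump at this offset")
    patched = bytearray(code)
    if always_take:
        patched[offset] = 0xEB
    else:
        patched[offset:offset + 2] = b"\x90\x90"
    return bytes(patched)


if __name__ == "__main__":
    # test eax, eax ; je +5 ; ret  (a stand-in for "if check failed, bail out")
    stub = bytes([0x85, 0xC0, 0x74, 0x05, 0xC3])
    print(patch_short_jump(stub, 2, always_take=False).hex())
```

Finding the right branch is the hard part; the patch itself is a one- or two-byte change, which is why it lends itself to automation once the check has been located.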

Opcodes. There are certain opcodes that malware uses that most normal software won't use. I mean, for exploits it would probably be a NOP sled, something like that. The opcodes and combinations of opcodes are one of the things that machine learning is used for. You take the file, you pull out the opcodes, and you have a whole bunch of opcodes listed. So for every one of these files, especially for the PE32s, we would pull out all of the opcodes and compare them, and then a pattern would emerge from the opcodes. Now, this also requires you to have a really good control list to train your algorithms and things like that.
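One simple way to turn opcode listings into comparable features is to count n-grams of mnemonics and compare the resulting sets. The sketch below assumes the disassembly has already been done elsewhere and that each sample arrives as a plain list of mnemonic strings; the function names and the choice of Jaccard similarity are illustrative assumptions, not the method used in the talk.

```python
from collections import Counter


def opcode_ngrams(mnemonics, n=2):
    """Count sliding windows of n consecutive opcode mnemonics."""
    return Counter(
        tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)
    )


def jaccard(a, b):
    """Similarity of two n-gram profiles based on which n-grams appear at all."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 0.0


if __name__ == "__main__":
    # Hypothetical mnemonic sequences standing in for two disassembled samples.
    sample_a = ["push", "mov", "call", "pop"]
    sample_b = ["push", "mov", "xor"]
    print(jaccard(opcode_ngrams(sample_a), opcode_ngrams(sample_b)))
```

The counts themselves (not just presence) can also be fed to a classifier as a feature vector, which is where a trustworthy set of known-clean control samples becomes necessary.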

But suffice to say, with static analysis one thing to pull out is opcodes. The last type of analysis for malware, and arguably the sexiest, is dynamic analysis. This is where I mentioned Cuckoo earlier. Before I get to that, I meant to show a few more of these, so I kind of went off the rails. radare and pescanner are two static analysis tools to use with Docker. radare is a programmatic debugging suite, and pescanner is something that you could use to pull out header information, things like that, the other static bits of data. So, dynamic analysis. I mentioned Cuckoo

earlier. Cuckoo is an open source dynamic analysis platform. The general idea of how Cuckoo works is that you have a virtual machine with a snapshot of a clean state. You inject malware into that virtual machine, you run it, and you wait and see what it does. Then when you're done you reset it back to the clean snapshot state and do it all over again. That's basically what you're doing with dynamic analysis. The trick, though, is, one, getting the malware to run. Malware is actually tailor-made to certain operating systems. In order to make the malware smaller and leave less of a footprint, malware authors will make a lot of assumptions about where they're

running. They'll assume that certain DLLs are available. You've probably seen this: a user gets infected and you see a bunch of errors pop up about MVC40.dll or something. The malware author is assuming that that file is available for them to use, and it doesn't really matter in their minds whether it crashes on forty percent of systems. It's just got to run on a few of those in order for them to infect and own that machine. As someone who's putting up a pipeline, the interesting bit for me is that I need to have a variety of machines running, and so I also need to kind of

tailor which piece of malware I'm going to run on which machine. So if I'm looking at the PE header, I need to match, like, the .NET version that it was compiled against with the machine that I'm going to run it on. Otherwise, going back to VirusShare, if you'll notice, the files here start in 2012. A lot of the files in that 2012 archive are going to target XP or older systems, so you end up patching the virus out if you're running it on 2003 or 2008 or 2012. So you have to match the system to the virus. Unfortunately, virus authors aren't going to help you with that by letting you

know. But like I said earlier, if you're setting this up for your environment, if you're protecting a network, then in a dynamic environment you usually just want to set up the machines that are behind your firewall, the ones that you know about. I don't know what the school here runs, but say that they run Windows 7 or so; then set up a bunch of Windows 7 machines with varying installation packages, like the different versions of Office that might be run inside the organization, PDF viewers, stuff like that. Anything that doesn't run on those systems you can safely ignore. As a pipeline architect, I also want to turn off as many patches as

possible. I want to give the malware enough chance to run. So that's another tip that I learned: don't go through and do the updates like you normally would, otherwise you get kind of frustrated because none of the malware is running. A key with dynamic analysis is the hypervisor. The hypervisor is VMware or VirtualBox or QEMU; the hypervisor is what runs the system that's on top of it. Along those lines, there's also ZeroWine. ZeroWine is a utility for running malware in a Wine-type environment. Now, ZeroWine unfortunately hasn't been updated in quite a while. They also use QEMU. So the point is, everything with malware analysis comes down to the hypervisor. If

I'm injecting malware into my Cuckoo sandbox and I'm waiting for it to do something, then what's the easiest way for a malware author to evade my detection? Do what? Yeah, we were talking about it earlier: malware would just wait for a couple of days or so before it executes. It's nothing to them, but it aggravates the fool out of me. So if I own the hypervisor for the system (and I guess another good way to demonstrate this with a slide background would be Inception), if I own the environment, then I can tell you all kinds of crazy stuff. For example, I can start the execution of the malware and then speed up the system clock and kind of warp it

into the future. That way you think that you're, you know, a year or so in the future and you need to execute now, or something like that. Combine that with the fuzzing from earlier: great stuff. Another factor in dynamic analysis with these environments is that you're probably not going to have a good time if your malware can't get out to the internet. A lot of malware, the first thing it does is issue a ping to see if it's in a sane environment. I've had friends that have put together malware sandboxes in an air-gapped environment, that is, no connection to the outside world. Unless it's malware that's specifically targeting something like programmable logic chips, like Stuxnet, it's

not going to run. It's just going to try to reach out to the internet, not find anything, and assume that there's nothing it can do. The days of malware just being a nuisance or just wreaking havoc on a system without any further goal are pretty much gone. Malware now has a purpose, so it's going to try to reach out. And it's not reaching directly out to its command and control servers; it's reaching out to, I forget what you call that tier of proxy servers, the ones that mask the C2 servers. It's going to try to reach out to one of those.
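The snapshot, run, and reset cycle described for Cuckoo, together with the advice to match each sample to a suitable machine, can be sketched roughly as follows. This is a toy model and not Cuckoo's API: `ToyVM`, `pick_vm`, `analyze`, and the OS tags are hypothetical names, and where this code only flips a flag a real pipeline would be driving a hypervisor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVM:
    """Stand-in for a sandbox guest with a clean snapshot."""
    name: str
    os_tag: str
    dirty: bool = False
    log: list = field(default_factory=list)

    def revert(self):
        # A real implementation would restore the clean snapshot here.
        self.dirty = False
        self.log.append("revert")

    def run(self, sample_name):
        # A real implementation would inject the sample and record behavior.
        self.dirty = True
        self.log.append(f"run:{sample_name}")
        return {"sample": sample_name, "vm": self.name}


def pick_vm(vms, wanted_os):
    """Return the first guest whose OS tag matches, or None."""
    for vm in vms:
        if vm.os_tag == wanted_os:
            return vm
    return None


def analyze(vms, samples):
    """Run each (sample_name, wanted_os) pair on a matching guest."""
    reports, skipped = [], []
    for sample_name, wanted_os in samples:
        vm = pick_vm(vms, wanted_os)
        if vm is None:
            skipped.append(sample_name)  # nothing in this environment runs it
            continue
        vm.revert()                      # start from the clean state
        try:
            reports.append(vm.run(sample_name))
        finally:
            vm.revert()                  # always reset, even if the run fails
    return reports, skipped
```

Samples with no matching guest are skipped rather than forced onto the wrong machine, which mirrors the point that anything that doesn't run on your own systems can be ignored.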

Now that we've got those two basic pieces, what do we do with them, with a pipeline? Well, we can feed it back into itself. Malware usually begets malware, so you end up with this blooming problem where, if you let it run for long enough, it will go find more. Like I said, droppers: that's what they do. It was mentioned in the last session that someone asked about a zip that contained three different pieces of malware. Well, malware authors aren't dumb. They know there's power in numbers: one of those files you're going to click on, and one of those exploit mechanisms is probably going to work. So all three of those represent a

new piece of malware for us to put through the system and see what it does. Chances are they all go to the same C2 server, and if we know that they came in the same package we can associate them together and say this was used as part of a set. One key thing with a malware pipeline, and threat intelligence in general, is to cut out the noise. There's an awful lot of noise. By noise I mean that in a Cuckoo installation you're going to see a ton of requests out to Microsoft. Every time the system runs, it's going to phone home. That's just a background bit of noise. There's a whole

bunch of other tools that just phone home for no good reason. We were talking about it earlier: my TV phones home, don't know why. Because of that, we need to establish baselines. This will also help in your organization, to say this is what a clean system looks like, this is what it does just left by itself; it's not malicious, it just does that. A good example: a friend of mine emailed me a while back and asked what 1e100.net was, what that domain was, and if she was infected. Anybody know what that domain is? Yeah, Google. It's one of Google's back-end services. And the interesting thing is if you run

it through whois, it's MarkMonitor. Yeah, it's registered to MarkMonitor. So unless you knew the baseline, you wouldn't know; that does kind of look suspicious, because to the untrained eye it kind of looks like a domain generation algorithm domain. Domain generation algorithms have a whole lot of randomness in them, but most people don't really know that. So with that, I give you the link to the handout, which has a lot more of the sources plus all of the utilities that I mentioned here. If you have any comments or questions, feel free to ask now, or tweet or email me later. So with that I'll open it up and

ask for questions. I know that the next speaker is probably ready to come up here, itching to come up here, right? You don't know? Okay. Yeah, all right, any questions? Yeah.
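The baseline idea from the end of the talk (record what a clean system contacts when left alone, then subtract that from what a sandbox run contacts) can be sketched as a simple filter. This is only an assumption about how such a filter might look: the function names are hypothetical, and the baseline is a hand-written set rather than one measured from a real clean run.

```python
def matches_baseline(domain, baseline):
    """True if domain equals a baseline domain or is a subdomain of one."""
    domain = domain.lower().rstrip(".")
    return any(domain == base or domain.endswith("." + base) for base in baseline)


def filter_noise(observed, baseline):
    """Keep only the observed domains that the clean baseline does not explain."""
    return [d for d in observed if not matches_baseline(d, baseline)]
```

Matching on a leading dot keeps a lookalike such as `notmicrosoft.com` from being silently treated as part of a `microsoft.com` baseline.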

So yeah, the question is about the specific tools I use for static analysis. I raided your goodie box up here and found one of the books. I assume that you're going to give this away; I would love it if you could give this book away during this talk, because this is actually really related to what you were asking. I will try to think of a challenge in just a second. So along those lines, to your question, there are an awful lot of Python-based tools: peinfo or pefile or peframe. There's just a ton of Python utilities, and I think Justin points out in this book that there's

a lot of libraries that he uses. So programmatically that just fits hand in hand, if we've got all these utilities that are already built. Some of them, like MASTIFF, things like that, I mean, they're a full suite, but not a lot of those are programmer friendly. So that's the real key: is it programmer friendly? IDA Pro has a way to hook into it with Python, and of course if you build your own debugger then you have even more control. For static analysis, we actually scrapped version one of the pipeline in favor of reimplementing it all in Python, just because of the number of libraries that are out there. Now, there are vendors like

ReversingLabs, and you're going to pay these guys a buttload of money because they're good at what they do, but there are vendors that will help you; they will sell you static analysis tools. Now, these guys specialize in unpackers. That's all they talk about. They get kind of annoying about it, but they are pretty decent at it. So yeah, we use a mixture of free and paid.
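To give a feel for what "programmer friendly" header extraction looks like in Python, here is a minimal sketch that reads a few PE header fields using only the standard library. It does not use any of the tools named above, which do far more, and the function name `parse_pe_header` and the returned dictionary keys are made up for illustration. The field layout (the `MZ` signature, the `e_lfanew` offset at `0x3C`, the `PE\0\0` signature, and the 20-byte COFF file header) follows the published PE format.

```python
import struct
from datetime import datetime, timezone

# Two common values of the COFF Machine field.
MACHINE_NAMES = {0x014C: "x86", 0x8664: "x64"}


def parse_pe_header(data: bytes) -> dict:
    """Extract a few COFF file header fields from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # Offset of the PE signature is stored at 0x3C in the DOS header.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine, n_sections, timestamp, _symtab, _nsyms,
     _opt_size, characteristics) = struct.unpack_from("<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": MACHINE_NAMES.get(machine, hex(machine)),
        "sections": n_sections,
        "timestamp": timestamp,
        "compiled": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "characteristics": characteristics,
    }
```

Fields like the machine type and compile timestamp are the kind of static data that could help decide which sandbox guest a sample should be routed to, though the timestamp is set by the builder and can be forged.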

Yeah, from what I've seen, it's either free or it's just, oh my God, where did you come up with that number? We actually had that with vendors as well. One of the things that I did for the organic URLs is I would actually price it per URL, and there was one vendor that came out to like two hundred dollars a URL when everybody else was running like 50 cents or so. We had to go back to him. That was a weird conversation. Yeah, good.

Um, yeah, actually, your discovery and where you go with the pipeline is radically different based on which one you're ingesting. If it's just the PE files, then you're going to be in PEiD and debugging, or, you know, disassembling Windows files and doing all that stuff. But if you're pulling organic URLs, there's an interesting thing that happens there: you're actually following the links online, and you basically end up building a Google of malware, spidering out their infrastructure. What you find is you may start with an HTML that points to an XML, and in the XML it has a CDATA payload that you have to undo. It's very similar to the capture

the flag stuff. So you have to follow it in, and the pipeline has to be built in such a way that, oh, I found this file and it had this blob of data, I need to reverse that and go into it. And for non-threat-intelligence companies, for most of these others that I've showed you, like the big box retailers, all they care about is an attachment to an email. If it's not that, they can control the firewall a little bit better; they can just say we're going to block all of these other domains. And usually they have other security vendors in front of them that they can leverage

as well. So yeah, it really depends: what you ingest determines where you're going to go after it. Yeah, so it's not just a pipeline; it ends up being a malware graph. It goes out from this node, and every piece of it ends up being really interesting. All right, all right, well, thank you guys.
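The "malware graph" idea from this last answer, where each artifact can yield further artifacts to follow, can be sketched as a breadth-first walk. This is a minimal sketch under stated assumptions: `build_graph` and `extract_children` are hypothetical names, and the extractor here is a lookup in a hand-written dictionary rather than real HTML, XML, or payload parsing.

```python
from collections import deque


def build_graph(start, extract_children, max_nodes=1000):
    """Breadth-first walk from start, recording (parent, child) edges."""
    edges = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in extract_children(node):
            edges.append((node, child))
            # Visit each artifact once, and cap growth of the frontier.
            if child not in seen and len(seen) < max_nodes:
                seen.add(child)
                queue.append(child)
    return seen, edges


# Hypothetical chain: an HTML page points at an XML feed, which carries a payload
# and also links back to the page.
TOY_LINKS = {
    "page.html": ["feed.xml"],
    "feed.xml": ["payload.bin", "page.html"],
    "payload.bin": [],
}
NODES, EDGES = build_graph("page.html", lambda n: TOY_LINKS.get(n, []))
```

Tracking visited nodes matters because this kind of infrastructure often links back on itself, and the node cap is a crude guard against the blooming problem mentioned earlier in the talk.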