Presenting Threat Intelligence Automation Using Jupyter

Name: Presenting Threat Intelligence Automation Using Jupyter
Uploaded: 2017-11-20
Duration: 57 min 35 s
Description: Presented by Robert Simmons

BSides Delaware · 2017 · 57:35 · 2.0K views · Published 2017-11

Speakers

Robert Simmons

Tags

Style: Talk

Mentioned in this talk

Tools used

Cuckoo Sandbox, Jupyter Notebook, LastPass, MISP, pathlib, Python requests library, Zeek

Platforms

ThreatConnect

Languages

Python

Concepts

JSON

Transcript [en]

All right, morning everybody, how are you doing? Thanks for coming out. This talk today is about threat intelligence automation, and specifically about using a software product called Jupyter Notebook. First of all, sorry: I'm the Director of Research Innovation at ThreatConnect. My team does research at ThreatConnect into emerging threats, stuff that pertains to our customers and stuff that is interesting for us as well. As for the methodologies that we use, what I like to do, and what our team has done over the years, is that when we develop methodologies and have ideas about how to collect and analyze and disseminate

threat information and threat intelligence, we also like to share our techniques and the ways that we do it. I'll go into the specifics of why we do that; the obvious reason is making sure that other teams out there are able to do the same thing at the same level that we are. Today specifically we'll be talking about Jupyter Notebook, but before we get into that I want to talk about what threat intelligence is. If you're not familiar with this, if you don't come from an intelligence background, or if you're not in the threat intelligence or CTI (cyber threat intelligence) community or

anything like that, there are some definitions and ideas that I want to get out of the way at the beginning of the talk. There are a few types of threat intelligence that we are not going to be talking about today, but I'm going to define them just so that you know the different realms of what we're talking about. Tactical intelligence: this is understanding attacker methodologies, understanding their tactics, techniques, and procedures. As far as threat intelligence in the information security world goes, I don't like the term TTP as much as the more descriptive, plain-English term "attack patterns": the different attack patterns that an

adversary uses to attack its target or its victim. Understanding these attack patterns is tactical intelligence, but we're not going to talk about that today. Operational intelligence: this is details of a specific incoming attack. If you're monitoring social media or chat logs and you can see an attack coming, if you've been able to go through the intelligence process and come up with an incoming attack of some type, this would be operational intelligence: knowing what type of attacks are coming at you at a particular time, and also determining what your ability to

repel and deter future attacks is. We're not going to talk about that one either. Strategic intelligence: this is a very high-level type of intelligence. It involves the decision-makers at an organization, and it involves understanding the risk involved in the different activities and systems that you have in your organization: doing critical assessments of risk and then communicating that to decision-makers so they can take this strategic intelligence and make decisions based on it. But today we're going to be talking about technical intelligence. Technical intelligence is specifically indicators of compromise, indicators of specific malware and malware families, determining relationships between malware families, and also

just understanding the details of malware, which in the Diamond Model of Intrusion Analysis are understood as capabilities. We'll also be talking about the idea of infrastructure. Capabilities are things like a phishing email or a malware implant that is put on your network, and infrastructure is all of the IP addresses, servers, and URLs: the transmission system that takes the information from the capabilities, the malware and that sort of thing, and transmits that exfiltrated data or command and control back to the adversary. So we're going to be talking about that today. And the intelligence process, just to continue on the education part of this: the

intelligence process is a circular process; it works in a circle. You have your planning and direction stage, and then collection. We're not necessarily going to be dwelling on the collection stage today, but on processing and exploitation. This is the first thing you do: you have someone using software like Jupyter Notebook, and their mind, going through the stuff that has been collected and turning it from raw data into information. It's not necessarily the point where you're turning it into analysis; that's the next part. In analysis and production, you don't have

analysts looking at raw data; you have analysts looking at information. The analyst looks at the information, analyzes it, produces an intelligence product, and then disseminates that out to the customer or the client or whoever. Put a different way, I'll show you where what I'm going to talk about sits in this process. This is the area that you're collecting data from; in our case it's open-source intelligence, so you're just collecting malware from wherever on the internet you can. You're pulling in malware files and analyzing them, and so you have all this data, all this stuff out in the

operational environment, and you're collecting it from that environment. After you've collected it you have a pile of data, so what do you do with this pile of data? This is exactly what we're going to be talking about today: processing and exploitation, where you take a pile of data, use a tool like Jupyter Notebook, and turn that into information that a malware analyst or a threat analyst can then look at to make decisions as to what is actually happening. Then you have intelligence at the end of that. I also want to talk about why I want to share techniques like

this and processes like this. David Bianco has what's known as the Pyramid of Pain, and this is from the adversary's perspective. From the point of view of the individual or group that is attacking your network or your organization: what are the things that you can take away from that adversary, and how painful is it for that adversary to replace them with another of the same thing? It's really trivial if you find hash values of the capabilities, of the malware that they're using against you. Specifically, if you are an adversary

and you have compiled a particular piece of malware, it's very easy, if you've already deployed that malware, to change it a little bit and recompile it, and then the hash value is gone, because the malware itself is slightly different. So that really basic data point of hash values is trivial for them to regenerate. IP addresses: this is an IP address of a server that you have either compromised or, if you're running your own dedicated server that you've bought to run your malware operation, paid for. Those are not as easy to regenerate as hashes because it involves

having to go out and compromise another IP address or another server, or go and buy time on a virtual private server or something like that. So IP addresses are slightly harder to regenerate after you've discovered the adversary's IP addresses and taken them away. Then domain names: again, if you compromise someone's domain, or if you have begun registering your own malicious domains, there's either money that you need to steal from somewhere to pay for these domains, or you have to spend time compromising other people's domains. So this is slightly more difficult if

you take that away from the adversary. As we go up: network and host artifacts. These are not necessarily hash values; they are more complex behavioral patterns and artifacts that malware leaves behind when it works inside of a computer. Changing the way that your malware behaves inside of a computer is much more difficult for the adversary than just recompiling it and changing some of the data that's in a section or something like that. Then tools: if you are an adversary or adversary group and you've had a team of developers writing a whole malware ecosystem for you, and that is

your tools, your toolchain, then if that gets blown, if someone knows that this is you, or if that particular tool stops working or there's some way to block it, you have to start over with that entire toolchain, and you have to have developer time working on developing a new, different type of malware. And then TTPs: this gets more into the mind of the attacker, where they tend to follow the same type of process when they attack you. Certain adversaries will have that same characteristic process that they go through, which can include their reconnaissance phase; that can

include any part of the attack. This is the part that I want to tackle the most. Whenever we want to help other organizations and share our processes, what I'm trying to do is have a mirror of the Pyramid of Pain: this is the stuff that is annoying or bad for the adversary, and sharing something that mirrors this, something that helps you determine TTPs, is more difficult for the defender. So these are the things you want to aim for: sharing the different things that you use to combat these different pieces

of the Pyramid of Pain. The ultimate goal is to share processes and tools that let the defender attack the adversary's TTPs. So, all of that aside, let's get into the really cool stuff: Jupyter Notebook. Who here has heard of Jupyter Notebook? Two people. All right, let me rephrase the question slightly: who here has heard of IPython Notebook? One nodding head. All right, so they're actually the same thing. IPython Notebook is the older, less pretty, not-JavaScript-ized, not-web version of Jupyter. The IPython project became Jupyter; they rebranded and renamed themselves. So what this is is notebook

programming, and notebook programming is based on cells. Think about this in terms of the types of development processes and tools that you use. I think everyone has used a REPL even if you haven't heard that term used for it: a read-eval-print loop. When you log into a command line and type perl, P-E-R-L, or type python, the thing that appears lets you type in code line by line. The technical term, like the medical term for that organism, is a read-eval-print loop, a REPL. Then a monolithic script: obviously this is

where you open it in a text editor or some sort of IDE and you edit a monolithic script, and when you're done editing it you run it and see what its output is, then you change pieces of it, run it again, and see what its output is. That's a monolithic script. Notebook programming is a sort of hybrid, where you get some of the benefits of being able to run a line of code and see what that line of code's output is, and you also have the benefits of looking at your monolithic

script at the same time. We're going to see exactly what this looks like in a moment. Before I go into some of the Jupyter Notebook stuff, I want to share with you some cool little details that I've come up with on my own to make notebook programming a little bit safer, because you're going to be working with API keys and credentials. As we all know, it is a very, very bad idea to take your API keys and accidentally check them into GitHub. It happens all the time; you can go and crawl GitHub and pull people's API keys

and email addresses and all sorts of other information out of there. If you use GitHub to save your code as a revision control system, you can check the notebooks that we're building into GitHub, and that's a very good idea: you keep your changes, you can see what changes you've made, and maybe revert back to old ones. But you need a way to not have those API credentials baked into your code. Never bake credentials into your code; bad idea. So this is a way that I found to make it a little bit

safer and easier to use credentials, so that it all works but they are safe and not actually baked into the code itself. This is all based on custom secure notes. I know this is probably going into some really deep LastPass stuff, but LastPass allows you to create custom secure notes to hold all of your credential material, like passwords and API keys. So if you use Jupyter Notebook, and I'll show you exactly how this works, you can keep all of your credentials in LastPass and then use the LastPass command-line interface. LastPass has an open-source project

called the LastPass CLI, so all of the things that you can do in LastPass you can also do at the command line. What I've done is written a little snippet of code, which I'll show you in a moment, that lets you access your LastPass keychain inside of the notebook. You do have those credentials in memory, of course, but you don't have them in the text of the code, so they never get saved to disk and they never get sent out to your GitHub if you accidentally check something in and you're like, "Oh man, now I have to rotate API keys" and such like

that. This is how you set up your custom notes, but the first thing you have to do is determine the ID of that particular credential. It's just like at the command line, where you do an ls to list the files: this is lpass ls, and it shows you all the different credentials that you have, and each one has an ID. I actually redacted this one so you don't see my ID. I'm just super paranoid; I don't know if these are dynamically generated for my account, but I'm still not going to

share it. So you get that ID; this is my API key for my ThreatConnect account. This ID is what you need to put in as an input to the snippet of code that I'm going to show you in a second. So this is it. If you load this at the beginning of your script, you can use it anywhere. This uses subprocess, which is a way for Python code to access command-line tools. What I'm doing here is checking the status of LastPass

to see whether I'm logged in: have I logged in to LastPass already or not? If I haven't, it opens a terminal window so that I can go and log in to LastPass, then go back to whatever I was doing before, and then it'll get the credentials. It takes the unique ID, and you saw in that screen capture of LastPass that there are a bunch of different fields for the credentials. Whichever fields you have populated with data, you just give it a list of the fields that you want. And LastPass treats the URL that you want to go to as special.
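The snippet described here is shown on a slide and is not reproduced in the transcript. A minimal sketch of the same idea follows, assuming the LastPass CLI (`lpass`) is installed and using its `status` and `show` subcommands. The function names and the injectable `run` parameter are hypothetical, and unlike the version in the talk, this sketch raises an error instead of opening a terminal window when you are not logged in.

```python
import subprocess


def is_logged_in(run=subprocess.run):
    """Return True if `lpass status` reports an active session (exit code 0)."""
    return run(["lpass", "status", "--quiet"]).returncode == 0


def get_credentials(unique_id, fields, has_url=False, run=subprocess.run):
    """Fetch the named fields of one LastPass entry into a dictionary.

    unique_id: the entry ID shown by `lpass ls`
    fields:    names of the custom fields to read
    has_url:   also read the entry's URL, which lpass exposes via --url
    run:       hypothetical hook so the function can be exercised without lpass
    """
    if not is_logged_in(run):
        # Assumption: fail loudly rather than opening a terminal to log in.
        raise RuntimeError("Not logged in; run `lpass login <email>` first.")

    credentials = {}
    for field in fields:
        result = run(
            ["lpass", "show", f"--field={field}", unique_id],
            capture_output=True, text=True, check=True,
        )
        credentials[field] = result.stdout.strip()

    if has_url:
        result = run(
            ["lpass", "show", "--url", unique_id],
            capture_output=True, text=True, check=True,
        )
        credentials["url"] = result.stdout.strip()

    return credentials
```

Because the values only ever live in the returned dictionary, nothing secret ends up in the notebook source that gets checked in.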

So you either tell it that you have a URL in there or you don't, and this part handles grabbing the URL; there's a specific command-line parameter for the URL. This one just repeats until it grabs all the fields, then it stores all that in a variable and hands that back to the code that you're running. OK, so use stuff like this; I think I'll probably share this on GitHub somewhere when I get back later today. Here's another thing I'll show you. Progress meters are really good because when you're sitting there working on something that might be long-running code and you've got a loop, you don't

want to have the loop get stuck for some reason that you can't see. If you're not using a debugger like pdb, you can't really see that something has hit a wall or got stuck in a loop or something like that. So having a progress meter so that you can see that something is happening is really awesome. This is an open-source project, tqdm, which provides progress meters for command-line Python and progress meters in JavaScript in IPython notebooks. So enough of that, let's actually look at some Jupyter notebooks. The first one I want to talk about is Cuckoo. So who is familiar with Cuckoo

Sandbox? A few people. Yeah, I know you're definitely familiar with Cuckoo Sandbox. Cuckoo Sandbox is an automated malware analysis system, an AMAS. Cuckoo allows you to take a suspicious or known-malicious file and do dynamic and static analysis on it. You run the file in a VM, you observe the behavior of that file, and you see the data that the file left behind in that particular VM. It also allows you to do a certain set of static analysis: there's a variety of tools baked into Cuckoo that do static analysis, and static analysis is looking at the structure of a file without

actually running it. We're going to look at a few of the specific tools later on, but this is an overview of Cuckoo. So let's take a look at Cuckoo. Let me make this a little bit bigger. Is that good? All right. So what I'm doing here, and this ID won't get you anything, this is nothing important, it's actually going to be rotated after I finish the talk. I was trying to figure out how I could not show the ID and still show the Jupyter notebook, and it's not possible. So this is the ID in LastPass of the credential for my Cuckoo instance. And so this is

this: it basically imports that LastPass module that we just saw in the previous slide. Next, I will do the live-demo-gods part of this, but I'm not actually going to do the part that submits stuff out to the network and waits for it; that's a little bit too risky for a live demo. So I've already run this part of the script. And by the way, just some Python: who here is a Python programmer? Awesome. Who here uses pathlib yet? All right, start using pathlib, because they fixed it. In Python 3.4, 3.5, pathlib actually started working and it's not a pile of garbage.
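A small sketch of the pathlib style being recommended here, using only the standard library. The helper name `describe_sample` and the fields it returns are illustrative assumptions, not code from the talk's notebook.

```python
import hashlib
from pathlib import Path


def describe_sample(path):
    """Load a sample from disk with pathlib and summarize it."""
    sample = Path(path).expanduser()   # handles "~/samples/..." style paths
    if not sample.is_file():
        raise FileNotFoundError(sample)

    data = sample.read_bytes()         # one call, no open()/close() bookkeeping
    return {
        "name": sample.name,           # e.g. "sample.exe"
        "suffix": sample.suffix,       # e.g. ".exe"
        "size": len(data),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

The same `Path` object can be joined with `/`, globbed, or opened for upload, which is the "one path syntax everywhere" benefit in practice.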

And so, right, what pathlib does is unify all of the different other tools, packages, and modules in Python so that everything uses the same path syntax. So pathlib is awesome. I'm a great fan of it now; I was not a fan of it until recently, but it is awesome now. What this does is load sample.exe, and then this lets me send that sample over to my Cuckoo instance using Cuckoo's REST API. Also, by the way, in the Jupyter notebooks that we're going to see today, I'm going to show you basically three general paradigms

with Jupyter notebooks and how to interact with stuff. Here I'm showing how you interact with an API, a REST API that's out there in the world. For that you want to use requests, so learn the syntax of requests; it's a great component to use to contact APIs. One of the other paradigms I'm going to talk about is using subprocess, which we saw a little bit of, and which lets you interact with command-line tools in your environment. The third thing I'm going to talk about is interacting directly with files on the file system, loading them and stuff like that and

manipulating them. Also, all the stuff that we're going to see today is based on JSON, and on manipulating and investigating data that's in JSON format, because all the APIs that we're going to work with, and hopefully most of the ones that I work with, have JSON responses. XML still exists, but it's my grandfather's data format. What we'll see down here: I've submitted the file, it was successful, I got a 200 code back from the API, and this is some metadata about the file that I sent. It says task ID such-and-such has been submitted to your Cuckoo instance, and this is the task ID that

you want to use to ask that Cuckoo instance later on for the full report. Then I've got this; I just double-checked. I have drilled down to that specific task, the report ID, so I saved that into a variable called report ID. Then I go back to requests, reach out to the Cuckoo instance, and ask it, "Please give me the report numbered 256703." I get a 200 response code back, and then I take that response. This method called .json() is a convenience function in requests: if you know that your response from that API is

JSON, you can just call the .json() method, and it takes all the JSON content and puts it into a nice Python dictionary; it just translates the JSON into a Python dictionary. Now here's the cool part: these are all the different components of a Cuckoo report. We've got a variety of different keys in this report. This is the basic info: info is the section with information about the report itself, like what time the thing was submitted and some housekeeping information about the submission. Signatures: these are the various Cuckoo

signatures that might have matched during the analysis run. But since we are talking today about technical intelligence, we're going to focus first on network indicators and pulling network indicators from this report. I've gone ahead and done some of the work here. What I'm doing here is a pretty-print of the JSON output of that dictionary, so this is the network report. I'm going to focus first, just in the interest of time, and I encourage you to go back and do this on your own and

explore the complete report, but what we're going to do is focus today on the DNS part of the report, because the DNS requests that are going out from the malware can show you, if we go back to that domain-name layer of the Pyramid of Pain, the domains and hostnames that have been registered or compromised by the adversary. We're discovering those via the DNS resolution. Down here, I actually removed the cleaned-up version just so that I can redo it in front of you, so you can see kind

of the technique for drilling down into this information. What I've done is use tqdm to give me a nice little progress bar, and I'm just looking at each of the DNS entries, the DNS objects, in the network report, and printing each one of those objects, like that. What I want to do first is look at the request, right? I want to print out each one of the requests and then pair it with an answer. So we would do print request, and "request". All right, so we've got all those. These are

the different things that were resolved. We have a number of false positives here, but at this stage you don't necessarily need to focus on false positives; you're going to collect the data from this and then use exclusion lists or something like that. Obviously there's microsoft.com, and there's Teredo IPv6; this is a false positive. This is the workstation checking its time and resetting the time. This is some suspicious stuff: the .at and .it, and then OpenDNS, these are OpenDNS and curlmyip, myip.opendns, and then this is a

pointer record. Each one of these is probably, these are definitely false positives. What's happening here with these my-IP lookups is that the malware is reaching out to these IP-check services to determine the IP address of the victim that it has just compromised. So these are probably going to be the information that we want to focus on. But let's take a look at what's happening here. What we want to do is, for each one of these, look at the answers. So this is this section, answers. Let's look at answers.
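The drill-down being typed in this part of the demo can be sketched roughly as below. This assumes the Cuckoo report's `network` section holds a `dns` list whose entries have a `request` hostname and an `answers` list of `type`/`data` pairs; the function name and the example hostnames and addresses are made up for illustration, and the tqdm progress bar used in the talk is left out to keep the sketch standard-library only.

```python
def a_record_answers(network_report):
    """Return (hostname, ip) pairs for every A-record answer in the DNS section."""
    pairs = []
    for request in network_report.get("dns", []):
        for answer in request.get("answers", []):
            # Keep only A records; CNAMEs, PTRs and empty (NXDOMAIN) answers drop out.
            if answer.get("type") == "A":
                pairs.append((request["request"], answer["data"]))
    return pairs


# Illustrative data only; real reports come from report["network"].
example_network = {
    "dns": [
        {"request": "a.example", "answers": [
            {"type": "CNAME", "data": "alias.a.example"},
            {"type": "A", "data": "203.0.113.10"},
        ]},
        {"request": "b.example", "answers": []},
        {"request": "c.example", "answers": [
            {"type": "A", "data": "198.51.100.7"},
        ]},
    ]
}

for hostname, ip in a_record_answers(example_network):
    print(hostname, ip)
```

Known-benign lookups can then be removed from the returned pairs with an exclusion list.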

for answer in request answers, print answer. OK, so now we have a one-to-one relationship with each one of these, but let's try to remove some of the false positives and junk data. We want to focus on the things that actually had an A record in the answer. I couldn't care less about CNAMEs and pointers, and I definitely don't care about NXDOMAINs. So what we want to do is say: if answer type is A record. Now let's see what we get. So we've cleaned that up down to just A records, and then let's say we want to

actually, we want to clean this up even more, so we'd say answer data. OK, so now I've got this, and we can go a little bit further and clean this up even more, but you can see now we've got a nice clean data set, and then you can start removing stuff that is false positives. There's a ton of different things that you can learn from a Cuckoo report, so I want to take a look at another section which is fairly interesting: VirusTotal. Who's familiar with VirusTotal? Awesome. All right, so VirusTotal is a system for you to submit a file, and then the

VirusTotal service, behind the scenes, runs that file through as many different AV and endpoint protection products as they can and gives you the results from each one of those products, whatever the detection or conviction result is from that product. So let's take a look at what that report section looks like. This is what it looks like: you have a link that will take you to the report itself, and I've already gone ahead and done that, so this is what the VT UI report looks like. Excuse

me. You can then look down here and we have the results, but let's go and look at scans. What we can do is look at the particular scans, and obviously we're not very interested in scans that have a null result. So let's say: for scan in virustotal scans, print scan. All right, so we've got all these scans; there are a lot of different things this was scanned with, but we want to focus on just the information that is important to us. We want to get rid of all that null data, so what we would do is say, if scan, look

at the format of it, so if scan result is None. Let me just double-check; it might...

result. Yeah, I see it. Oh, sorry, hold on. And scans, go back. Oh, live demo.

OK, so scan, that's the scan name, and in scans, that's a scan name, and then I need another one for scan and then result, right?

All right, yes, yes, you're right. Yeah, for scan, for temp in scan. Yeah, no, that didn't work; I need to go another step down. Yeah, yeah. Oh no, sorry: items. Scan and then results, results.
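The loop being worked out live here can be sketched as follows. It assumes the report's `virustotal` section has a `scans` dictionary keyed by engine name, where each value carries a `result` that is `None` when the engine did not detect the file; the function name and the engine names in the example are placeholders.

```python
def detections(virustotal_report):
    """Map each scanner engine to its detection name, skipping null results."""
    scans = virustotal_report.get("scans", {})
    return {
        engine: details["result"]
        # .items() yields (engine name, details dict) pairs, which is the
        # "another step down" the demo needed.
        for engine, details in scans.items()
        if details.get("result") is not None
    }


# Illustrative data only; real data comes from report["virustotal"].
example_vt = {
    "scans": {
        "EngineA": {"detected": True, "result": "Trojan.Generic"},
        "EngineB": {"detected": False, "result": None},
    }
}

for engine, result in detections(example_vt).items():
    print(engine, result)
```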

There we go, sorry. All right, so now we've got all these, so we can print the scan, comma, and then result, right? And then we can say: if this is not None, then show me that. All right, OK, there we go. So now we've got the scanner engine name and the scanner results. One of the things you can do with this information is write some regexes and look for commonalities across the results, then boil that down into a set of words, and then you have what all the AV companies think this thing is. You want to

discount things like "generic" and throw out stuff like WisdomEyes and Zeus and stuff like that, but you can boil this down into an approximation of what the AV industry thinks this particular thing is detected as. It's one way that you can get a head start on your analysis. OK, so going back, the next thing I want to talk about is Bro. I worry about full screen if you're not connected, which is weird. All right, so Bro. What we're going to do is focus on one specific file and analyze it using Bro. What's interesting is if you remove all the different tools except for one

and then figure out: what can I learn using just this one tool? What I've done here is a sort of thought experiment: what can I learn just from the pcap of a sandbox run? If you want to follow along at home, this is the SHA-1 of the file that we're going to look at, so that you can check my work or see it for yourself. This is an Excel file called document.xls, probably suspicious already. These are the log files that Bro produces. This is not the full log file; full log files are huge and have a lot

of different columns, but I've shaved this down to just some of the information that I find interesting. This is the command-line way of working with Bro, but in a moment we're going to see the Jupyter Notebook way of working with Bro. This is the connection log, and again, just like in Cuckoo, we see a bunch of false positives and information that you want to get rid of: stuff like this, where you have ICMP going to a local address and so on. These are things that you might observe but are non-malicious and are part of the normal

behavior of the computer, so you want to get rid of all that junk. What you want to do is observe what your sandbox results are if you put nothing in the sandbox, or if you put something benign in there, and then you can see good and then find evil, so to speak. This stuff obviously is not malicious traffic, but it is also interesting to analyze. The DNS traffic, again just like we looked at in the Cuckoo results, is where you can find some of the network infrastructure, and then these are HTTP requests that are going outbound from the sample. By the way, let's go

back to the fact that this is an Excel file, right? An Excel file is making HTTP requests; that's extremely suspicious. And then after it finished making those HTTP requests, it's making some sort of request out on port 5651, and Bro doesn't even know what protocol or what service that is, so that is extremely suspicious traffic right there. What I'm showing you here is the command-line way of doing it, and then we're going to look at how to do this via Jupyter. So this is the DNS log; these are the DNS requests that we saw just a moment ago: s1.yapfiles.ru, so you've got

an Excel file that's calling out to an image-sharing site, and we'll see why that's interesting in a moment, and then rmansys.ru, which is on one particular IP address. By the way, this one, yapfiles.ru, has lots of different IP addresses. This could be some form of fast flux, where they're trying to rotate or do round-robin DNS on infrastructure controlled by the adversary, but no, this is actually a legitimate site, and these are load-balanced IP addresses; they're using round-robin DNS in a legitimate way. So this is YapFiles. As we can see, this is a file upload site. I know you don't

read Russian, but this allows you to upload images, so it's kind of like Imgur or something like that, where you upload images. And then, not YapFiles but rmansys: if you go to rmansys, this is one of those gray-area products that's like a keylogger, the kind that they sell to spy on your kids or whatever, and it's called Remote Manipulator. So again, we've got an Excel file that's calling out to Remote Manipulator infrastructure, so we have a pretty good idea of what might be the actual malware involved here. And this is the HTTP log, and

so it's making requests, and by the way, if you look here (I know this is kind of small), it's reaching out to that image-sharing site for a JPEG, but Bro is telling me that the MIME type is application/x-dosexec, so it is probably not a JPEG file. Then I verified: I go and visit that JPEG to see if it's actually a JPEG, and it is not; of course it cannot be displayed because it contains errors. So what I wanted to look at is this, the fuid. Bro puts UIDs on every row of data so that you can go from

one log to another and correlate which row in this particular log is related to a row in another log. So when you go and look at the files log, you can see that particular fuid, and you can see that that row in the files log is related to this row in the HTTP log. So this is the extracted file from that particular row in the files log. And then, again, I go back, and this is the C2 traffic going back to the command-and-control IP address of the malware. So what have we learned from the pcap by itself? The adversary is probably

Russophone; you've got Office documents generating network traffic; the payload here is Remote Manipulator; and the payload is hidden on a public image-sharing site as a JPEG. So this is an attack pattern, going back to what we talked about at the beginning. So let's look at Bro in Jupyter. Again, I'm using pathlib and all sorts of fun stuff like that, and loading all of the logs. What Bro does, when you run it on the command line, is dump a huge variety of logs based on each of the scripts in the Bro scripting language, and you can write your own scripts that look at the network traffic

and then produce a log of whatever you want it to be, but I'm just looking at the ones that are included with Bro out of the box: things like HTTP, DNS, connections, and different protocols, mail protocols and stuff like that. What you see here is that I'm basically just loading each one of the files into a dictionary. We've got ten log files. All right, so the first log we want to look at is the connection log. The connection log is kind of the overall parent log, and basically if you don't have a connection log, then the pcap was empty.
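As a rough illustration of the loading step described here, the sketch below reads every log in a directory into a dictionary keyed by log name and then pulls the rows that share a connection UID. It assumes Bro's default tab-separated ASCII log format, where a `#fields` header line names the columns. The function names `parse_bro_log`, `load_bro_logs`, and `rows_for_uid` are hypothetical and are not the code shown in the talk.

```python
from pathlib import Path


def parse_bro_log(path):
    """Parse one tab-separated Bro log into a list of row dictionaries."""
    fields = []
    rows = []
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith("#fields"):
                # Column names follow the "#fields" marker, tab-separated.
                fields = line.split("\t")[1:]
            elif line and not line.startswith("#"):
                # Every other non-comment line is a data row.
                rows.append(dict(zip(fields, line.split("\t"))))
    return rows


def load_bro_logs(log_dir):
    """Load every *.log file in log_dir, keyed by name (conn, dns, http, ...)."""
    return {p.stem: parse_bro_log(p) for p in sorted(Path(log_dir).glob("*.log"))}


def rows_for_uid(logs, log_name, uid):
    """Return the rows of one log that belong to a given connection UID."""
    return [row for row in logs.get(log_name, []) if row.get("uid") == uid]
```

With the logs loaded this way, something like `rows_for_uid(logs, "dns", some_uid)` would return the DNS detail for a connection picked out of `logs["conn"]`, which is the kind of cross-log lookup the UIDs are meant to support.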

So in the connection log, each one of the packets in the pcap will belong to a specific connection here. What you want to do is look for a particular connection, find its ID, and then take this ID and look for it in another specific log. This one is ICMP, so it is not going to have another log, but if you look down here we've got DNS, so you'd want to look for this ID in the DNS log, and that would correlate this

particular connection with the more detailed information that you get from the DNS log. Okay, so that's Bro. I want to look really quickly at two static analysis tools. First, ExifTool: what ExifTool does is show you metadata that's buried in the file itself, and ExifTool very nicely has JSON output, so if you use ExifTool with the switch -json you get nicely formatted JSON output. What I'm doing here is I've loaded sample.exe. Remember the paradigms that we were talking about: this is the paradigm of using subprocess to access ExifTool, which is a command-line tool,

and then still get nice machine-readable output from that, and then manipulate and analyze it using Jupyter Notebook. So you can see here ExifTool with -json; I give it the file location, and then bam, we've got all this nice data that ExifTool has taken out of the file. One technique I want to show you here, which is kind of interesting and which ExifTool is used for: this timestamp is the compile timestamp from the PE information in the PE file. PE is the format that Microsoft uses for executable files, or one of the formats. What I've done here, and I'm not going to

go into too much detail about what you're looking at here, but basically what I'm doing is taking the timestamp and normalizing it into Python's format called datetime, so that I can compare the compile time to another datetime out there. So this is really cool. What I've done here is grab this, and this is the normalized Python syntax for datetime. Unfortunately, from whatever IP address I was using, I wasn't able to get requests to reach out properly to this JSON feed from Hybrid Analysis, but Hybrid Analysis is a site where you can

upload files and get dynamic analysis back. It's a free site. One of the things that's really cool about this is I've got this analysis start time, and the analysis start time is very interesting, because if I take the analysis start time and look at the compile time of the file that was submitted, and those two times are very close together, chances are the adversary is the one that submitted this file to Hybrid Analysis to see whether it is detected, to see if they can evade this particular flavor of sandbox. Unfortunately, obviously, I wasn't able to reach out to that data

using requests, but I did dump it to a file called feed.json. What I'm doing here is I drill down to the analysis start time and create a thing called submission timestamp, which is in that normalized Python syntax. And then, and this is just arbitrary, I picked two minutes off the top of my head, so a timedelta of two minutes. What I do down here is take the submission timestamp, subtract two minutes, and see if that's less than the compile timestamp, and if so I'll

print "adversary". It obviously didn't print "adversary", so just to show you what would happen if these two were close together, I have gone ahead and manipulated the number of minutes here and put in a fake compile timestamp, and I've compared it: submission timestamp minus two minutes less than the compile timestamp, print "adversary", and boom. So we've got an adversary testing their malware. The last thing I want to show you is strings. Strings can be used for a lot of things; strings basically is a command that looks for strings, like words, that are contained in

the data in an executable file, or any file for that matter. What I'm specifically going to show you here is hunting for PDB strings. PDBs are program databases: when you compile a program, you leave debug symbols in there for doing your debugging. Many malware authors are lazy and actually leave their debug symbol path in the malware, so if you've got a lot of malware files, it's good to hunt for these PDB strings. So let's look for a specific PDB string here. What I've done is, using subprocess again, I'm grabbing strings, and I've split lines so each one of the strings is in a different element of an array. So: for string in strings,

print string. As you can see, these are all the typical ones, you know, "This program cannot be run in DOS mode," blah blah blah. But let's do this: if we import regex, and I have a canned PDB string regex, so this will find PDB strings that are of this format, and then let's say match

and then we'll search for that in string, and we need to decode it, by the way. So let's run that. All right, that worked. And then: if match, print string, and let me decode that too. All right, so now we have a PDB string here: Users, Admin, Documents, Visual Studio 2012, Projects, TreasureHunter. TreasureHunter is actually a point-of-sale malware, and this is a very good indicator that the file you're working on is actually of the malware family TreasureHunter. And then if we had submitted this to Cuckoo, you could then look at those VirusTotal results, and what I've done is I've just

gone ahead and shown you here. In the VirusTotal results, if you go and look for repeating words across the different results, you'll see, let's see, TreasureHunter, HuntPOS, et cetera, et cetera. So the goal here is to go from just this collected data and boil it down into information, so that you don't end up like Charlie on It's Always Sunny in Philadelphia. So, any questions?
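The PDB hunt demonstrated just before this could be sketched roughly as follows. Two things here are swapped in and are not what the talk used: instead of calling the `strings` command through subprocess, the sketch pulls printable ASCII runs out of the raw bytes in pure Python, and the `PDB_PATH` pattern is an assumed stand-in for the speaker's canned regex, which is not shown in the transcript. The function names are hypothetical as well.

```python
import re

# Runs of four or more printable ASCII bytes, roughly what `strings` reports.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")

# Assumed pattern: a drive letter, a backslash path, ending in ".pdb".
PDB_PATH = re.compile(r"[A-Za-z]:\\.*\.pdb$", re.IGNORECASE)


def extract_strings(data):
    """Return the printable ASCII runs found in a bytes object."""
    return [m.group().decode("ascii") for m in PRINTABLE_RUN.finditer(data)]


def find_pdb_paths(data):
    """Return any debug-symbol (PDB) paths embedded in the data."""
    hits = []
    for text in extract_strings(data):
        match = PDB_PATH.search(text)
        if match:
            # Keep only the path itself, not printable junk in front of it.
            hits.append(match.group())
    return hits
```

Run over a directory of samples, for example with `find_pdb_paths(path.read_bytes())` for each file, this gives one project path per hit, and a distinctive project folder name in that path can hint at the malware family.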

thank you very much [Applause]
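For reference, the compile-time check from the ExifTool part of the talk can be sketched as below. The two format strings are assumptions about how the ExifTool timestamp and the Hybrid Analysis analysis start time are written, and a real ExifTool value may also carry a UTC offset that would need handling. The timestamps are made up for illustration, and `submitted_soon_after_compile` is a hypothetical name.

```python
from datetime import datetime, timedelta

# Assumed timestamp layouts; adjust to match the real data.
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"
FEED_FORMAT = "%Y-%m-%d %H:%M:%S"


def submitted_soon_after_compile(compiled, submitted, window=timedelta(minutes=2)):
    """True if the sample reached the sandbox within `window` of being compiled.

    This is the comparison described in the talk: submission time minus
    the window is earlier than the compile time.
    """
    return submitted - window < compiled


# Made-up values for illustration only.
compiled = datetime.strptime("2017:11:01 10:00:00", EXIF_FORMAT)
close = datetime.strptime("2017-11-01 10:01:30", FEED_FORMAT)
late = datetime.strptime("2017-11-03 09:00:00", FEED_FORMAT)

print(submitted_soon_after_compile(compiled, close))  # True
print(submitted_soon_after_compile(compiled, late))   # False
```

The two-minute window is the arbitrary value mentioned in the talk. A compile timestamp can be forged, so a hit is a lead to follow up on and not proof of who submitted the file.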
