
Phoenix: The Open Source malware analysis appliance

BSides Augusta · 2019 · 59:25 · Published 2019-10

Hi, I'm Greg, and this is Justin. We run a company called Spark IT Solutions, and we're here to talk to you about Phoenix, the open-source malware hunting platform. We'll get into a little bit of intro here — excellent, the clicker doesn't work... all right, we're doing it the old-fashioned way... there we go, excellent.

So, I'm Greg. Unlike most of the people in this room, I am not part of the security industry — in my day job I'm a software engineer. I started off at Atomic Energy of Canada building tools for document management and wiring in nuclear plants, things like that. I moved on to the financial industry to work for Citi in their global credit group, building risk applications. I then went on to RBC Capital Markets, where I dipped my feet into the management side of the house a little bit, running global sales and credit trading engineering — basically building the trading software the traders use to move your money around. I didn't like management so much, so I left and went to Thomson Reuters, where I headed up the Eikon apps development framework projects. Most recently I've moved down to New York to work for Refinitiv, running all of the frameworks across that organization in the Eikon space.

So, my name is Justin Borland — that's my son there, who's just the best thing in the whole world. Greg and I actually started our careers at Atomic Energy together; both of our parents worked at Atomic Energy, and as soon as we left school we started a company, Spark IT Solutions, on the cheap — and I mean that in the worst way possible. After that I went to BlackBerry to work in the SOC there and cut my teeth quite a bit, and then I went over to Equifax as the technical lead for the cyber threat center. I'm glad we keep alive the streak of never having a conference where we don't bring that up.

In all seriousness, I built the packet capture system at Equifax that we used during the incident — a big Moloch deployment there — and did a lot of real IR work and technical stuff. Right now I'm at Barclays as a full remote employee, so a lot of people give us flak for wearing suits, but any time I get to put pants on is a good day. That's me in a nutshell. So, we want to talk about Phoenix a little bit. I'm going to take a quick step back and just ask: how many people here know what Cuckoo is?

Show of hands — OK. And with that show of hands, how many people here have had an easy time setting up Cuckoo in your own environment? You had an easy time? We should talk. So, how many people here are familiar with MISP? OK. And how many people here are familiar with Moloch? OK. Lastly, Security Onion — that should be a pretty big show of hands, one would hope. OK, quick story before we get into it: before the incident happened at Equifax, we were told "thou shalt Security Onion," and we already had Moloch globally. So our leadership brought Doug on, and Doug and I sat down, and he says, "well, you already have packet capture and all this — what are we doing?" And I said, "I don't know, but I'm being told to do this, so let's do it. Oh, by the way, we're not switching to netsniff-ng or anything else — we're going to use Moloch under the hood." So Doug and I spent the week — doing training about two hours a day — integrating Security Onion on top of an existing packet capture stack. It was really cool to actually work with him, and I'm very thankful for being chosen to speak here today.

Anyway, that's all I wanted to rant about. So: Phoenix, in a nutshell, is a standalone open-source malware hunting platform. You might wonder what that really means. At its core, Phoenix uses Cuckoo to run malware on virtual machines, get the reporting off of it, and let you understand what the malware is doing. What Phoenix does is marry that up with MISP, Moloch, and proper authentication and authorization, so you can share your findings and your reporting properly with the people you trust. You can detonate, analyze, and report on all the malware activity, and Phoenix as a platform fully encompasses the workflow: triage, analysis, creation of countermeasures, validation of those countermeasures, sharing the countermeasures to your TIP — a TIP is integrated with Phoenix, but you can share to other TIPs — and producing executive reporting with pretty bar charts and graphs that the higher-ups will understand. We try to index and store as much data as possible so you can mine that data later. One of the big things we understood about the current offerings was that a lot of data gets thrown on the floor — it gets cut, and it doesn't get used — and all of that is very, very rich data you could be using to protect yourself better later.

We've also made a significant number of improvements and bug fixes in all the core modules of Phoenix — i.e. Cuckoo, MISP, and Moloch — to make them work together better, and to just make them work better, period. So: life before and after Phoenix — again, any excuse to put my awesome little kid on a slide is a good one. Why Phoenix? Well, to go down that road, we have to ask why we're reversing malware in the first place. To find out what it does — sure, we want to know how to defend against it, but we also want to know what our adversaries are doing: their capabilities, their targeting, their motivation. And you can only do that by measuring very carefully what you're doing. So let's say a piece of malware comes across your desk — now what? One of the first things we recognized when we started building the UI and the workflow for Phoenix was that we get anywhere from hundreds to thousands to tens of thousands of samples per day, and the expectation is that we triage them: do I care about this, yes or no? When you do that with a big team, or a global team, you need to find a way to do it such that it isn't extra work — it's a byproduct of doing what you already do.

With Phoenix you really get that litmus test: do I care about this, yes or no? Do I need to care about this? Can I already quantify or qualify this malware? Do I have countermeasures that exist? Are there additional countermeasures that can be gleaned — TTPs, perhaps? When we talk about all this: for us, Phoenix is a free appliance. People go and spend millions and millions of dollars on sandboxes and this and that, and they throw the vast majority of the data on the ground and don't use it. So from our perspective the real question isn't "why would you use Phoenix" — it's "why wouldn't you." If you work for a big company and you have a big budget, you could go buy a commercial sandbox, but here's what you wouldn't get: Linux support is very limited; countermeasure validation — some folks will tout that they have it, but it's typically a black-box scenario; you get no regression testing at all; there's very little data to mine; there's strict licensing; and you're tightly coupled with the hardware you're deploying on, typically by virtue of whatever virtualization technique they're using under the hood.

The proprietary data formats make interchange a little bit harder, too. The end goal for the vendors of these sandboxes isn't necessarily to make you better, faster, stronger at looking at and analyzing these threats — they're incentivized the opposite way: they want to sell you countermeasures and controls so they can make money. Previously I've seen a lot of these sandboxes where the observables are just simply atomic observables. One of the things we spent a whole lot of time on when we rewrote the MISP module was taking advantage of relationships: it's one thing to have objects, it's one thing to have attributes — it's another thing entirely to start relating those objects with attributes and with other objects, and doing that in a way that makes sense. And the last thing: there's very little incentive to add custom intelligence to a lot of these tools — they're vendor tools, they make money off it, it's a black box — and for us it was just a massive opportunity to be better at knowing what the data is. So, before we get into the architecture, we have a couple of questions already. The first one: does it have two-factor authentication? No, it doesn't yet. We plumbed directly into the Django authentication mechanisms, and we have that running in tandem with all of the other authentication mechanisms through Cuckoo and MISP and all of that kind of stuff.

Two-factor is on the list; we just don't have it there yet. If you're good with Apache 2 — our front door is Apache 2 — then anything you can do with Apache 2 you can, in theory, do with Phoenix as well. Next question: is there a way to compare output from Phoenix with an existing IOC repository? How good is your Python? Phoenix gives you structured, open-format data that you can do basically anything with, so if you're good at writing a couple of tight little scripts, you can do whatever you want with it — we don't proprietarize our data formats, so fill your boots. The other thing is that our MISP tags are built into our dashboarding, so you can click on a tag and you've essentially got a REST endpoint that you can curl and do whatever you want with; if you want to iterate through that and smash it against something else, it's fairly trivial. This architecture diagram is a bit old — it's from one of our first talks, and since then we've considerably upped the size of our offering and done very little with our documentation, in true engineering fashion.
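A sketch of that tag-to-REST pivot. The base URL and API key are placeholders, and the payload shape follows MISP's documented /attributes/restSearch API rather than anything Phoenix-specific:

```python
# Sketch: querying the platform's MISP instance for everything carrying a tag.
# The host and key are placeholders; the payload follows MISP's restSearch API.
import json

def build_tag_search(tag, last="30d"):
    """MISP restSearch payload for one tag over a recent window."""
    return {
        "returnFormat": "json",
        "tags": [tag],
        "last": last,  # e.g. only events published in the last 30 days
    }

def as_curl(base_url, api_key, payload):
    """Render the equivalent curl command, as described in the talk."""
    return (
        f"curl -s -X POST {base_url}/attributes/restSearch "
        f"-H 'Authorization: {api_key}' -H 'Content-Type: application/json' "
        f"-d '{json.dumps(payload)}'"
    )

payload = build_tag_search("phoenix:stage2-download")
print(as_curl("https://phoenix.example", "<API_KEY>", payload))
```

From there, iterating over the JSON results and smashing them against another data set is the "couple of tight little scripts" part.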

So we'll go over some of what we have here. On the right side you can see the services — basically what you would expect to see in Cuckoo, except you don't have to build out and configure each one of them by hand. The Cuckoo API is what you use to submit files and URLs from your various other services. Cuckood is what schedules and manages your virtual machine detonations. Cuckoo Rooter is the thing that stands up and tears down iptables configurations to handle routing within your detonation systems. And the Cuckoo processing service is where a lot of the magic happens: this is where you do a lot of the reporting, a lot of the juicing — like all of your memory analysis with Volatility — reporting out to Elasticsearch, running traffic through Suricata, all of that. If you want to do something with the results of your malware analysis, all you have to do is write a processing module: it's an open interface, it's very easy to write, and we've written quite a few that come packed into Phoenix. Cuckoo Web and Moloch are where your IR or hunt analyst actually interacts with the system.
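A sketch of what a processing module looks like. In a real deployment you subclass `cuckoo.common.abstracts.Processing`; the base class is stubbed out here so the example stands alone:

```python
# Sketch of the processing-module interface. In a real deployment you would
# subclass cuckoo.common.abstracts.Processing; it is stubbed here so the
# example is self-contained and runnable.
class Processing:
    def __init__(self, results=None):
        self.results = results or {}

class UniqueDomains(Processing):
    """Toy module: collapse the network section down to unique domains."""
    key = "unique_domains"  # where the output lands in the final report

    def run(self):
        domains = self.results.get("network", {}).get("domains", [])
        return sorted({d.get("domain") for d in domains if d.get("domain")})

# Usage against a fabricated partial analysis result:
fake = {"network": {"domains": [{"domain": "evil.example"},
                                {"domain": "evil.example"},
                                {"domain": "c2.example"}]}}
print(UniqueDomains(fake).run())  # ['c2.example', 'evil.example']
```

The `run()` return value is what ends up under the module's `key` in the report, which is why the interface stays so small.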

As you would expect, there's a web front end that lets you submit new files, look at your reports, run your hunts, all of that — you'll see some screenshots later in the talk. Moloch — again, how many people here are familiar with Moloch? OK, about half of you. Moloch is where you'll see all of your network traffic indexed and broken out in a way that's very easy to search and very easy to aggregate, and we'll see a bit of Moloch and how it plays nicely with Phoenix very shortly. At the top you can see your input feeds: Phoenix fully supports automated ingestion from things like VirusTotal and ReversingLabs, you can do it with your own API, we're doing it right now from URLhaus — and we were going to do it from another one whose name escapes me — but it makes it very simple to build your own automated ingestion system and plumb it straight into Phoenix, with TLP, with different VPNs, and so on. Speaking of VPNs: down at the bottom is our infrastructure. We have Windows virtual machines that are the detonation agents, so when you submit a piece of malware it will spin up one of those machines, load the malware on, detonate, report, rinse and repeat.

We've also integrated Linux now, so we're one of the few sandboxes that have Linux analysis available — it's fully ready to go and works great. We have MongoDB, Elasticsearch, and a SQL server as our databases. We've moved very heavily to the Elasticsearch side while using MongoDB as our source of truth, because I'm not a big fan of using Elasticsearch as a primary database: MongoDB is where we keep all of our data, Elasticsearch is where we make it useful. Finally we have our networking layer. The diagram says 33 OpenVPN circuits, but — funny story — we were running on Private Internet Access, PIA; we gave them our money, we had a good account, and they kicked us off, persona non grata, for running too many VPNs. So now on our test box we have something like five or six, because we're running on Nord. If anybody has any VPN sponsors, please come find us, because we'd love to support more VPNs at the same time — but they span six different countries right now and they're fully persistent. Some of the key features — actually, let's deal with this question up there first: what platforms are supported? Right now Windows and Linux support is built in. We've had interest in a Mac agent and haven't quite gotten there yet, but Windows and Linux are our bread and butter at the moment.

Under the hood this is all Cuckoo, so if you have it working with Cuckoo — if you know how to detonate Mac malware with Cuckoo — come see us; we'd love to bend your ear a little bit and figure out what you're doing. I'll be honest, I just don't have spare Macs that I can go burn like that. OK, we'll save that question for a little later. So, some of the key features of Phoenix. One of the biggest things we needed to do was integrate TLP. A common complaint about sandboxes is that essentially, if you have an account, you have access to everything — and we all know that's not how the world works. We have people we trust and want to share our information with, and that's not necessarily everybody in the whole world. So we integrated TLP with all of the services in Phoenix, including Cuckoo, to make sure you can green-light something so it's available to everyone, amber-light it so it's available only within your sharing groups — which are, again, synchronized across the entire platform — or red-light it, which means it's your eyes only. We have Yara and Suricata hunting, which is a very important feature of Phoenix for validating your countermeasures and creating new ones, and those hunts honor TLP.

You can run your Yara and Suricata rules directly on all of the malware you have access to in Phoenix — and none of the stuff you don't. The Moloch integration, as I mentioned before, also honors TLP, so all of the network traffic juicing you've done is fully available to you in Moloch, and you can again make it available to the people you trust. We have per-user VirusTotal and ReversingLabs integration in the system as well. That's not the same as the automated ingestion — that's essentially a daemon running outside of Phoenix — but we also have a page within Phoenix where you can paste a whole block of hashes plus your API key for ReversingLabs or VirusTotal, and have it pull those samples down and run them within Phoenix just as if you had submitted them from your own local machine, which is very, very useful. TLDR is a fun little API that Justin pretty much banged together; it's crazy, but it's really, really good at correlating the host and network indicators you get in the output from your various malware samples. The persistent OpenVPN circuits are a key feature: you can go up to about 250 — it depends on how many routing tables you run locally, so call it

248; near enough, it makes no difference — quite a lot. We have advanced searching with Elasticsearch. This is a really key feature, because we have a really good boolean query builder on the page that you can use to build complex queries, so you can really find good intel with the advanced search — and it's really fast, because Elasticsearch is great. And it's free, as in beer — and I mean free beer — so you can't really get any better than free. So, for IR or hunt analysts: a lot of this, when we were building it and conceptualizing it, was done under duress. We all work incidents every day, and you end up having an incident come across your plate and think, "man, I wish I were doing X, and it were automated, because I don't have enough time for this." So a lot of this ended up being: OK, how do we get from seeing something to working the full OODA loop? Part of that was: when you have a particular piece of malware and it's new to you, you have to go write countermeasures for it — what does that look like, how does that work, end to end? To take a quick step back: at my day job we use Phoenix, and we are programmatically detonating stuff as it comes over the wire in our environment.

We use a packet capture system from a vendor that shall remain nameless, and we run Yara on that; when those files hit, we pull them down and programmatically send them into Phoenix. When we do that, we submit with a priority of medium, so those jump the line over, say, VirusTotal submissions. We have intel folks and threat folks who can go paste hashes in and do that sort of ad hoc work. There's also an integration we've previously done where, if you have VirusTotal accounts and you share certain Yara hunting rules with a particular user — as long as those Yara rules meet certain criteria, like the author being a valid user on your Phoenix system — it will programmatically submit those, and they get submitted with a low priority. That way, while there are free cycles, stuff that isn't crossing your wire in real time or near real time can use those resources, and you can harvest as many indicators as you can from external threat actors you care about, external sources, whatever you want. That still leaves the high-priority tier for ad hoc work, so if you've got an analyst who's pulling forensics and needs something detonated now, it jumps to the top of the queue.
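A sketch of what that priority-tiered submission can look like. The host is a placeholder; the endpoint and field names follow Cuckoo's documented /tasks/create/file REST API, and the three-tier mapping is an assumption for illustration:

```python
# Sketch: submitting a sample to the Cuckoo REST API with a priority so
# wire-captured files jump the queue ahead of bulk VirusTotal pulls.
PRIORITY = {"low": 1, "medium": 2, "high": 3}

def submission_fields(priority="medium", timeout=120):
    """Form fields for POST /tasks/create/file (file part sent separately)."""
    return {"priority": PRIORITY[priority], "timeout": timeout}

def submit(host, sample_path, priority="medium"):
    # Import and network call kept inside the function so the sketch itself
    # makes no network traffic when loaded.
    import requests  # assumed available in the submitting environment
    with open(sample_path, "rb") as f:
        r = requests.post(f"{host}/tasks/create/file",
                          files={"file": f},
                          data=submission_fields(priority))
    r.raise_for_status()
    return r.json()["task_id"]

print(submission_fields("medium"))
```

Wire-captured samples would go in as "medium", bulk external pulls as "low", and analyst-forensics cases as "high".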

So, all this goes back to saying: when you're doing the hunting with things like Yara and Suricata, as an incident response analyst your currency is countermeasures — your goal is to create countermeasures that people can go implement. "What have you done for me lately," right? We wanted the hunt itself to be the paperwork: by virtue of doing a hunt and writing the rule, that's your paperwork — you don't need to be punished with additional paperwork after you write the rule. You write the rule, you use a nice template, it has all the metadata it needs to have, you go hunt with it, and when it works and catches what you expect it to catch, you click "publish to MISP" — and I think that's on the next slide. Long story short, it lets us take a lot of that manual effort and automate it completely end to end, so you can do things like poll your MISP instance for new Yara signatures that you can then push into your production stack. If you're good with automation, this system lends itself very well to it. So — I am a Canadian, and I visit Canada frequently, Christmas and stuff like that. Last Christmas I went up to Canada, and about two days after I got there,

Phoenix backflipped on me: I had a segfault, I lost hardware, and I found out I wasn't backing up everything I probably should have been. So I lost a whole lot of time — not much data, but mainly time — getting everything back to where it was. One of the things we wanted to fix is that you can't call something an appliance if it's treated as a pet rather than as cattle, and from our perspective that meant doing things like backing stuff up and making sure that's seamless to your user base. So we went through that pain — and if anybody does use the system that I operate, apologies for the downtime; it certainly is best effort. In addition to that, we went through a lot of load testing on the hardware we have, which is about a nine-year-old Dell T7500 with 192 GB of RAM, a six-disk RAID 50 array, and some SSDs, and we're doing 2,500 samples a day with full 4 GB memory dumps being parsed and everything we can extracted from them. When we started doing that, we realized very quickly that there was a whole lot of bloat in Cuckoo that didn't need to exist, and a lot of stuff that could have been compressed.
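The compression change is essentially this — the file names and report contents here are illustrative:

```python
# The "two lines of code" point in practice: write analysis artifacts through
# gzip instead of to a flat file. Paths and report contents are made up.
import gzip
import json
import os
import tempfile

report = {"target": "sample.exe", "signatures": ["dead_c2"] * 1000}

tmp = tempfile.mkdtemp()
flat = os.path.join(tmp, "report.json")
packed = os.path.join(tmp, "report.json.gz")

with open(flat, "w") as f:
    json.dump(report, f)

# The whole change, more or less:
with gzip.open(packed, "wt") as f:
    json.dump(report, f)

print(os.path.getsize(flat), "->", os.path.getsize(packed))
```

Repetitive JSON reports and memory-dump artifacts compress extremely well, which is where the storage burn-rate savings come from.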

Compressing that ended up saving a ton of space. For us, looking at our burn rate — how much storage we're going to need for a year based on detonations per day and so on — that basically led us back through saying: OK, for every single file we create on the file system, if it can be compressed, let's add compression. It's two lines of code, and it'll save us money — actual currency — later on when buying new gear. So let's grab a couple more questions. Have you considered developing anything for containers, or containerization? Phoenix is all containers: everything except the actual Cuckoo VMs — MISP and Moloch included — runs in Docker. As for whether we've looked into building something to actually do analysis on containers themselves: not yet, but I want to make sure it's understood that the entire architecture heavily leverages Docker and containerization. Are there any anti-anti-sandbox features, for malware that can detect sandboxes and go dormant? That's much more of a guest-VM problem, and it has a lot to do with what guests you're running. Long story short, we do prep our guest VMs, and you can also do certain tricks with things like VirtualBox or KVM or QEMU or whatever tool you use for the virtualization — that's a little out of scope here. But there's a guy on my team named David Heisey who wrote a tool that helps prep your VMs for the anti-anti world, and it does things like generate browsing history, PowerPoint file history, Word document history, Excel history — all sorts of things we've seen in a previous life used for evading sandboxes. So the short answer is it's beyond the scope of Phoenix; the long answer is yes, people do it, we have people on our team who do it, and we use that.
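As an illustration of the idea — this is not David's tool, just a toy sketch of the "generate plausible user history" approach, with invented file names:

```python
# Illustrative only: seed a guest VM with a plausible recent-document trail
# so "fresh install" heuristics have less to key on.
import datetime
import random

DOCS = ["Q3_budget.xlsx", "meeting_notes.docx", "deck_v7.pptx", "todo.txt"]

def fake_recent_docs(n=25, days_back=90, seed=1):
    """Return (timestamp, filename) pairs spread over the past few months."""
    rng = random.Random(seed)
    now = datetime.datetime(2019, 10, 1)  # fixed so the sketch is reproducible
    entries = [
        (now - datetime.timedelta(days=rng.uniform(0, days_back)),
         rng.choice(DOCS))
        for _ in range(n)
    ]
    return sorted(entries)  # oldest first, like a real MRU trail

for ts, name in fake_recent_docs(5):
    print(ts.isoformat(timespec="seconds"), name)
```

A real prep tool would write this kind of trail into the guest's actual artifacts (browser history databases, Office MRU registry keys, and so on) rather than just generating the list.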

Finally, for now: is there a demo — are you going to do one? Unfortunately we don't have time for a demo. We do have quite a lot of screenshots of the system that we'll be showing and describing in depth; we only have an hour and a lot of content to get through, so we decided not to do the demo in this case. If you'd like to see the system, please find us — we'll be around, we always love showing off Phoenix, and we have no problem doing it in private for you. And why did we name it Phoenix? Because "making Cuckoo great again" didn't roll off the tongue — it's a tongue-in-cheek way of saying Cuckoo is dying. Speaking of Cuckoo, some of the changes we made to make it better: the API in Cuckoo is entirely single-threaded, so when you're submitting a lot of files from an automated submission system — say a thousand — it will flip over and die on you really fast for no good reason. So we migrated the entire API to Gunicorn to give it a multi-threaded environment and let us submit and process a lot faster.

We totally rewrote a lot of the processing modules, and the processing subsystem itself, to better handle a large volume of processing and to play better with the analysis side. We would find that the analysis part — detonating on virtual machines and gathering information — and the processing part — taking that information and doing all sorts of great stuff with it — would step on each other all the time: they'd run each other out of resources; they didn't play nicely together. So we implemented a lot of good techniques, like high-water marking, or water-gating, and changed the processing to do more in memory — or all of it in memory whenever possible — to really speed it up and make them play well together; we'll show you some of the data at the end of the talk that shows the difference that made. We also implemented an auto-tagging workflow with MISP. A lot of the time, when your processing gets gated into MISP, you don't want it to get lost in the din, so you tag it with stuff that makes it obvious what family it is, or that helps you find it later. So we developed a workflow that lets you specify a bunch of very specific or very generic keys, as regexes and things like that, to automatically tag in MISP, and that really helps you organize your metadata and your malware in a way that you can find it again later. Some of the other core changes we made: the web UI was very old, very difficult to work with, and really didn't play with phones at all, so we rewrote a huge amount of it and integrated much more current web technologies to let it scale properly across different desktop sizes and phones and all of that. Now you can do your job from your phone, which is great: you can submit malware, look at all the reports, pivot into MISP and Moloch, and not skip a beat.
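The auto-tagging workflow can be sketched like this — the rule patterns and tag names are invented for illustration:

```python
# Sketch of the auto-tagging idea: regex rules mapped to MISP-style tags,
# applied to whatever indicators come out of a detonation.
import re

RULES = [
    (re.compile(r"\.onion$"), "phoenix:tor-c2"),
    (re.compile(r"^\d{1,3}(\.\d{1,3}){3}$"), "phoenix:raw-ip-contact"),
    (re.compile(r"\.(exe|dll)$", re.I), "phoenix:stage2-download"),
]

def auto_tags(indicators):
    """Return the sorted set of tags matched by any indicator."""
    tags = set()
    for value in indicators:
        for pattern, tag in RULES:
            if pattern.search(value):
                tags.add(tag)
    return sorted(tags)

print(auto_tags(["10.0.0.5", "dropper.EXE", "wiki.example.com"]))
```

In the real workflow the rules come from user configuration, and the resulting tags are attached to the MISP event so analysts can pivot on them later.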

The Elasticsearch indexing: a lot of original Cuckoo depended on Elasticsearch for much of its functionality, especially the recents page where you'd see what you'd submitted, but it really didn't search very well — it was just a very generic query, so it was really difficult to use. We totally redid all of that, made it really fast, and brought a lot of the information up to the surface so you can very easily get a sense of what you're looking at, in both the advanced search and the recent submission history pages, which makes it a lot more useful. We also now have an easy update path. I know with Cuckoo's 2.0 branch they have the Cuckoo Working Directory, which makes it a little easier to update Cuckoo, but it's still a pain — and all of our dependencies needed updates as well: all of our Docker images for Yara and Suricata hunting, the actual web UI, etc., etc. So now we have a very simple update mechanism: literally just run the update-phoenix shell script and you're good to go. One of the things we rewrote completely from scratch was the MISP module.

A few of our friends, when we were originally building the output into MISP, chastised us very heavily for not using relationships. They said the whole beauty of MISP is that you can relate stuff: you're identifying relationships, and the relationship is almost as important as the data you're looking at and relating. So we rewrote the whole thing from scratch, and a lot of that was meant to paint the picture of what's happening — and as a byproduct of doing it correctly, you essentially get executive reporting out of it. So now not only are we adding the atomic observables and attributes for the objects you want to collect, but we're also packaging up the controls and relating those controls to the objects and attributes with the relationship "mitigates." When you try to do automation, or do some of this at scale, what you'll find is that it's a whole lot easier when you write that information somewhere it can be easily pulled out later and easily tied into automation for other tools — if you're doing any security orchestration, step one is getting your countermeasures somewhere centralized, and this is part of that path. The C2, the dropped files — we want to relate things like URLs with the files they delivered. As for the sharing groups we integrated: basically, every single user is its own

organization — that's just the way MISP works. So under the hood we have sharing groups, the organizations are one organization per person, and that's how we manage TLP under the hood. The other MISP integration is things like the recents page and the advanced search dashboard. On the recents page you'll see check marks for whether or not you have a control, either Yara or Suricata; what's actually happening under the hood when you refresh that page is that it's actively polling MISP, asking the MISP API, "hey, for this event, do you have any objects of type yara?" Oh, you do? Well, now there's a green checkmark for that. That allows you to very easily use the Phoenix recents page as a work queue for building controls: within seconds there's a big idiot-proof checkmark if you have something you need to work on. There's even a radio button that hides everything that already has a control for either Yara or Suricata, because at the end of the day, when we started using the system, we were going, "yeah, I don't care about this, I don't care, I need to hide this." A lot of the features we're talking about might seem sort of menial, but when you're using this tool every day, they make things faster and easier to do.
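A sketch of that per-event control check — the event structure mirrors the general shape of MISP's API output, and the data is fabricated:

```python
# Sketch of the recents-page check: given a MISP event's objects, decide
# whether the Yara / Suricata checkmarks light up.
def control_status(event):
    """Map control type -> bool for the checkmarks on the recents page."""
    present = {obj.get("name") for obj in event.get("Object", [])}
    return {"yara": "yara" in present, "suricata": "suricata" in present}

event = {"Object": [{"name": "yara", "Attribute": []},
                    {"name": "file", "Attribute": []}]}
print(control_status(event))  # {'yara': True, 'suricata': False}
```

The "hide everything with a control" radio button is then just a filter over the same dictionary.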

Here's an example of some of the tagging that's happening under the hood. "Stage-two download" is where we look for binaries that were downloaded after the initial detonation. "Dead C2" — if I see it reaching out somewhere but it can't actually get out. All of this is stuff you can start tagging to describe how that detonation happened, and what's interesting about it and what isn't. If you look at the stage-two download: we've got controls firing on the first portion — call it stage one — so do we have controls for stage two, stage three, and so on? For us it's just a better way to help us prioritize what to look at and why. OK, we're going to plug right along here with a couple more questions. Is there an on-prem version of Phoenix? Phoenix is on-prem — this is not a cloud service. You install it on a big pizza box and run it on your network, so it's perfect for air-gapped networks: you don't have to do ingestion from URLhaus or whatever; you can plumb it into your own malware pipeline. So yes, the answer is yes.

you consider integrating with mobile SDKs? That is something we'd love to do, and I don't see why at some point it won't happen. It's not at the top of our work queue right now, but it is definitely something we take a keen interest in — and again, we are open source, and we'd love to take contributions. If you're interested in building this, by all means talk to us; we'd love to throw our resources in with yours. If you have images you're familiar with building for stuff like Android and you're comfortable sharing them, reach out to us — it might be a quick fix to get those images into Phoenix. Mr. Nelson — does

the YARA tagging to MISP take advantage of metadata fields, or just the name? The YARA tagging to MISP — that's the auto-tagging, I guess. Auto-tagging is actually applied to any object that gets put in MISP, period: for any object that gets extracted from the detonation, we can go and apply a regex to it, describe it as either interesting or not interesting, and give it a tag. So it's not just YARA rules, it's not just Suricata rules — it's file names, it's anything that would end up in MISP that you can auto-tag.
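(Editor's sketch: the auto-tagger described here is essentially a regex table run over every extracted object before it lands in MISP. The rule table and tag names below are invented for illustration.)

```python
import re

# Rough sketch of the auto-tagger: every object extracted from a detonation
# is run through a list of (pattern, tag, interesting) rules before it lands
# in MISP. The patterns and tag strings here are made-up examples.

AUTO_TAG_RULES = [
    (re.compile(r"\.exe$", re.I), "binary:pe", True),
    (re.compile(r"^desktop\.ini$", re.I), "noise:os-artifact", False),
    (re.compile(r"\.ps1$", re.I), "script:powershell", True),
]

def auto_tag(object_value):
    """Return (tag, interesting) for the first matching rule, else None."""
    for pattern, tag, interesting in AUTO_TAG_RULES:
        if pattern.search(object_value):
            return tag, interesting
    return None
```

In Phoenix the equivalent table would cover file names, rule names, and anything else headed for MISP, marking each as interesting or noise.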

Okay, so we're a little behind here, so we're going to plug right along. The easy-button install: this is the key feature for Phoenix to be viable as a solution for businesses. Cuckoo is difficult, to say the least, to get up and running and to get stable, and once you add Moloch and MISP on top of that, you're looking at days, weeks, months' worth of work — especially if you're unfamiliar with the systems — to actually get something running. So we created an easy-button install to give you basically a single click, or a single keystroke, to go from a bare Ubuntu server to running Phoenix in about an hour — and it takes an hour because we have to

build the Docker image for MISP; the rest of it doesn't take that long. Essentially it lets you put in a bunch of configuration about how you want the install to look once it comes out, hit the button, go get a coffee, and come back in an hour to a fully set up, fully fed system, ready to submit malware to, just like that. Most of the interactive stuff is pre-configured, so you don't have to hit any keys once the install starts running. We have an integrated user and group setup process, so with one script you can create users in all the various systems, put them into sharing groups, relate them, etc. — you don't have to

do any of the dirty work yourself, and all of that works in tandem with the TLP — it's very simple to use. Now, Docker: this is what somebody had asked about before with regards to containerization. We make huge use of Docker because we're able to use it to scale out our hunting, among other things. With the YARA and Suricata hunting, the processes themselves don't have excellent multi-threading built in, so we essentially threaded it ourselves by spinning up Docker containers: your YARA or Suricata hunt might spin up sixteen Docker containers that each get a slice of the pie in terms of the hunt workload, and they'll complete in significantly less time than if you let Suricata or YARA multi-thread itself — because it doesn't, or doesn't do it very well.
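(Editor's sketch: the container fan-out described here boils down to slicing the corpus of samples into roughly equal chunks and handing each chunk to one worker container. This sketches only the slicing, not the real Phoenix scheduler.)

```python
# Split a hunt's sample corpus into N near-equal slices, one per worker
# container. Each slice would then be mounted into a container running
# yara or suricata against just those files.

def slice_workload(samples, workers=16):
    """Split `samples` into at most `workers` contiguous, near-equal slices."""
    workers = min(workers, len(samples)) or 1
    base, extra = divmod(len(samples), workers)
    slices, start = [], 0
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        slices.append(samples[start:start + size])
        start += size
    return slices
```

The win is the same one the speakers describe: sixteen single-threaded scanner processes running in parallel finish far sooner than one process left to multi-thread itself.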

The door is also open now to scaling out to a cluster of hardware through Kubernetes or Mesos, and as you can see, all of the services we have under the hood are dockerized. The dashboard, which you'll see in the next slide, is your first view into Phoenix. This is what's going to show you a count of the stuff you have in all the different MISP tags, which gives you a sense of the kind of stuff you're capturing, and it also gives you a heads-up display over your last 24 hours: how much you've run, how

much you've processed, and what's still in the pipeline. You have quick-search links to Moloch that give you different time-period views on different reports, and you have statistics on your box to see how everything is coming along. So this is what the dashboard looks like. On the left you can see the MISP tags, and that's giving us a really good view of the stuff we're seeing come across our desk; on the right you can see our stats — this is not a very active box at the time, because it's two in the morning, maybe — and the Moloch data is down below, where you can see things like hosts and status codes, the binaries that you downloaded,

the certificates, and the JA3s that you come across — all really useful information — and you can break that out over three days, one week, or one month. We've also integrated Kibana to give you a view into all the logs on your system. All the logs in Phoenix, except for the hunting logs, are piped up to Elasticsearch as well, so you have a single view over all of the actual data you've captured in terms of how your system is running, what it's seeing, whether we're getting errors — all of that is available — and then you can use Kibana to build really cool heat maps like the ones below. Now

they're a little bit difficult to see, even for me standing in front of this giant monitor, but basically we're looking at the source IPs in iptables, we're looking at users, we're looking at the destination IPs — all of that kind of stuff. This is another one where we can see the hosts that were requested. So this is more about users using the system, and the troubleshooting stuff: if you give somebody an appliance and it breaks and you're trying to help them fix it, the first thing is, okay, well, what do the logs say? And if

we had to ask people to go and grep through logs every time they installed one of these, to help us troubleshoot something on a Git issue or whatever, it wouldn't make sense. I'm a bit of a log junkie — a huge Elasticsearch fan; I hate Logstash, but I use rsyslog and pump everything into Elasticsearch — and this is the byproduct you get, sort of just built in, because I already did it. We also integrated submission usage limits, because we understand that some people like to fill the pipeline a little too much, and you end up completely monopolizing the system. So we added a way to make sure that

everybody gets a default of 25 new reports a day. It's very easy to administer within the Cuckoo admin console — the Django admin console — so you can make sure that you don't blow up your system before you're ready to blow up your system. The mobile-friendly layout: as I mentioned before, this is where we start to enrich the data in the recents and advanced search views to give you a lot more heads-up information on each row, but also allow it to be collapsed, so that if you're using a mobile phone it doesn't become an onerous process to try to scroll horizontally through all your stuff.
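(Editor's sketch: the per-user daily quota mentioned above — 25 new reports a day by default — could be modeled like this. In Phoenix this lives behind the Django admin; the in-memory counter below is only an illustration.)

```python
import datetime

# Minimal sketch of the per-user daily submission gate. The default limit
# of 25 matches the talk; the storage (a plain dict keyed by user and day)
# is an assumption -- the real system would persist this.

DAILY_LIMIT = 25
_counts = {}

def try_submit(user, limit=DAILY_LIMIT, today=None):
    """Record one submission; return False once the user hits the limit."""
    day = today or datetime.date.today().isoformat()
    key = (user, day)
    if _counts.get(key, 0) >= limit:
        return False
    _counts[key] = _counts.get(key, 0) + 1
    return True
```

The counter resets naturally each day because the key includes the date.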

Bootstrap and DataTables.js — I know they're a little old, but they work really great, and Cuckoo itself is not exactly modern, so integrating those was the simplest way to get the biggest wins in terms of organizing data, making the screen and layout responsive, things like that. Hot-linking MISP results for YARA and Suricata hits: when you hover over the check mark on the recents page, you'll see the rule that is the control for that particular submission, and you can click on it to jump right into MISP and see the actual details. The hunting results: you now have a heads-up display of all your hunts and the results from them — the key findings in each hunt — and then you can jump into

the hunt, look at the stuff you've covered off, and pop it straight to MISP. So this is an example of our recents page on the files tab; you can see it's being shown on a screen the size of a mobile phone. All of the data you see in that gray box would ordinarily be a column; it's been collapsed into the drawer so you can still easily access it while you're on your phone. With the columns collapsed, you can scroll through all your various hashes, look for the ones you want, look for the ones that have controls, etc. This is the hunts view — another view where you can see

the rules that have been hit, on which report, and the metadata for those rules, and then you can jump into the hunt to see all the different analyses that the hunt actually brought results out of. This is the same thing for Suricata: you can see the pcap right up in front of you, and again, all this stuff will collapse into a drawer when your screen isn't big enough. This is what the hunt looks like when you actually jump into it: on the recents page you were seeing the rules that got hit; now you're seeing the signatures, the file they hit, and which report that file was submitted in, so you can

jump back to the report in Phoenix to see all the details about it. One quick thing: in that particular file path up there you'll see the word memory, right? This is where it becomes useful when you're regression-testing your YARA signatures. Are you writing a YARA signature for a file, or for something that's in memory? If you go and run that rule, hunt it, and get results — and you were intending to write something for a dropped file instead of memory, or vice versa — now you can very quickly see: oh, okay, I'm false-positiving; here are the files I'm hitting; now I'll go and tune accordingly based on the data

that I have. The latest feature of this page is that you can check off the regression test you've just run. You've run this hunt with the new YARA rule, nice and shiny, and you see that it's hitting on stuff you weren't hitting before; now you can tick off the ones you care about and hit publish to MISP. That's going to pack it all up into a nice little box, put all of the YARA links into the MISP reports that already exist — and the files themselves where they don't already exist in MISP — and now you've got a way to relate your previous runs to your new hunts. I know you're

thinking: I need pictures, pictures are what I need, right? So this is an example of what it looks like when you have a rule that mitigates something. You can see on the far right there, in the green box, there's a rule, and it's mitigating. What you can also see here is the relationship that's created automatically by having something download a stage two: you've got the file in the middle, which was the source file you detonated, and you can see very quickly there's drops and dropped-by, and that it was downloaded through a particular URL. So now you're able to look at this visually and say, okay, well,

there's stage two — I know it's stage two because I see the little relationship there, right? And as much as a lot of folks don't necessarily agree that visualization plays a big part in security, I completely disagree — I spend the vast majority of my time actually doing data visualization. So this was a really good way to not only get that in there, but also make it so that when your leadership says, whoa, we had a red team event and they did something — what did they do? — well, go look at the picture and tell me what I need to explain to you on here. Because you can very easily hand this — not necessarily to your mother, but your

CISO should certainly understand what's going on when they look at this, right? So, Suricata hunting — same thing. One of the things that's really, really useful about this: on the right there you'll see the URI, and if you're doing things like JA3 hunts within Phoenix and you're writing Suricata rules looking for JA3, this will show you the decrypted URL underneath the JA3. So if you want to start hunting JA3 but you want a good way to break it open and understand what the actual underlying URL was, now you can quantify that and say: this particular malware that uses this JA3 has been seen downloading all of these URLs — and

it's done by dumping the TLS secrets in Windows land. API fingerprinting — so this is still beta; go to the next picture. This is one of those things where we were looking at the data we had, specifically around API calls for things like PowerShell, and what we found was that a lot of the builders would behave the same way from a Windows API call perspective. What I mean by that is they would use certain functions x amount of times, or between x and y, and by virtue of collecting this data we could then go and create a profile for what PowerShell looks like when it's built by one of these builders.

So in this particular example, we found that a bunch of these PowerShell builders were using — in some cases it was always only x amount of this particular API call, or only x amount of that — and essentially it's a fingerprint for the binary itself in the underlying API calls that are made and a count thereof.
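(Editor's sketch: the fingerprinting idea — "this builder always calls this API between x and y times" — can be modeled as count-range profiles. The profile names, API names, and ranges below are invented purely to show the shape of the data, not real Phoenix signatures.)

```python
from collections import Counter

# Count the Windows API calls observed during a detonation and match the
# counts against per-builder profiles, where each profile says "this call
# appears between lo and hi times". All profile contents are illustrative.

PROFILES = {
    "powershell-builder-A": {
        "VirtualAlloc": (2, 4),
        "CreateProcessW": (1, 1),
    },
}

def match_profiles(api_calls, profiles=PROFILES):
    """Return the names of every profile whose count ranges all hold."""
    counts = Counter(api_calls)
    hits = []
    for name, ranges in profiles.items():
        if all(lo <= counts[api] <= hi for api, (lo, hi) in ranges.items()):
            hits.append(name)
    return hits
```

A detonation's call trace either satisfies every range in a profile or it doesn't, which makes the fingerprint cheap to evaluate at scale.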

All right, we're down below 15 minutes here, so we're going to try to speed this up a little. Advanced search is the place you go to get all of the information you need: if you know what you're looking for, you can search for it. It's backed by Elasticsearch, and as you might imagine if you're familiar with Cuckoo, there are hundreds and hundreds of fields stored as part of the malware analysis output. So we combine lots of those fields — say, all of the different HTTP fields, all of the different URI fields — into a single observable, and now you can construct boolean logic to search across all of that with breakneck speed, then pivot directly into Moloch, and you have the full drill-down in the mobile layout just the same way we showed. So this is an example of an advanced search where we're looking for "sure clean pressure-washing dot com" as a domain, and that's matching on probably about

15 different fields in the actual back-end data, and lo and behold, we found exactly the one we were looking for: the command lines are right there, the stage-two URL is visible — it's a "white bells travels dot com" — and then we can go and pivot that into Moloch and get all of the actual network traffic that came out the other side. It's a super useful thing. What it allows you to do — go to the next one — what it allows you to do is take host-based behavioral indicators and pivot directly to the network traffic related to them. If I see a certain obfuscation technique used with PowerShell, and every single time I see it

it's this particular flavor — okay, well, now I want to go do a deep dive on just the network traffic associated with that. So now I can go from a host-based hunt to digging into the network specifically, so that I can create countermeasures that'll catch this over the wire when it happens. You can keep going on this one — actually, go back one second, then one more. So in terms of boolean logic, here's an example: all of the raw responses that had a 200 and an exe, where the request wasn't an exe, right?

Think of all the use cases you have where you're looking at a piece of malware and you're like: oh, that feels weird; I weighed it in my hand and it doesn't feel right — you can quantify that in this and then go start deep diving. And for all you know, this is simply two flowbits rules, right? This is a request and a response; there's no reason why you couldn't do this in, let's say, Suricata. Visualization — again, I'm a sucker for visualization; it's one of the reasons I love Moloch so much, arbitrary visualization — we can skip past this.
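(Editor's sketch: the "200 plus an exe response where the request wasn't an exe" search might look something like the following as an Elasticsearch bool query. The field names — `http.status`, `http.response_body`, `http.uri` — are placeholders; the real Cuckoo/Phoenix mapping has hundreds of fields and will differ.)

```python
import json

# A guessed shape for the boolean search described in the talk: HTTP 200,
# an executable in the response body (MZ header), and a request URI that
# doesn't look like an exe. Field names are assumptions, not the real
# Phoenix/Cuckoo Elasticsearch mapping.

def exe_smuggling_query():
    return {
        "query": {
            "bool": {
                "must": [
                    {"term": {"http.status": 200}},
                    {"match": {"http.response_body": "MZ"}},
                ],
                "must_not": [
                    {"wildcard": {"http.uri": "*.exe"}},
                ],
            }
        }
    }
```

The `must`/`must_not` split is exactly the AND/NOT boolean logic the speakers describe running across the combined observables.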

Okay, so Nathan's going to love this one: Suricata and YARA hunting built in. This entire thing was to get the path from hypothesizing to regression-tested down — to get mean time to deployment down. At the end of the day, we are doing art and science, right? Creating these countermeasures isn't an exact science; it's often trial and error, seeing what you're doing and how it works, and you shouldn't be penalized for not having the right data and not being able to hunt against it accordingly. So when something feels not quite right, go and dig into it, go and start hunting against it, start writing expensive queries, look at stuff, and figure out where you can use this. We've used this to build signals for other tools — Blue Coat CPLs, for example: there was

a Blue Coat CPL we wrote that was good for Emotet for like ten months — IDS pcap, CyberChef recipes (again, shameless plug for CyberSaucier; I'm doing a talk in November on that), osquery, any of the endpoint data, JA3 — anything else where you have a tool that you can put this kind of data into, or make better. The data is here for you to write controls and check whether they're valid and would hit in your environment. So, performance was one of the worst parts of Cuckoo — I'm sure a lot of people can sympathize with that. We needed to make this perform in such a way that

you could run, you know, ten thousand requests a day. One of the biggest problems we had was MongoDB being the bottleneck, so we've moved the searching entirely into Elasticsearch and introduced a lot more indexing, because there wasn't any. Other improvements we've made: date/time query integration from Moloch to the TLP, so now we're only looking at the TLP within the actual range that Moloch cares about; moving the TLP into MySQL, so we don't have to deal with database pagination anymore; multi-threading Volatility — that was a huge win; the watermark gating we talked about before; and the Docker hunting scaling — all things that gave us the performance we have today. The

Volatility multi-threading alone was a 10 to 20 times performance increase immediately off the bat. The multi-threaded API, as we talked about before, is huge for automating submissions; compression for IDA Pro and the actual AAA stuff is what we really needed to have as part of Cuckoo. We've had some weird Cuckoo issues — I'm not going to go through them all right now. So let's talk about performance just briefly here. One of the things we added with Phoenix was the ability to do all of your processing within a RAM disk: you can cut out a RAM disk of x size depending on how big your system is — in our case I think we

use 64 gigs or something like that, and that's enough to hold the memory dumps at the same time. Then what we do when it gets full is we don't keep generating memory dumps, because it costs you more to go and move those to disk than it's worth. You're better off, from a work perspective, to just do the work you've already created and wait until you get down past that low-water mark, then start detonating again, because then you're not penalized for moving large amounts of data around — these memory dumps are four gigs and up, right?
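(Editor's sketch: the watermark gating described here is a classic hysteresis gate — stop producing dumps above a high-water mark, don't resume until usage drains below the low-water mark. The thresholds below are illustrative, not Phoenix's actual values.)

```python
# Stop producing memory dumps when the RAM disk fills past a high-water
# mark and don't resume until usage falls back below the low-water mark,
# so multi-gigabyte dumps never have to be shuffled off to spinning disk.

class WatermarkGate:
    def __init__(self, high=0.90, low=0.60):
        self.high, self.low = high, low
        self.paused = False

    def allow_dump(self, usage):
        """usage is the RAM-disk fill ratio (0.0 - 1.0)."""
        if self.paused:
            if usage < self.low:
                self.paused = False   # drained enough: resume detonating
        elif usage >= self.high:
            self.paused = True        # full: finish existing work first
        return not self.paused
```

The gap between the two thresholds is what prevents the system from flapping on and off right at the limit.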

So in this particular instance we were doing our processing on a RAID 5 array instead of a memory disk, and you can see the purple at the bottom there — that's IO, and a penalty is what that is. If you go to the next slide, this is what happens when you move it to a RAM disk: if you look at the yellow, that's actually multi-threaded Volatility using a hundred percent of the CPU on the system. That warms my heart from a performance-engineering perspective, because you are using that box to the fullest extent possible — you couldn't squeeze more CPU out of it if you tried — and if you look, the IO is completely gone. You're not waiting to read this or read that; your bottleneck has been moved, and that's where it gets fun. So in this

particular example — again, spinning disks; faster spinning disks, but still spinning disks — you'll see you get bottlenecked by the purple IO spikes in there, and although you do get a lot of performance from this, RAM is where you want to keep it. Go to the next one. This is sort of a run rate on how things move through the different states of Cuckoo — from running to completed to reported, that kind of thing — and what you'll notice is that the slope is your detonations per minute, or detonations over time. Go to the next one. The higher the slope, the faster you are detonating malware, and what you're looking at is the media

change under the hood — going from, let's say, SSD to RAM to whatever. I believe the first one is spinning disk and this one, I think, is SSD — go to the next one — no, it was the opposite: this one is SSD, the other one is RAM. So again, as you can see, from a performance perspective we did a whole lot of tuning; we really wanted to be able to squeeze every last drop out of the hardware, mainly because we're cheap. Since we actually have a couple of minutes left, I did want to touch really quickly on one of the weird Cuckoo issues we had to deal with, and that is rooter spam.

So if anybody's familiar with the Cuckoo rooter, it is, like I said, the thing that allows you to do all of your iptables work, all of your routing — it stands up and tears down routes as processing starts, so that it can run stuff through the VPNs. The problem was that the way Cuckoo engineered it, it sort of started up on every scheduler run, so you end up burying your VPNs over and over and over again, and we spent probably the better part of six months figuring out all the different ways that it was basically shooting itself in the face — not even

the foot — because it would totally tear down the system. We even had a little cron job to stand all this stuff back up every 30 seconds or so, probably pushing routes back onto the routing table every minute on the minute. It was impossible to work with. So now what we've got is a much better system for standing up the rooter at startup and managing it throughout the entire process. This is one of the reasons that, when you say, well, why don't I just use Cuckoo, it's not that hard to stand up — this is why. Because if you use Cuckoo, even the new Cuckoo, you're going to have the same rooter spam problem all

of the time. With Phoenix all of that is gone, and it just works out of the box for you — and if you're a super power user, go ahead and use our rooter. Yeah, we cracked that nut, and you can pull it out: it's using sockets, so it's not completely embedded within the system; you can drive the rooter on your own by sending it socket commands. I actually use the rooter in our day job, where we do source-based policy routing, such that if you're doing manual malware analysis and you set your IP to between 1 and 10, that'll make you come out of Romania, and if you set it between

20 and 30, that'll make you come out of Russia — and we use literally the exact same Cuckoo rooter that we package up in Phoenix. Okay, so we've got two last questions here. One: can Phoenix integrate with Active Directory for RBAC? Not right now. Apache could, yeah, but it's also the stuff behind Apache — MISP and all of those guys — that we need to actually bring together under LDAP. So at present we don't support that, but again, we'd love to have somebody come and work with us on it. And my favorite question, I think, so far: can the system be integrated with email systems —

metadata, links, or attachments — for users that are not sure? That is great; that's exactly what we want to be doing, because it makes so much sense to have an address, or a hook into your mail server, to grab all those attachments and feed them into Phoenix. Whether you're sure about them or not doesn't really matter — this could be part of your front door. So absolutely, this is something that needs to be hooked in. One more thing in addition to that: we are going to be updating the Phoenix repository this week. If you are Security Onion users, there is a so-watcher Python script which you can point at any directory and give it

API credentials to send to Phoenix, and now you can watch any directory anywhere and have it auto-submit to Phoenix. The use case initially was: if I see any files dropped in this directory that match this set of YARA rules, programmatically detonate them in Phoenix. And at my day job we can go from clicked link to fully detonated, with a parsed memory dump and the full report, in around six minutes — and that's a global thing.

So that's the end of the talk. We have a little bit of time — anybody else have any questions that they held to the end? Yes, right there, what do you got? Yeah — I mean, the more you can throw at it the better. You can throw this on a desktop and you'll get a thousand detonations per day, right? I have a four-year-old pizza box at work with 384 gigs of RAM and I can do 7,500 detonations with full memory dumps — and these are six-gig memory dumps. GitHub? Yeah, we'll share that — I don't think we put it on the slides; of course we didn't, we're not smart people.

So we've got three things to give away here, so I'll ask the first question: can anybody tell me the three primary components of Phoenix? Right there — okay, you got some Packet Squirrel nuts for your networks. Justin, you want one now? Go ahead: what's the module in Phoenix that's responsible for processing? Boom — you get yourself an AC600 USB wireless adapter, there you go. Okay, and finally, what is multi-threaded — all right, go for it: what's the one thing that the Phoenix sandbox does that most commercial sandboxes don't do? [inaudible] — I understand that. All right, you got a Blue Team Handbook — oh, he's got the handbook already. All right, thank you everybody for attending our talk, really appreciate it. Come find us if you want to talk Phoenix — we're around. [Applause]