
Hello everyone, welcome to our session, "Now You C&C, Now You Don't". My name is Amichai Shulman, and this is Stav Shulman; you can see the resemblance, we're related. We'll be talking today about how to build a botnet infrastructure that is very resilient yet at the same time very cost effective and affordable.

We'll discuss a little bit what brought us to do this research, what the motivation behind it was, and then we'll dive into describing how botnet infrastructure is built today and how researchers are tearing it down. Once we understand that, we'll show you a few examples from our research of botnet infrastructures that are very resilient, very cost effective, very cheap. We'll end by discussing further research and some conclusions we drew from this research.

So without further ado: why did we pick on botnets? Because every real large-scale cybercrime operation today relies on a stable, functional botnet. If you want to do carding, credential stuffing, brute forcing, denial of service, or even if you want to pull off a large-scale SQL injection attack, you need a functioning botnet. There are organizations and personas that grow and cultivate botnets and sell pieces of them to individual organizations to carry out their cybercrime operations. There are also organizations that like to build their own proprietary infrastructure and use it for their own cybercrime operations. In any case, this is the first step, the cornerstone of the cybercrime operation. So when researchers, companies and law enforcement tear down or neutralize that botnet, they are destroying the cybercriminal operation.

So the questions we asked ourselves were: can we build a botnet infrastructure that could survive the current practices of neutralizing botnets? And can we build such an infrastructure that is cost effective enough to be used by your everyday hacker? And of
course, the biggest question of all is: would we survive this exercise? So take it from here, Stav.

So I would like to begin by describing the methods and approaches currently available to threat actors for building and maintaining the infrastructure for these botnets. I think we can divide them into three main difficulty levels.

The first is obviously the most basic one, and that's when an actor manually maintains a pool of domain names. It means the actor must acquire dedicated domain names, or it can use a bunch of abused, compromised servers on which it puts some malicious content so they can operate as part of a C2 infrastructure. Using this method, the actor must somehow initially deliver the domain name to the victim, meaning we will find it embedded inside the malware binary or inside a dedicated configuration file that will obviously be in proximity to the malware. An actor that would really like to walk an extra mile with this very basic approach can maybe have a list of backup domains, but these as well have to be delivered to the victim somehow, meaning they will be in a configuration file on the machine, or they will have to be delivered via an already live and functioning C2 channel.

Moving to some mid-level practices: this approach, I would say, is for actors that really want to feel like they're using some sort of cutting-edge technology but are actually still relying on domain names. That's when we come across binaries containing DGAs. Instead of one domain name per sample, we will find a domain generation algorithm in it. Based on a random seed, or a set of parameters and a prefix, it will generate many, many domain names that will be registered as required. So instead of one server per session, we will see many per session, which might seem confusing at first. But these algorithms are embedded inside the binary, so once I have that, I can pretty much predict what it is going to look like and what is going to
come next.

Now, some actors are very creative, thinking outside the box, and leverage platforms, tools and infrastructures that were not, per se, meant for this communication, and use them as C2s. We can see the usage of profiles on social media platforms, things like Facebook, Instagram and Twitter. We see that cloud-based file share services are gaining great popularity right now, things like OneDrive and Dropbox. And then we can see some actors that take whole pre-made utilities that have a legitimate purpose in life and attempt to use them for malicious traffic. A great example for this one would be the Iran-affiliated APT MuddyWater, which had a huge-scale campaign relying entirely on a legitimate IT utility called ScreenConnect, to make malicious traffic look like IT activity in the network.

I have here some more examples of the usage of cloud-based file share services, so we have Dropbox and we have OneDrive. We can see in the highlighted parts that for these methods the actor has a limited set of API keys that have to be embedded in the code, and a set of pre-known folder and file names that are in the code and are then exposed in the URL of malicious requests.

So these are the resources available to threat actors. Moving on to our side, the defenders, security researchers, security vendors and whatnot: what can we do against these resources? I think we have a pretty predictable workflow with four main steps. We first must identify such a malicious resource. We then analyze the samples in our possession. We enrich the data to make sure we didn't miss anything. And we then have some sort of response and remediation process.

How do we even identify such malicious network resources? Well, we have the classics of network anomalies, such as high or non-standard ports. We have the use of strange domain names, which can be anything on the spectrum between a funny-looking name that
doesn't match normal traffic, all the way to ridiculous mimics of legitimate services. We can sometimes even identify a high volume of traffic between endpoints that we know are not supposed to have so much data transferred between them. We have IDS alerts, we have anything that is based on Snort rules, and then we have the more advanced EDRs that help us dynamically detect any malicious or strange communication attempts from processes in the network.

So we now have a bunch of network identifiers in hand. What can we do with them? We can hunt for malware. We can opt for publicly available repositories such as VirusTotal or Hybrid Analysis. But we have these identifiers from alerts in the network, and obviously they didn't trigger themselves, so we probably have binaries of our own. We can now begin with the most fun part, malware analysis, which can be anything from the simplest sandbox execution and can escalate all the way to full reverse engineering. But it doesn't matter how difficult the analysis was, because the outcome is usually the same: more network identifiers.

Because we're now such suspicious beings, we go ahead and try to enrich all the data we've collected. We can do things like look at unique registrar data: if I have an actor that is utilizing unique name servers, I will find additional infrastructure that uses the same ones. If I have an actor that is using abused, hacked servers, it means it has some paths of its own on these servers and some malicious content on them, so we can identify the naming conventions and file names it likes to use, then enrich based on the content itself and find additional servers like that. And we can even look at digital certificates; these can be certificates forged by the actor, or ones that were stolen by it and then reused.

Now that we've really exhausted all possible leads of investigation, we should have some sort of response. Sadly, this phase today is still mainly based on IOCs
and specifically IOCs related to outbound communication. We can see that it's pretty much the convention these days: alerts from the Israeli CERT, US-CERT and all the big vendors use IOCs, meaning that everything we've collected in the research is translated into one, so hash values for files, domain names, URI paths and parameters. We would expect organizations to take these IOCs, put them in a deny list, and block.

We can take the response a step further and sinkhole known infrastructure. We can register malicious domain names ourselves and make sure that all live infections now turn to us. And if we are familiar with the back office of this actor and the commands supported by its malware, we can go ahead and set up a pseudo server of our own that sends a disable command or a kill switch to all infected machines.

When it comes to remediation, we can try to team up with hosting providers to take down malicious infrastructure, or team up with social media platforms and have them close all threat-actor-owned profiles. And we can really try to clean up infected machines from all persistence and malicious binaries. We can sure try that.

So if we look at the resources available to threat actors compared to our defense workflows, even though they're not perfect, they're not so bad. You might think that we have a shot at winning this war against crime. Domain registration is expensive: even if a domain is as cheap as ten dollars, it's cheap if I have ten, not two thousand. And we've mentioned that we can track some registrations. When it comes to social media platforms and cloud-based file share services, actually creating the accounts for these profiles is very laborious, because they require things like verified email accounts and active phone numbers for two-factor authentication, which really limits our actors. Also, these are platforms that are trying to fight really hard against fake accounts and
bot accounts, so again, not in the actors' favor. We have great EDR technologies, so yes, we can identify malicious network patterns. We've got great personnel capable of malware analysis who can analyze captured samples and identify patterns of C2 registration to block further infections. And of course we can block and take over known infrastructure.

So when do we lose, though? Usually when we're up against nation-state-sponsored actors. These guys have great resources, an abundance of headcount and budget; they can have as many Dropbox accounts as they can dream of and as many phone numbers as they like, and generally can do anything at a gigantic scale.

This brings me back to our original research question and to our motivation: can we create a botnet infrastructure that is mega robust and resilient and that is available to our everyday friendly hackers? We now know that it must be based on publicly available infrastructure. It must be indistinguishable from normal traffic in the network. It cannot be IOC'd. It must be one where, if someone were to catch one of my malware samples or one of my requests, it will not affect the rest of my infections and my bots. And it has to be cheap and cost effective.

So we were experimenting with a few ideas. One of us would say, "Yeah, this is going to work," and the other would say, "No, I know how to tear it down," and then we would take another idea and tear it down again. But eventually we came up with a scheme that we were able to reconstruct on a number of infrastructures.

The first example I'll talk about is Spotify. The reason we picked on Spotify is that Spotnet, which is a Spotify-based botnet, seems like a cool thing to do, but also because Spotify is very common today and you would see a lot of normal Spotify traffic within organizations as a regular thing. The other advantage, of course, is that Spotify has a very nice API that can be used to interact with the service. So we took Spotify, and now the questions we have to answer are: how do
we encode the data using Spotify? How do we make the botnet's Spotify traffic look similar to regular Spotify traffic? How do we ensure that the individual bots are resilient? And we'll talk a little bit about how the infection and registration process is also resilient.

So, first question: how do I encode botnet data in Spotify? We found that the easiest way to put data into Spotify was using podcasts. Anyone can put a new podcast with episodes on Spotify. We're doing that through a platform called Castos; it cost us 19 dollars a month, cheap enough. If you want to go cheaper than that, you can do it with an Amazon S3 bucket, for example. You put all your content, audio, images, descriptions and the XML feed, into your repository, Castos in this instance, and then you go to podcasters.spotify.com and just put in a link to your XML feed. Now everything that you upload to your source, Castos for example, is updated in Spotify and published within 20 to 30 minutes.

So we have a way to put in data. How do we encode our commands in that data? One obvious idea is to encode our messages within the audio files or the images that accompany the podcast and the episodes. Of course there is a lot of room there, and you can put in binary data. The problem is that when Spotify takes the information from the original platform, it transforms it, so you cannot just embed a binary within an audio file. There are ways to do that. You could do audio modulation; I think there are some people here who remember the sound of a fax machine or an old modem, so you could use that. You could use OCR on images to encode data. Not too complex, not that simple.

But it turns out that if we want to deliver short messages, which can be text encoded, for example using Base64, we can use the description of a podcast episode. It is long enough to include short commands and instructions, it is long enough to include the URL for a further download of a binary
or more data, and it can include the identifier of the next message. One other thing it may include, which I'll mention here and talk about later, is a digital signature of the message.

So we've decided that we are going to encode botnet messages inside the descriptions of individual podcast episodes, and each individual episode contains the identifier of the next message, which is again a podcast episode. It could be of the same podcast or of a different podcast; it could be a podcast from the same publisher or from a very different publisher. The way a bot accesses the information is by accessing the URL that you see there, and as you can see, it's a generic Spotify URL with a unique identifier for each message.

So clearly, if you try to use this identifier as your IOC, you are going into a race you cannot win, because each message now has a different identifier and the episodes contain the identifiers of the next messages. What we have here is a scheme that makes it very hard to put IOCs in place. In fact, even if you are fast enough to block a few message identifiers, the bot can go back and ask for older messages using their identifiers; the botmaster at the same time goes to the source of the data and changes the description of older episodes, and the bot is now up and running again. So it's very, very difficult to take this down in terms of IOCs.

And later, even when you have found out that you're infected and you want to identify all infected machines within the organization, you have to go to all the machines that have Spotify on them, look back at all their internet access history, and try to find out whether they were accessing Spotify with specific identifiers that you know are part of the botnet messages.

So this is nice, but then you have these dedicated researchers who say: well, we were able to discover all the identifiers, and then we took down the podcast itself, and we actually made
Spotify close the account that was used to publish that podcast. What do you do in that case?

Very simple. What we do in that case, when the bot feels that communication is down, is use the search functionality of Spotify and search for some specific keywords, which we can change from time to time. We can choose whether we want very strict search keywords or loose keywords. If we choose a strict one, like Google here, we get a relatively small set of results, so we don't need to go through many results; by the way, most of them are legitimate. If we do a loose search, we get a lot of results, and it cannot be distinguished from normal queries in the network. What the bot does now is go through the results, looking at the descriptions of individual podcasts and searching for a specific pattern there. Again, this pattern can be changed from time to time. Once the pattern is found, we know that we have the new channel for the botnet.

In order to prevent researchers from sinkholing this, we use a digital signature within the message, so the bot knows that this is an original podcast which is part of the botnet and not just a researcher trying to tear down the network. And again, these patterns can change from time to time, so it is very hard for someone to go and say, well, this bot has no communication anymore.

So probably the only question now is: how do we bootstrap all that? How do we register and find the first message in the botnet channel? Clearly, we're going to use the same technology. We use the search functionality with some search terms that are embedded within the malware distribution; we can change them whenever we change the malware distribution. We get a list of podcasts, we search for podcasts with a specific pattern in their description, we check the digital signature, and then we know that this is a channel we can start working with. Notice that even if someone captured our sample and knows the keywords that we're
using, they cannot block infections using this same sample, as long as, of course, the search terms are not super unique. So we're very resilient here.

Quite frankly, what I've shown you is that we have a great scheme. You cannot use your normal IOCs, inspecting requests, to disrupt communications of existing bots or to deny the addition of new bots to the network. Removing accounts and working with Spotify to clean everything up does not help you tear down the network, because it would be up and running again in seconds. And finding infected machines is very difficult, because you have to go through all the machines that have Spotify on them and look very closely at all their Spotify activity in order to understand whether they are part of the Spotify botnet or not. If we actually piggybacked on existing Spotify accounts on the infected machines, that would be even harder for a researcher to find.

So it works perfectly, and we could stop there. But I realized that some of you here today would say, "Ah, but that's Spotify, and we will not allow Spotify inside our networks; that's social networking, that's not for business people." So we looked for a business application, and we chose Discord, because Discord is becoming very popular, because it's a business application, and because it has an even simpler API.

The questions we have to answer are the same questions. How do we deliver data? Apparently, in Discord this is very simple. Discord uses WebSocket to get data from the server into the clients: each client opens a WebSocket to discord.gg, and text messages are just poured into this channel from the server. If you want to convey binary data, no problem: you attach a file to a message, and the message then contains a URL to the file, which is actually stored in Google Cloud Storage with a unique URL for each and every file. So I can send text messages, I can send binary data, and I can of course use digital signatures to make sure that no one is sinkholing me, and just to