
Data Exfiltration Via DNS Lookup

BSides Belfast · 2016 · 47:02 · 28 views · Published 2017-09
About this talk
Martin Lee analyzes DNS as a data exfiltration channel, demonstrating how attackers encode and transmit sensitive data (credit card details, credentials) through DNS queries. Using statistical anomaly detection on subdomain lengths, he reveals real-world malware (including point-of-sale systems) actively exploiting DNS for covert data theft, and offers practical detection strategies for defenders monitoring DNS traffic.
Transcript [en]

Okay, thank you very much. Well, good morning everyone. I'm Martin Lee, the tech lead of security research at Talos, which is the threat intelligence and security research arm of Cisco. I had the great pleasure of working here in Belfast the last year for the lodging, so it's lovely to see so many friendly faces who I know in the audience, and to be back in Belfast, which is great. What I want to talk to you about today really comes from a critical reflection after quite a long time working in security research, analyzing malware and looking for attacks. As you do on a flight or a long journey, you start thinking, and I started thinking: well, if I

move to the bad side and actually started writing my own malware and conducting my own attacks, knowing what I do about detecting attacks, what would I actually do? How would I go about attacking systems to try to find interesting things? I think this is actually a really, really good point: if we are defenders, defending networks and hunting for attackers, it's often a good idea to start thinking, knowing what I know about this system, what would I do? How would I attack it? And if someone snuck in, how would I as a defender actually know? So the biggest thing that I really started thinking about was using DNS

lookups maliciously, and then how would I know whether bad guys are actually using this at the moment? Lockheed Martin have put together this cyber kill chain, supposedly to describe attacks, which has got all these long and sexy words in there: weaponization, command and control, actions on objectives, military-style language. My own personal cyber kill chain wouldn't bother too much about weaponization or command and control. For me, if I was attacking a system, it would be about getting inside it in some way. Much the same as in the kill chain, I'm a great magpie for data, so if I was getting inside a system, what I'd really be after would be interesting data of some sort that I can

get my hands on, and then I have to get that data back to me: I need to exfiltrate it in some way. So first let's think about this kill chain and how we'd actually get inside the organization. If it was me, very, very simply, I would send someone an email. This is an actual attack; this is distributing Roseto ransomware. In my personal experience, I think by far the easiest way of getting inside an organization is writing a very nice letter to someone: please click this link. You would be amazed at the number of people who just open that without any more thought. Even if you give the training and you teach people within the

organisation, and you say don't click the link on unexpected emails, and they do the training and they go away, if you were to send them an unexpected email afterwards there would still be a subset of people who would still click it. You ask, well, why did you click that? And they say, well, because it was interesting. So there's a subset of users in any organization that are resistant to that message. Don't click the link: someone will always do it. Interestingly, these people, from a defender's point of view, are actually gold dust, because they are your canaries on the network. These individuals in your own systems,

when you find them, they're the ones that are going to have the interesting malware on their systems. So you might want to make it more difficult for them to access your really interesting systems, but they're actually really, really good to monitor. Interestingly, in our own organization, Cisco, our CISO has a continuous program of sending attack emails such as these, fake ones that are generated inside, to members within Cisco, just trying to see how effective this message of "don't click the link" is. Our CISO is very, very proud to say that the response rates to his attacks range from 10 percent in sales and

marketing down to just one percent within the security business, which is great. It shows that security people are good at picking up these kinds of attacks. And I have to put my hand up and say I was one of the 1% that fell for one of his attack emails. But in my defense, it wasn't an unexpected email; I expected it. Three days before his attack email came through, I had a question about the accountancy for claiming back expenses that I sent to HR, so I logged a ticket asking for information. I wanted to claim back the cost of my home broadband from my employer: can I do that,

is it possible? So I got an email from HR saying, yeah, we'll look into it. Three days later I received an email from someone I had never heard of before, with an attachment, saying: here is the financial information you requested. I mean, I was expecting this email; it was exactly what I expected. So I got the email, I opened up the attachment, and it said: you, Martin Lee, have opened an unauthorized attachment; frankly, you should know better. But it wasn't an unexpected email; I was expecting it. So if you send enough emails such as this, "here are the documents you were expecting", somebody will be expecting the document and will open it. And if you send enough

of these to a large enough organisation, with enough recipients, you'll almost definitely get through, no matter what. We also spend a lot of time in the industry thinking about vulnerabilities, and the gentleman before talked about Internet of Things vulnerabilities. There are a lot of vulnerabilities out there. Writing software is difficult, and if there's a bug in the software, there will be something which we can exploit, something we can use to get inside the organisation. The good news is that the low-hanging fruit, these vulnerabilities that are very, very easy to exploit, there are actually fewer of these being discovered. The bad news: we're still looking at about 30% of all vulnerabilities that are

network accessible, low complexity, and require little in the way of authentication. But I would go back to this: if I'm an attacker, frankly, I'm going to go through the easiest way in, and to be honest I'm really, really unlikely to spend time writing some sort of complicated exploit to try and trigger one of these vulnerabilities. Personally, I'd just send an email. Nevertheless, there are an awful lot of these easy-to-exploit vulnerabilities out there, and this has not escaped the attention of the bad guys. Earlier this year there was a gang distributing SamSam ransomware who were actually exploiting a single vulnerability. Their attack kit was only one exploit: they were exploiting a six-year-old vulnerability in JBoss,

a Java server piece of software. When we identified that this gang was exploiting this one vulnerability, we scanned the Internet, and we actually found 3.2 million systems that were running this six-year-old vulnerability unpatched on the internet: a very, very large attack surface if you wanted to exploit that. Within those 3.2 million at-risk servers that we identified worldwide, open to the Internet, 2,100 had actually already been compromised. They already had a web shell uploaded to them, but the attackers just hadn't got round to taking over the machine and installing the ransomware. They had too many vulnerable computers that they'd exploited to actually keep up with the mechanism of installing the

malware, the ransomware, encrypting the system and demanding the payment; it was more than they could manage. So if I'm an attacker, to be honest, I don't think I'm going to have that much difficulty getting inside an organisation. More than likely I'd send an email; if I didn't send an email, I'd look for a relatively old vulnerability that's easy to exploit and just get straight in. Once I'm in there, okay, I want to find some interesting information. Do we think that's difficult? Realistically, all we have to do is look at the news headlines, and we find lots and lots of documents and interesting information being leaked, being lost to attackers after they

get inside the organisation. So I don't think finding interesting data is going to be particularly difficult for me. What might be slightly more difficult is actually getting that information back to me. If you're an attacker, you've really got a limited number of ways that you could get that information back to you once you've compromised the system and you're on it. So say we've got our evil, malicious domain where we're sitting, hoping to receive that exfiltrated data. We could use any of the normal mechanisms of data transport: we could FTP it, we could log in over SSH, use SFTP and transfer the data, we could even put it in some sort of web

request or something, or even, you know, maybe post it to Facebook and get it back that way over the web. There are lots of different ways that we can do this. But if our sysadmins are doing their job correctly, then I would really hope that they've put in place some kind of firewall rule. So if I managed to get inside and compromise the server where there's interesting information being held, if the sysadmin and the network manager are really doing their job, I would like to believe they've got some kind of firewall rules in place saying that this machine, because it's a server that holds sensitive data, can't just FTP it out anywhere, or can't send it over whatever protocol;

you have to go through a different system, or be authenticated, or something. Or, even without firewall rules for particular means of transport, you put in IP block lists. If my malicious domain is causing enough noise on the internet and it's being seen, then sooner or later it's going to get itself listed. Again, if my network and systems administrators are doing their job, I'd really like to think that they deploy some form of IP block list to at least block connections with the worst offenders for malicious communication. So if I'm a bad guy and I'm thinking, okay, I want to get inside the system and get the data back to me, there's a

possibility I'll be blocked, and I've also got a possibility that, at the very least, I'm going to leave traces in the gateway logs, so that someone could go back, identify me and block my IP address, which would spoil my fun. Can we think of a mechanism that I can actually use to steal data in such a way that it wouldn't be blocked by IP address, would be unlikely to be blocked by firewalls, and would also be quite difficult for someone to track in the logs? So let's think about DNS. Just about every machine is going to do some form of DNS request at some time. It's very unlikely that we're going to block DNS messages on our firewall to

stop a machine doing DNS lookups; it would mean that it wouldn't be able to do much of the work of any kind of machine. In a normal DNS request, we've got a machine which is asking: what is the IP address of www.example.com? That goes to the local DNS server, where it's either going to be cached or not. If it's not cached, then that local DNS server will go to the higher-level name servers, which come back and say: well, I don't actually know what www.example.com is, but I'll give you the name server for example.com; you can ask them, they will know the answer, and they will come back with a reply. That's how DNS works; we use it all the

time; it's going on in everything. How can we use this to steal data? Well, very simply, we can use subdomains. On our compromised host we ask it to do a DNS lookup, but instead of asking, you know, what is the address of www.malicious.com, we put our secret data in as a subdomain and say: I want to know the address of topsecretdata.malicious.com. It goes to the local DNS server; the local DNS server hasn't got a clue; it goes through the system; we get to the name server of our malicious domain, and that can just send back anything. But that request is now in the logs of our malicious name server. So we, with our

malicious name server, can then go through the log, pick out these subdomains which are being asked for, and use them to reassemble the information that we're looking to exfiltrate. Fiendish, right? You're always going to allow DNS lookups; you're hardly likely to be blocking them on your system. Your server with top-secret information is talking to your local DNS server, so what? And are you really going to have loads of logging on your local DNS server about which of the DNS servers it talks to? Probably not, to be honest. So we've got a viable mechanism for stealing information, but there are still a couple more steps. We've got a problem with punctuation: if this data that we're stealing is

anything other than, like, 7-bit ASCII, it's not going to fit inside our subdomain. If it's got punctuation, it's not going to work; if it's got a space, it's not going to work. We've also got a problem of maintaining case: in DNS lookups the case could quite easily be lost, converted to lower case, and then we'd lose that information. So we need some way of encoding our top-secret data. Normally, very frequently, we would use base64. The problem is that base64 sometimes uses a plus in the encoding, which we can't have in a DNS lookup, but we can use its cousin, base32 encoding, which very simply changes our top-secret data into this kind of

gobbledygook of encoded text. It will deal with punctuation, it will deal with spaces, it will deal with capitalization: even if our base32-encoded string gets changed into lower case, we can easily change it back again into upper case and we won't lose any information. So for exfiltrating data, absolutely wonderful. The DNS requests that we might see in our logs may be something along these lines: there are going to be lots of www.something.org DNS requests, and there's also going to be this long string of base32-encoded data. So already we can look at that and think: if I wanted to discover people who were exfiltrating data using this mechanism in our logs, what am I going to do? Sure, my

last request: audience participation. How could I distinguish these requests that are exfiltrating that type of data? How can we distinguish this type of request?
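The encoding scheme described before the audience question can be sketched in a few lines of Python, which may help when reasoning about what such lookups look like in a log. This is a minimal illustration under stated assumptions, not the code of any malware mentioned in the talk: the function names are hypothetical, `malicious.example` is a placeholder domain, and nothing is sent over the network. The sketch only builds and parses the query name. Padding characters are stripped because `=` is not valid in a hostname label, and each label is kept within the 63-byte DNS label limit (the overall 253-character name limit is not enforced here).

```python
import base64

MAX_LABEL = 63  # DNS limits each dot-separated label to 63 bytes


def to_query_name(secret: bytes, domain: str) -> str:
    """Pack data into base32 labels in front of a domain (hypothetical helper)."""
    # Base32 uses only A-Z and 2-7, so it survives case folding by resolvers.
    encoded = base64.b32encode(secret).decode("ascii").rstrip("=").lower()
    labels = [encoded[i:i + MAX_LABEL] for i in range(0, len(encoded), MAX_LABEL)]
    return ".".join(labels + [domain])


def from_query_name(name: str, domain: str) -> bytes:
    """Recover the data from a logged query name (hypothetical helper)."""
    # Drop the trailing ".domain", join the labels, and restore upper case.
    payload = name[: -len(domain) - 1].replace(".", "").upper()
    # Re-add the '=' padding that was stripped to keep the label valid.
    payload += "=" * (-len(payload) % 8)
    return base64.b32decode(payload)
```

For a defender, the useful observation is the shape of the result: a long run of characters drawn from a-z and 2-7 in front of an otherwise ordinary domain, noticeably longer than the data it carries because base32 expands every 5 bytes into 8 characters.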

Actually, it's a very good point: we could try doing that reverse lookup to see if it works, which is a very good way. Anyone else? A simple way of identifying it, looking at the types of DNS requests and subdomains that we'd expect to be exfiltrating: very, very simply, we'll just look for long subdomains. Easy, right? Because if you want to exfiltrate that data, base32 encoding naturally makes it longer. We'll just look for long subdomains in our DNS data. Right, easy. One of the things about working at Cisco is that we've got enormous amounts of telemetry to go looking in, one of which is the OpenDNS system, which is a managed DNS

system, so lots and lots of people route their DNS requests through the system, and we analyze these for malicious domains. Great, let's go looking in this data. Wonderful: 80 billion lookups per day. Right. If I'm looking for long subdomains, this is far, far more data than I can actually cope with. There's no way I'm going to wade through this by hand looking for which of these long subdomains are malicious. If you do this exercise, you will find that there are some people who really, really like long subdomains; you find them all over the place, and distinguishing the ones which are malicious, to be honest, we're not going to do by hand. We need to find another way of

doing this. So what are we looking for? Well, something anomalous, something which is unusual, an unusual trace in the data. To spot what is unusual, what we need to do is identify what's normal. So let's work with our data and model the distribution of subdomain lengths in DNS lookups. That's the curve in orange. We see we've got loads and loads of lookups of length 3, which is www, and then that curves off way, way out to the right: as the subdomains get longer, they get less and less frequent. If we look at this data, it looks very, very much like an exponential decay. It's an exponential

decay curve, which is great, because we understand exponential decay and we understand how to express it. We have a mathematical equation for this: e to the minus lambda x, where x is the subdomain length. So we can construct an ideal curve of what the data should look like, and then we can just go through and compare the reality that we see with our ideal, perfect-fit curve. If we do this, we find something very, very interesting. Basically, what we do is take the observed frequency of each subdomain length, divide it by the expected value, and see what we

get. In a perfect world that should be about 1: it should be equal to what we expect. For www, over on the left-hand side, where we saw this massive spike in the original curve, when we divide what we see by our expected value, well, you know, it's a couple of times over, not by that much. However, as we come further out into the longer subdomains, we start seeing these enormous spikes. So there's something happening here which is really, really unusual: we're finding subdomains of this really long length that are occurring at much, much higher frequencies than we'd expect according to our model of the data. So instead of

looking at a hundred million DNS entries, trying to identify things by hand, really what we have to do is look at the lookups at these particular lengths. These are our outliers; this is where the bad stuff is. Let's have a look and see what we find. I was actually really surprised, and really quite happy, to find very, very quickly active exfiltration by malware. Multigrain is a point-of-sale malware, and it exfiltrates credit card details in exactly the same way that I've described: it takes the credit card number and the PIN, base32 encodes it, puts it in a DNS lookup and does the DNS lookup. The bad guy owning the name server is then

collecting the logs, doing the reverse, and finding the credit card. An absolutely fiendish way of doing it, exactly the same way that I'd go about doing it, and here it is: we see it happening in the wild. I found this whole pattern of lookups where I really didn't quite understand what was happening. What I could see is that they were very, very closely related. We've got this pattern of three identical letters, then this string, m6p, which was constant, and then a whole load of random stuff with a number of dots in there to separate it out. One of the domains, dojfgj.com, was a known

domain used by Multigrain. The other two domains, among calm and beavers common world, were previously unknown, but we could see these were following exactly the same pattern, and this was exactly what we were looking for: exfiltration over DNS. Wonderful. We blocked these and further variations of them to protect people using OpenDNS. The malware itself cuts its information up into different chunks. The first subdomain is an identifier, and then the next bit in the middle is the payload: it takes the credit card details, encrypts them, base32 encodes them and exports them over DNS. Once we've got those domains, we can then go back to our

data, see what else is happening, and find further traces with these same malicious domains. So there are two patterns: the lookup which is exfiltrating the credit card data, and then these short little lookups that we see quite a few of, to be honest. My guess is this is the malware just pinging out and saying, "I'm alive, but I've got nothing new to report." That would be my guess. So, spurred on and encouraged by this success in finding Multigrain, I then went through other pieces of suspiciously long lookups that I identified from these anomalies and came across this other domain, 29a.de. Again, after much searching, I actually found that someone had

identified that as another point-of-sale malware that was exfiltrating credit card data, actually using a much simpler system: it's just an XOR key against the credit card details, again used as a subdomain. So again we were able to identify this, see it within the data, and just block the domain. With that we make the world a better place and people are protected. As I was writing this presentation, something happened that hit the news, and I thought: yes, this is exactly what I'm looking for, exactly what I'm interested in. However, it's far, far more complicated, and maybe far, far more interesting, but there's no neat ending to it. A few weeks ago

Kaspersky discovered the ProjectSauron APT Trojan. This is a sophisticated piece of malware that they'd identified, and, wonderfully for me, within the write-up we find that it has got a DNS exfiltration tool called DEXT. Brilliant: I know about DNS exfiltration, I know about how to find it, so we can have a happy ending to the story. Also, from the code that they published, we've got these snippets found within the trojan mentioning base32 encryption. Brilliant: we know about base32, we know exactly what this looks like, we've got the pattern that we search for. Even better, they've identified the domain that the bad guys were using: bikessport.com.
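The "how to find it" step referred to here is the subdomain-length model from earlier in the talk: fit an exponential decay to the histogram of subdomain lengths, then divide the observed count at each length by the expected count and look for ratios far above 1. The sketch below is one possible way to do that in Python and rests on several assumptions that are not from the talk: the function names are hypothetical, "subdomain length" is simplified to the length of the leftmost label, the fit is an ordinary least-squares line through the logarithm of the counts (the talk used Excel for the fitting), and the threshold of 10 is an arbitrary starting point to tune against real data.

```python
import math
from collections import Counter


def first_label_lengths(query_names):
    """Histogram of leftmost-label lengths (a simplification of 'subdomain length')."""
    return Counter(len(name.split(".")[0]) for name in query_names)


def fit_exponential(counts):
    """Fit count ~ a * exp(-lam * length) by least squares on ln(count)."""
    xs = list(counts)
    if len(xs) < 2:
        raise ValueError("need at least two distinct lengths to fit a curve")
    ys = [math.log(counts[x]) for x in xs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return math.exp(mean_y - slope * mean_x), -slope  # (a, lam)


def anomalous_lengths(counts, threshold=10.0):
    """Lengths whose observed/expected ratio is at least `threshold` (assumed value)."""
    a, lam = fit_exponential(counts)
    ratios = {x: counts[x] / (a * math.exp(-lam * x)) for x in counts}
    return sorted(x for x, ratio in ratios.items() if ratio >= threshold)
```

A caveat on the design: large spikes pull the fitted curve towards themselves, so with heavily contaminated data it may be worth fitting on a trimmed range of lengths or refitting after removing the flagged ones. The flagged lengths are only a shortlist for manual investigation, which matches how the talk uses them.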

So although we don't know what many of these components actually do, we can start making some good guesses. There's something to do with DNS exfiltration; it's almost certainly going to be via a lookup. We also know that it involves base32 encryption; brilliant, we know what that looks like. And we also know the domain that they've been using. So this is going to be really, really easy, right? Let's look in our data and try to identify something more about this domain. It's actually owned by what looks like, and I have no reason to doubt is, a legitimate reseller of domains. I was expecting to find the name server under the control of the

malicious entity, or held on some sort of secret network somewhere. No, they're using a domain reseller. Okay, why not? Kind of interesting. This particular domain, bikessport.com, had these two name servers, and there were 600 other domains that also shared the same name servers. So potentially they might just have chosen bikessport.com for this one attack and perhaps picked one of the other domains offered by this domain reseller for the rest of their attack. The reseller also had its own name servers, and these four name servers shared the same IP space; they were very close together. So, well, okay, let's just look

and we'll do the same experiment that we did for the lengths of the DNS lookups for these four servers and see what we find. Let's look at the lengths of the subdomains that are being used. Interestingly, we found no trace of long subdomains, suggesting perhaps that the bad guys had been very careful to cover their tracks. But they were leaving traces in the telemetry that we had available to us, because there's this peak of subdomains 10 characters long: there are more of these 10-character lookups than we'd expect. So I thought, absolutely wonderful, here we are, end of the story: they're doing their exfiltration by 10-character-length subdomain lookups. So let's look at some of these lookups. We've got

a few types of domains in there that we might expect for a DNS lookup: we've got timesheets, we've got wallpapers, sales, I mean lesbians, exactly the type of subdomains that you'd expect to be used on the Internet. And then we've got this random stuff, which obviously is very, very bad indeed. Interestingly, it's not going to the domain that I expected it to; it's actually going to the name server of the name server of the domain. Kind of interesting. Also, it's only ten characters long; I would have expected something much, much longer. So perhaps there's more going on here: maybe their tools are cutting up these long strings into ten-character lengths and repeatedly doing it to hide their

traces. Maybe not; maybe something else. I started looking in more detail at some of this and started thinking, well, actually, maybe this could be a denial-of-service attack, and maybe what we're seeing is random requests sent to the name server to try and overwhelm it. But when we look at the time distribution, trying to recreate that from the data that we have, we're only seeing a maximum of five requests per second, which is much, much less than I'd expect for a denial-of-service attack; many times we're only seeing one or two. Also, if this is a denial-of-service attack, it's been going on for an awfully long time,

far, far longer than I'd expect for a denial-of-service attack, and the time between each one of these bursts of packets is on average 25 hours. So this really, really doesn't look to me like a denial-of-service attack. Another thing that we considered: there's a bug within Chrome which occasionally adds random characters to lookups, so we thought, you know, maybe this could be a bug somewhere, some kind of artifact. If so, we'd probably find it everywhere, not only on this one particular DNS infrastructure. And when I went to look at the five largest hosters of DNS systems on the internet, including Office 365 and a load of others, we've got a massive peak at length three,

but, to be honest, nothing at a length of 10. There was a slight over-representation, which appeared to be one domain that was being tested, but there's nothing significant. So, great, okay: we've got this one name server that we know has been used in an attack, and we know that it's got this over-representation of ten-length strings. If I keep looking at other name servers, I'm not going to find this pattern, which means it's specific to that one. Okay. But when I got to another domain reseller, I found exactly the same pattern. So there's something going on here, and if we look at what these subdomains of length 10 were: very, very similar indeed. Again,

the name server of this domain reseller, the same 10-character-length subdomains, and also these other domains, may lose surveys what EU main stock net things book DB things got paid T, which obviously had been subjected to the same kind of activity. Again we can ask ourselves: is this a denial-of-service attack? It's a remarkably similar pattern to the last one: one, three, five packets per second, going on for a very, very long time, with the average interval between bursts in this one measured in hours. It might be DNS exfiltration, it might not; it is almost certainly not a denial-of-service attack, but it's kind of difficult to work out exactly what's going on here. So I finally got this down to a couple

of possibilities. This definitely isn't as simple as a bad guy having a malicious name server and sending the information direct to that. The name servers in this case are, as far as I can tell, legitimate organizations that are selling domains, so it's not that. It looks like a bad guy sending requests over to a legitimate name server that's been compromised. Potentially a bad guy has actually compromised this, has phished a username and password or whatnot, and got access to the DNS logs of one of the domain name hosters. Maybe they allow you to have some access to DNS information; potentially there's some sort of API that the attacker might have compromised in some way to get access to the logs. I

have no idea. Another possibility is that our attacker is actually using the same network and is somehow sniffing the lookups that are going to the legitimate name server, just doing that intercept, holding that information and recreating it. The other thing which I cannot discount, which is incredibly, incredibly frustrating, is that this might just be some testing tool: there may be some kind of tool used in the DNS hosting environment to test name servers and measure their response rates that I don't know about. In fact, what we're seeing, that we think might be exfiltration, might just be an artifact of somebody's testing. It's kind of frustrating

that I can't distinguish between those, but it makes quite a good example of how sometimes, when you think you're looking for something, you don't actually find it; you find something that kind of looks like it, and you need to keep your mind open to the possibility that what you see isn't necessarily what you would expect to see. So, to try and bring this to some sort of conclusion: if you're only going to remember one thing, please remember that DNS lookups are a viable exfiltration mechanism. You can use this as a means to steal data, and bad guys are actively doing this at the moment. So if you're defending systems, think about monitoring your DNS traffic

and look at the things which are unusual. Another thing to keep in mind: if you are hunting for DNS exfiltration, consider other options. What you see might not necessarily be what it is. The golden rule to remember: to a man with a hammer, everything looks like a nail. So if you're chasing DNS exfiltration and find suspicious traces in your data, there may be other explanations; it's not necessarily exfiltration. In terms of advice and what you can actually do about this, I think maybe it's about asking yourself a few questions. The first place that you want to look is your DNS logs. Where are they? They're going to be housed on your local DNS server

somewhere. The first question to ask is how many do you have, and the next question is what are they doing. You almost certainly have more systems acting as DNS servers than you really need, which makes monitoring that little bit harder. Once you've got those logs together, I think the next thing is to ask which system made the most DNS lookups last week, and then to understand why. Why is this system making so many DNS requests? There might be a very good reason for it: it might simply be because it's a web server doing reverse DNS lookups for its logs, or something like

that, but make sure you're aware of why this is the case. Then, as you build up those answers, you can go on and start looking at whether anything changed from last week to this week. Is there something different? And again, if you're seeing something different, it's about asking questions. The other thing: consider a managed DNS system; consider doing things externally, maybe using something such as OpenDNS, so that you've got a team of researchers doing this search for you rather than doing it yourself. Definitely, as good practice with whatever systems you are looking after, I would encourage people to start thinking about how they would actually attack them. What would I do? How would I break

into my system? What would I do to detect whether someone's attacking my system using the techniques that I would use as an attacker? And then also, once you've got data, try and come up with some sort of way to model it so that you can identify normal from abnormal, and when you find those things which are abnormal, go investigating. Often it's actually really, really easy. You will find that the modelling, fitting exponential plots, is a function in Excel. Basically, we use Excel: draw a graph, look at what can be measured, maybe the lengths of things, rates of change, those sorts of things, look at what kind of graph it looks like, get Excel to

model it, and then you can divide your perfect line by your observed values. The values which are most different from the norm are the interesting ones. Just to conclude: I'm part of Talos. Talos is the threat intelligence and security research branch of Cisco, and in our own world we've got five different branches. We've got threat intelligence, who are actively looking for attacks and information such as this within our telemetry. The actionable intelligence that they find, they pass on to the detection research team, who are the guys turning that actionable intelligence into detection logic, which goes out into the various engines that we support in Cisco security products. The software engineers who write those engines are also within Talos themselves, so that we

control all of the bits of detection. We've also got a team doing vulnerability research and development, finding zero days in various pieces of software. Think of the updates you've had to your Apple system: we always work with the vendor in order to make sure these things get fixed before we talk about them. Very, very much, we think part of our mission is to make the internet a better place, and one of the best ways that we can do that is by looking for zero days and getting them patched. We also have outreach, of which I am a part. Our goal is both to go and look in the threat environment and find the new techniques that are being used out there, but also

to externalize that and do talks such as this, so that people understand the types of threats that are out there. I should also say we are hiring; if you're interested, come and have a chat. As for the nature of the data that we have: we've got 16 billion web requests in our telemetry that we can analyze, we get 600 million emails coming in to our systems which we can analyze, and we also have 3.4 million queries that go to our Advanced Malware Protection system, so we get all those copies of malware to analyze as well. With that, I'd just like to say please, please, please follow our research. We've got our own website, we've got a

blog you can subscribe to, and you can follow us on Twitter. So, thanks for the chance to present to you, and with that, thank you very much. [Applause]
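
The Excel workflow described in the talk (fit a curve to a measured quantity, divide the fitted line by the observed values, and look at what deviates most) can be sketched in Python. This is only an illustration under stated assumptions: the measured quantity is taken to be subdomain length, the curve is assumed to be a simple exponential fitted by least squares on the logarithm of the counts, and the function names and the naive two-label domain split are hypothetical, not the speaker's actual tooling.

```python
import math
from collections import Counter


def subdomain_length(qname, labels_in_domain=2):
    """Length of everything left of the registered domain.

    Assumption: the registered domain is the last `labels_in_domain`
    labels, which is wrong for suffixes such as co.uk.
    """
    labels = qname.rstrip(".").split(".")
    return len(".".join(labels[:-labels_in_domain]))


def length_histogram(qnames):
    """Count how many queries were seen for each subdomain length."""
    return Counter(subdomain_length(q) for q in qnames)


def fit_exponential(hist):
    """Least-squares fit of count ~ a * exp(b * length) on log(count)."""
    xs = list(hist)
    ys = [math.log(hist[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def rank_anomalous_lengths(hist):
    """Return (length, fitted/observed) pairs, most anomalous first.

    A ratio far below 1 means far more queries of that length were
    observed than the fitted curve predicts.
    """
    a, b = fit_exponential(hist)
    scored = []
    for length, observed in hist.items():
        ratio = (a * math.exp(b * length)) / observed
        scored.append((abs(math.log(ratio)), length, ratio))
    scored.sort(reverse=True)
    return [(length, ratio) for _, length, ratio in scored]
```

Ranking by the absolute logarithm of the ratio treats over-represented and under-represented lengths symmetrically. A single extreme outlier can still drag the fitted line towards itself, so the lengths at the top of the ranking are candidates for investigation rather than verdicts.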

Yeah, yes, I found them. Yes is the short answer. It wasn't particularly what I was looking for; I went out specifically looking for pieces of malware exfiltrating data. There are certainly a few pieces of software doing tunneling over DNS, and we found traces of those as well, so I was satisfied that they could be identified if you looked for them. It wasn't something that I was specifically looking for, but yes, we found traces of it. Actually, in terms of false positives, there were lots and lots of anomalous DNS entries that I could identify but couldn't tie back to malware. A lot of people seem to be using DNS lookups to verify licenses of

software, so I had a whole load of pieces of software that were doing license checks over DNS, which I didn't know happened. But I could see that there's a whole load of other stuff that I just can't tie to anything. As far as I can see, it probably isn't malicious, but it is using DNS as some mechanism for the transfer of data, getting data from one system to another. It doesn't look suspicious, but to be honest I have no idea what's going on. So it's a matter of identifying the data which is interesting, and then filtering that down into that which you can definitely identify as malicious, that which almost certainly isn't

malicious, and there's still a whole load of grey stuff in the middle which you just continue monitoring. I think it's probably easier if you're defending a network, because you could put in a block, maybe, and then see what stops working, or, if you really wanted to, track back to find which machines are making these requests. In the telemetry that we have, we don't have that, so we can only observe without necessarily understanding what is making the request. But it's still a very good exercise to go through. Any other questions?
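
For a defender who does have client addresses in their DNS logs, the questions raised in the talk (which system made the most lookups last week, and what changed from one week to the next) might be answered with something like the sketch below. The record format of `(client, qname)` pairs, the function names, and the thresholds are all assumptions chosen for illustration; real DNS server logs would need their own parser.

```python
from collections import Counter


def lookups_per_client(records):
    """Count lookups per client.

    `records` is an iterable of (client, qname) pairs, an assumed
    format produced by whatever parses the DNS server logs.
    """
    return Counter(client for client, _ in records)


def week_over_week(last_week, this_week, min_lookups=100, factor=3.0):
    """Flag clients whose lookup volume grew by at least `factor`.

    Clients with fewer than `min_lookups` this week are ignored so
    that small, noisy counts do not dominate. Both thresholds are
    arbitrary starting points, not recommendations from the talk.
    """
    flagged = []
    for client, now in this_week.items():
        before = last_week.get(client, 0)
        if now >= min_lookups and now >= factor * max(before, 1):
            flagged.append((client, before, now))
    flagged.sort(key=lambda row: row[2], reverse=True)
    return flagged
```

Calling `lookups_per_client(records).most_common(10)` gives the busiest clients for a week, and `week_over_week` applied to two such counters lists the machines whose behaviour changed. Each flagged machine is a prompt to ask why, since a web server doing reverse lookups is a perfectly ordinary explanation.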
