
Hello, hello. Firstly, thank you everyone for coming to this presentation. This year's schedule is packed with amazing talks, so I appreciate you giving me your time. And now that you are here, let's talk about what's inside the open directories of 96 different threat actors.

This research began over a year ago, when I was bored and casually browsing urlscan. As a bit of a preamble for those who don't know, urlscan claims to be a sandbox for the web. Effectively, you can submit your URLs to the site and it will visit them on your behalf, then give you enough information about the site for you to determine whether it's malicious or not. It's a sandbox for
URLs. However, this is by far not the most important thing about this tool. As a byproduct of collecting and analyzing so many URLs, urlscan becomes an incredible tool for investigations, as the index it has becomes a cross between Google and the Wayback Machine, but tailored for threat researchers.

Like I said, one day I was browsing urlscan and came across a URL that ended in /id_rsa: a private SSH key, and it was being hosted publicly, ready to be downloaded. This obviously caught my attention, and I kept poking around to see what else urlscan had found out about the host. I realized it wasn't just a private SSH key on the internet, it was actually a private SSH
key on the internet in a .ssh folder in somebody's home directory. urlscan hadn't indexed just a file, but the entire home directory of a user. I was shocked. You could see everything a normal user would have on their system: you know, their bash history, their notes files, their Python scripts, their payloads, their sqlmap logs showing that they're attacking governments, and their Cobalt Strike logs. At this point I've started to realize this isn't just a poor sod who's published their home directory, but a criminal who's simply happy enough to publish their tools, loot and everything else.

Now what? Well, it's happened once. Are there more of these? I do more searching and I realize there are.
There's lots of them too, and they're in the past, so by the time I found them the open directory has gone down and the data is no longer accessible. This is going to be a common theme. But now I know they exist, I can only guess more are going to appear in the future, and I'm not going to miss them this time. I'm going to be waiting, ready for the second they're indexed.

I begin working on a script to automatically search for them, validate them, notify me and store the data. The script begins by grabbing the 100 most recent open directories indexed on urlscan. From here we go through each host and check
if it's already been seen by the script. If it's new, we can then check if the open directory is suspicious. We do this by getting all the file paths and checking if any of them match a list of predefined regexes. This list changed over time to decrease both the number of false positives and false negatives, but generally includes things like common pen testing tool names, folder paths and specific log paths. If it's deemed not suspicious, the URL is sent to my server for manual review. This allowed me to check manually for false negatives and to adjust the regex filter to gain more accuracy in the future. If it is deemed suspicious, we will get the files and carry on with the
rest of the process. There were some restrictions put in place during the downloading process. For example, there was a maximum timeout on downloading the data; it changed throughout the research, but it was always multiple hours. Also, file types like .iso, .deb, .temp, etc. were ignored, as they tended to give no insight and were incredibly large in file size. This was needed as a common issue was important data being lost due to the directories closing whilst the downloading process was happening, amongst other issues.

Once we have the data on disk, we create a summary text file containing the timestamp of creation, the total file size of the contents of the directory, and then a combination of the
names of all of the files and their corresponding hashes. The summary is sent to my server, both to notify me and to let me know I need to start investigating the data manually. The summary file, along with the bash history, is sent to a GitHub repo in a folder named after the hostname. The original data is searched for Cobalt Strike logs, which are sent to a separate GitHub repo, and then finally compressed into a tar.bz2 before being stored elsewhere. And finally, the hostname, along with the urlscan ID, is added to the database before the whole cycle starts again and continues indefinitely.

Fantastic. The script is run and we start getting results. Over the
next 6 months or so, I would wake up every day with a freshly brewed coffee and go through the data found overnight. After a lot of trimming down the data set to a point I'm happy to work with, we are left with 96 directories of source code, malware, log files, config files, payloads and everything else under the sun. This is a lot of data, and analyzing it is a challenge in and of itself, so I think it's best to look into some trends and then dive into individual cases to show off
findings.

Looking at the tooling used by the threat actors was incredibly interesting. It told us two things that make a lot of sense. Firstly, tried and tested open source pen testing tools are also used extensively by criminals. Sadly, the line in the README that says "please only use this for educational purposes" didn't quite stop them. Secondly, we saw copious amounts of mass scanning for low-hanging fruit and known vulnerabilities. Turning access into profit is actually a surprisingly difficult thing for a criminal to do, and when hacking is your profession and your goal is profit by any means, low-sophistication attacks you can do en masse are often the path of least resistance.

For reconnaissance, we saw a
host of frameworks used: xray, ARS key, reconFTW. These are all neat off-the-shelf frameworks that you can grab from GitHub and quickly start pulling in data, lowering the barrier to entry in terms of technical ability. Despite the all-in-one solutions being available, most attackers prefer to use their own custom workflows with specific tools they have preferences for. We had masscan, Nmap, fscan and rscan for port scanning, we had Amass and Subfinder for subdomain enum, and things like stroden of Aquatone and tech stack scanners for enum. Most of these tools are household names in pen testing and there's nothing crazy or out of the ordinary so far, but it was interesting to see these classic
tools used from a different perspective. For example, masscan's tagline is that it can scan the entire internet in under 5 minutes. Multiple threat actors took this literally. Here's an example of a TA using masscan to classify IPs by their tech stack, and in the rest of the home directory we can see exploit scripts for known CVEs relating to those tech stacks. Again, another threat actor used fscan, a Chinese alternative to masscan available on GitHub, to do large scans of the internet before sorting them neatly into different files depending on their tech stack, which were then used for targeted vulnerability scanning. All of the vulnerabilities discovered in this instance are critical and allowed for RCE. Security teams were contacted where
possible.

Another example of something normally mundane but interesting in this context is how the subdomain enum tool Amass stores its results in an SQLite database. Multiple threat actors have this, and it allowed us to look at exactly who they were targeting in a nice structured way. To the target audience of Amass, this is an incredibly convenient and great feature of the tool. To an attacker who's published this, it just compounds the opsec failure.

And lastly, before I move on to initial access, there are a few examples of custom tooling. One TA in particular showed real engineering capabilities, programming from scratch an entire framework that allows them to control a fleet of AWS bots, taking commands in concurrency,
storing results in a persistent database, all with its own API that issues instructions and monitors the fleet. Despite looking for the code on public repositories, there was no trace of open source software used; this was entirely proprietary. The controller was using up to 31 AWS instances at once, and with compute power like that it makes sense that this wasn't a small-time criminal. We could see in the rest of the directory the TA had been successfully scanning millions of assets associated with entire countries and industries.

That is a quick look at recon. Now we can move on to initial access and ask ourselves the question: how do professional black hats pop systems? I only really saw three trends
emerge, which actually makes sense when the data is put into perspective. Firstly, we've introduced the concept already, but we saw a lot of mass scanning for known vulnerabilities, specifically with Nuclei. Just as a recap, Nuclei is the modern vulnerability scanning engine authored by ProjectDiscovery. It's an incredibly well-built piece of open source kit that allows users to write YAML templates to detect vulnerabilities. Due to how easy it is to use, its thoroughness of detection and the quality of the tool in general, people over time have generally ported every obscure CVE exploit code into a Nuclei template, resulting in there now being over 10,000 official templates, making Nuclei very effective at picking up bugs. The takeaway here is it's a good
tool.

This tool and method for finding access was sadly by far the most common. It's easy, it was often automated and built into pipelines, and it relies on casting the biggest net possible, catching any and all victims regardless of who they are. The -o flag for Nuclei saves results in a file, meaning we can see where the results are stored and look at the total damage. In the data set there were 12 and a half thousand critical or high severity targets likely to be vulnerable. If you hack enough targets, eventually some of them will be higher value: 47 of them being government sites and 179 of them being educational institutions with a .edu TLD. Security teams were contacted where
possible.

The second trend in initial access is for hackers who weren't just happy hacking any old machine. After doing recon on a specific target, they would go out of their way to manually find outdated tech stack components of that target before pulling exploit code from GitHub and waiting for shells. It's nothing new; if anything it's a less efficient version of the prior method. But the fact that the threat actor chose to target someone specific is interesting, and later we'll see how that can give us some further insights into the type of threat actor.

And the final insight about initial access is that sometimes you don't need access. Well, consistent access. You don't need code execution to make
money. There are many darknet forums and marketplaces where you can simply sell stolen and leaked databases, whether that's selling one-off high-profile data sets that belong to Fortune 500 companies, governments, etc., or more consistent streams of less valuable data sets sold under a subscription model for purposes like credential stuffing. Ghauri and sqlmap were used in a number of cases, often with success. The great thing about sqlmap is that by default the tool creates logs, which means we get to see who they were targeting, who they successfully popped and even the very commands they ran. Every entry you see here with an output folder means they at least tried to pop the target. Every
target.txt file shows us the exact commands they ran, the exact endpoint they targeted and the exact parameter they chose to inject. And lastly, every entry with a dump folder means they successfully stole data. Security teams were contacted where possible.

At some points during the research I saw pipelines that would automate sqlmap, first using a crawler and then starting a new sqlmap scan in a separate screen for every parameter. But if you've ever used the tool, you know how slow it can be, so we mainly saw threat actors manually browsing sites with Burp in the background and then sending requests with any sort of ID parameter to sqlmap, which had a significantly higher success
rate. The log files were nice but not always necessary, as we always had bash history to fall back on.

I'm not going to talk about priv esc for too long. It's difficult to get any interesting insights due to the type of data, and it also feeds nicely into the next section, so I'll save it for that. But you could see evidence of LinPEAS, WinPEAS and Linux Exploit Suggester used by a few different TAs, whether that's in the bash history, files on the system or even the C2 log files.

Speaking of C2s, when it came to persistence I'm afraid to say there was one leading culprit. To no one's surprise, cracked versions of Cobalt Strike littered the
data. 15 different directories had full Cobalt Strike logs for all to see, and looking in the bash histories we could see 252 instances of logging into their team server with all the passwords typed into the terminal. It's good to see opsec was a top priority for these threat actors, with passwords like admin, 123456 and abc123. Despite the prominence of Cobalt Strike, you could see every open source C2 under the sun being used: Havoc, Sliver, Empire, Mythic, Metasploit, I could go on forever. They're all free, easily accessible, well built, with great documentation, and you don't have to risk running a backdoored version of Cobalt Strike.

That was a quick look at some of the
patterns in recon, initial access, priv esc and persistence techniques used in a broader sense. But now let's look at some individual groups which stood out from the
crowd.

One of the first ever directories the script pulled in was actually during the testing phase of the tool, and thus it wasn't even saved to disk. But I thought this entry was so interesting I couldn't just ignore it, and downloaded it manually. This particular threat actor was hinted at earlier whilst we were talking about SQL injections, and is responsible for leaking sensitive data from the Filipino government as well as travel and gambling sectors across the APAC region. Two weeks after I discovered the open directory, so did the threat intelligence company Group-IB, who have since written a blog on just the contents of this open directory and have dubbed the threat
actor GambleForce. Their analysis was good, so I won't reinvent the wheel, but I will quickly touch on the highlights and then the stuff they accidentally missed. As mentioned earlier, this TA was able to perform an SQLi on an outdated PHP page belonging to a department of the Filipino government. The TA managed to steal the entire database, including a table containing seven admin usernames and MD5 passwords and another table containing the plaintext contents of thousands of emails. The other interesting thing that sets this TA apart from the others is the use of malleable Cobalt Strike profiles. I found two distinct profiles in the directory, and they both showed various C2 domains used by the
TA.

For the next specific directory, it highlights a regret I have about the way I conducted this research, being that certain observations and analysis could only be done at the time, for example pivoting off known data to try to tie identities to breaches. This is a screenshot I took when I first found the directory over a year ago. It shows this TA popping various websites with sqlmap. At the time, you could search for this data on dark forums and you would find just one person advertising the data within days of stealing it. This was not an isolated instance; at the time I checked quite a few breaches, and it was fairly common to see
leaks by smaller opportunistic TAs with only a few posts on the forum, trying to gain credits and reputation. The reason this needed to be done at the time, and I can't go back and do it in hindsight, is because within a week or so of that data being posted it would be reposted, not only to that forum but to many other data sharing marketplaces, by different users who would pretend to have hacked the data themselves for credits and rep. This obviously makes it impossible to tie a hack to a user now.

The next directory we're going to poke at is in fact not LockBit. I remember finding this directory, seeing the file lockbit.zip, opening the zip
up to see the source code for the LockBit builder, having a slight panic attack and thinking to myself, "Well, I had a good life, you know." Luckily for me, LockBit Black had its source code leaked the year before by a disgruntled ransomware developer, and this could be anyone. This TA was a curious character. As well as using the box for hacking, they also used it for their own personal learning. On the box we saw them following along with an online tutorial about how to build applications with GPT, with the local source code containing two of their own OpenAI GPT keys. Sadly, at the time of data capture we didn't actually see them use GPT for
anything malicious, which I was personally really curious to see how it would have been implemented. This was one of the more advanced TAs I saw, not necessarily in how successful they were, but more in the sense that everything they did showed a lot of thought went behind their decisions, and they weren't just using tools off the shelf and going for low-hanging fruit.

They also had the most interesting piece of malware in the whole data set, in my opinion. It hadn't been seen before on VT and I couldn't find it anywhere else online. It started as a PowerShell script decoding and running a Base64 blob, which gave you this: another PowerShell script, which decodes
a Base64 blob, giving us a gzip file, which it immediately decompresses and executes. That gzip file contained the final PowerShell script, which runs an XOR over a decoded Base64 blob, revealing the final shellcode. The whole process is done dynamically, loaded into memory, never touching disk, and was just generally really good fun to reverse.

The last thing that stuck out to me about this threat actor in particular is the frequent and detailed use of Cobalt Strike logs, which show the TA uploading LockBit executables to the victim, as well as having the default Cobalt Strike language output be in Mandarin, hinting at where this TA could be from. In the Cobalt Strike directory there was also a
text file containing 10,000 South Korean sites, which is likely a target list.

Last but definitely not least, the final actor in our play looks a lot like a threat group in China. Paraphrasing from Recorded Future: RedFoxtrot is believed to be a Chinese state-sponsored threat group. They've been operating since at least 2014 and predominantly target governments, defense and telecoms, specifically in Central Asia, India, Pakistan and Afghanistan. They use both bespoke and publicly available malware, including PlugX, and are motivated by cyber espionage. One researcher on Twitter claims this APT has been using the infrastructure seen in the tweet, and one of those IPs looked a little too familiar. The first thing I did was run
it through VT and other third-party sources, all of them marking it as malicious. That's great, and not only that, but some of them classified it specifically as a PlugX C2, a known TTP of the group. In the directory we find a folder called Dropbox. Inside it is a legitimate executable, a malicious module and a malicious payload. These are the three classic elements of PlugX malware. Plugging the hashes of the executable and DLL into VT showed us that these two components have been used before in prior PlugX malware campaigns; however, the payload itself has not previously been seen.

Who is the TA targeting, though? Looking through the bash history, we can see them targeting 29
different IPs with various CVE exploit scripts. Running those target IPs through IPinfo, we can get a clue as to who they are specifically. When we look at the results, we get 19 Kazakhstani telecoms, five Kazakhstani ISPs, one Indian ISP, one Indian telecom, one Taiwanese infrastructure site, one Afghan ISP and one more Kazakhstani site for good luck. So almost all the targets are from Central Asia, India and Afghanistan, and all in telecom and infrastructure sectors. Sounds a little familiar.

Inside the open directory we also saw the TA using frp. frp is a really nice lightweight reverse proxy that allows you to access a local machine from the internet even if it's behind a firewall or NAT. Kindly, the threat actor ran the
server component of frp and then piped all of the logs into a nohup file, meaning we can see who accessed this box in the legitimate, intended way. In the log file there were 27 different IPs. Further investigation into these IPs showed us the users of this threat actor machine were mostly from Shanghai. The coordinates drawn from the IPs point to specific public networks, generic office buildings and, by far the most interesting, official government agency
buildings.

This is where the talk begins to wind down and I let you carry on with your day. I'm not going to be taking questions at the end of this, but if you do have questions please come and find me. I'll be around for the rest of the con and I'm not too difficult to spot. The original purpose of this talk was for me to give out the data so people more equipped could do some analysis. I'm a student in my final year of university and I've given it a good shot, but I'm not exactly the best suited for this task. Originally my plan was just to have a link on the slide to
download the data, but a few friends pointed out that would be breaking the CMA in 101 ways. So if there's anything specific that's caught your attention and you want it for research purposes, reach out to me and I'll see if I can get you a sanitized version of the data. This talk was a really, really, really brief overview of the data, and there's so much more I couldn't show off or analyze. Just 3 months ago Censys, the wonderful internet scanning company, released a feature and a subsequent blog that allows you to search just for threat-actor-owned open directories, which I would have loved a year ago and would have made this so
much easier. So please go check that out. I'm super excited to see what research that breakthrough leads to. And lastly, thank you all so much for listening. I hope you enjoy the rest of your conference.