
Automating Threat Hunting on the Dark Web

BSides Philly · 2020 · 34:04 · 764 views · Published 2020-12 · Watch on YouTube ↗
About this talk
Explores systematic threat hunting on the dark web, covering why organizations need this capability and how to perform it efficiently. Examines methodologies including tool-based automation with Scrapy, human intelligence gathering, and data analysis techniques; discusses operational security practices essential for researchers conducting dark web investigations.
Original YouTube description
Threat Hunting on the Dark Web and other nitty-gritty things. What's the hype with the dark web? Why are security researchers focusing more on the dark web? How do you perform threat hunting on the dark web? If you are curious about the answers to these questions, then this talk is for you. The dark web hosts several sites where criminals buy, sell, and trade goods and services like drugs, weapons, and exploits. Hunting on the dark web can help identify, profile, and mitigate risks to any organization if done in a timely and appropriate manner. This is why threat intelligence obtained from the dark web can be crucial for any organization. In this presentation, you will learn why threat hunting on the dark web is necessary, different methodologies for performing it, the process after hunting, and how the hunted data is analyzed. The main focus of this talk is automating threat hunting on the dark web. You will also learn what operational security (OpSec) is, why it is essential while hunting on the dark web, and how you can employ it in your daily life.
Transcript [en]


Hello everyone. I would like to thank BSides Philly for having me. Today I will be talking about automating threat hunting on the dark web and the things that surround it. A little about me: my name is Apurv Singh Gautam, and I'm a security researcher. I started in threat intel two years back. Currently I'm doing my master's in cybersecurity at Georgia Tech, and this summer I was a research intern at UC Berkeley doing threat intelligence research on the dark web side. I pretty much play Rainbow Six Siege every day, I'm part of my college team, and I stream it sometimes. I love hiking, and I recently got into lock picking and have been liking it so far. I also contribute to the security community: I contribute at Cybrary, I'm a TA at StationX, and I take part in local security meetup groups.

So what will we talk about today? We'll start with an introduction to the dark web: what the dark web is and how you can access it. Then we'll move on to what threat hunting means, how you hunt on the dark web, what you gain by hunting there, and what you lose by not hunting there. Then we will discuss a few methods: some tools you can use to hunt on the dark web, and how you can utilize humans to get data or intelligence from it. After that we'll walk through one use case of how you can automate this whole tool-based hunting architecture, and then we'll look at the overall picture, by which I mean the steps you should follow, starting from your threat modeling all the way to report generation. Finally we'll talk a little about operational security and why it is necessary to follow it while hunting or doing dark web monitoring.

Starting with the introduction to the dark web: I'm sure you've seen this image many times on the internet. Basically there are three parts to the web: the surface web, the deep web, and the dark web. The surface web is all the sites indexed by search engines like Google, Bing, Yahoo, etc., the sites you can just search on Google and access directly. The deep web is all the sites that exist but are not directly indexed; they might be behind some kind of paywall or login system. Examples include your college database, where you enter your login information to view your results, or a server you host on a cloud provider like DigitalOcean or Google Cloud and access by IP address; since it isn't indexed by the search engines, it is part of the deep web.

Now coming to the dark web, which is what we will mostly talk about in this presentation: the dark web is the part of the web where you need some kind of specialized software to access the sites. The majority of those sites are marketplaces, dump shops, and forums; people talk about different kinds of stuff there and sell different kinds of products, and we will see what products are being sold in the coming slides.

Going a little deeper: there are different projects that offer dark web systems, among them Tor, I2P, ZeroNet, Freenet, etc. The major and most famous one is Tor, and we will be discussing Tor specifically in this presentation. Basically, Tor works as a decentralized, three-layer proxy system: your traffic goes through three proxies to reach its destination, and it is encrypted along the way. That's why Tor is popular: I would not say it's impossible, but it is very hard to figure out who is accessing what, and from which location. As you can see from the image, Tor domains are alphanumeric strings ending in .onion. There are two types of Tor domains, v2 and v3, depending on the cryptographic algorithm behind them: a v2 domain is 16 characters and a v3 domain is 56 characters. Moving on, talking about the dark web in general, there are a lot of misconceptions about it.
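As a quick aside before the misconceptions: those v2/v3 address formats are easy to pick out of collected text programmatically. A minimal sketch (the sample address in the usage note is only illustrative):

```python
import re

# v2 onion addresses are 16 base32 characters and v3 addresses are 56;
# both use the base32 alphabet (a-z, 2-7) and end in .onion.
ONION_RE = re.compile(r"\b([a-z2-7]{56}|[a-z2-7]{16})\.onion\b")

def find_onions(text: str) -> list[str]:
    """Extract candidate v2/v3 .onion addresses from free text."""
    return [m.group(0) for m in ONION_RE.finditer(text)]
```

For example, `find_onions("visit expyuzz4wqqyqhjn.onion today")` returns the single 16-character v2 address. A regex like this is a cheap first pass for harvesting onion links from scraped pages.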

The first misconception is that people think the dark web is really big. If you compare it to the clear web, or surface web, and compare the availability of the websites, the sites on the surface web are available 24x7, that famous 99.999% availability, but websites on the dark web are not available all the time. Only a few sites are up 24x7; the rest are sometimes reachable and sometimes not. So it's not that big compared to the surface web.

Another misconception, if you talk with someone who doesn't know about the dark web, is that it's a place only for cybercriminals. Yes, cybercriminals are there on the dark web, but it's a place for good people too: activists, whistleblowers, journalists, people who want to access certain services without being spied on. That's why sites like Facebook and The New York Times have their own dark web counterparts. And the last misconception is that people think it's illegal to access the dark web, but that's not the case: it is legal to access it. Yes, your ISP might block dark web routes, but you can still get there using some kind of VPN or SOCKS proxy. It is illegal if you indulge in illicit activities there, but it is completely legal to access the dark web if you are not indulging in those kinds of activities.

As a security researcher working on the dark web, you come across several types of sites you can focus on, based on your organization's threat model or requirements. These are some of the types of sites that exist: from general forums and general markets, to credit card forums that specifically deal in credit cards, to dump shops where credit card dumps are sold, to insider threat forums where insiders at companies talk amongst each other. So there are different types of sites you can focus on. These are some of the sites on the dark web that I took from one wiki page, but there are a lot more forums and marketplaces. Now coming to what kinds of things are sold on the dark web.

As you can see from this list, you can easily purchase some or all of these things if you have that kind of money. You can purchase an SSN for one dollar, or a credit card for twenty. And if you have that kind of money, you can purchase exploits or zero-days, because actors are selling those too. I'm sure you've seen the news stories showing a hundred thousand or five hundred thousand Zoom accounts being sold on the dark web, or Facebook user profiles being sold there. When this kind of news comes to light, the first place the actors sell these things is the dark web, and from there it reaches the clear web. That's why researching on the dark web is important. These are examples from different forums, and this is the kind of listing that is posted on different forums.

Now coming to how you can hunt on the dark web: first of all, what do we mean by hunting, then how you hunt on the dark web, and why you should. Starting with what threat hunting means: threat hunting is proactively searching for threats. By proactively, I mean you search for threats, or for parts and clues of threats, before the attack even happens. By searching, I mean you go through logs and different IOCs, indicators of compromise, including domains, emails, phone numbers, etc. In the case of the dark web you go through a lot of text-based data, and because there is so much text involved, we utilize advanced analytics like machine learning, natural language processing, and deep learning to trim the data down to our requirements. The majority of what is on the dark web is unknown, so you don't actually know what you are looking for; that's why hunting is hypothesis-based. You take one hypothesis, one use case, and go after it, and then it continues from there; that's why it's iterative in nature.

So why is hunting on the dark web important? Like I told you before, there are a lot of forums and marketplaces on the dark web. These are the places where actors, criminals, or users, if you like, learn about new attacks, learn new TTPs (tactics, techniques, and procedures) for different types of attacks, and trade or sell their exploits and tools. If you do dark web research or monitoring correctly, you can identify these types of attacks, identify the TTPs, and even identify the actors doing this kind of stuff. Another thing: done correctly, this lets you identify certain threats or data breaches beforehand. Suppose you are an organization and someone is selling a dataset belonging to your organization on the dark web: if you are doing dark web research correctly, you get to know about it before it reaches the surface web, before some news agency reports that they found it, and you can reduce impacts like reputation loss, revenue loss, or the legal penalties that come with it. That's why research on the dark web is really important.

Now I will show you a few recent examples so you can see for yourself why this research matters. In the first example, on a Russian forum, someone is selling an RCE for an Australian bank. The second example is a paste site on the dark web where an actor is talking about some kind of vulnerability in different U.S. hospitals. The last is from a Russian forum where an actor is selling RDP credentials for a U.S. hospital. Taking that last case: if you are a hospital researching on the dark web and you learn about these things, even though the listing is not about your hospital, you can go and do vulnerability analysis of your own environment and see whether you have some kind of RDP vulnerability or not. So it helps in this case. Along the same lines, you can keep up with the latest attack trends: even if you're not being attacked, you already know what to expect, so if an attack does come, your SOC and incident responders are already prepared to deal with it. You can learn new TTPs, identify insider threats, and find out about data breaches beforehand.

Now coming to the methods of how you can hunt on the dark web. First I will discuss a few tools you can utilize, and then we will look at how you can utilize humans to get data or intelligence from the dark web. There are a lot of tools available online; these are some of the tools I have been using for two years, and they have been working fine. You can utilize any tools; the main point is getting data from the dark web without much effort. Starting with the first tool, Scrapy: Scrapy is a Python web crawling framework with multi-threading capability, and we will talk about the multi-threading part in detail. Second, Tor: obviously, if you want to access Tor sites, you need the Tor software installed. OnionScan is another great tool you can use to see whether an onion site is up or not, and to find correlations between different onion sites. Before going to Privoxy: like I told you before, your ISP might have blocked Tor, so you should always access Tor through some VPN or through some SOCKS proxy, and there are a lot of tools you can use to route your Tor traffic through a SOCKS proxy; some of them are Privoxy, tsocks, and Polipo. I've been using Privoxy; you can use any tool, but you should always use something to route your traffic through SOCKS.

Now the database part: you obviously need some kind of database to store your data after collecting it, and the best one is Elastic because of its Kibana support; you can easily search and analyze data in Kibana, and that's why I've been using Elastic. You can use any other database and then move the data into Elastic, but you can also store it into Elastic directly, which is a good point of scraping with Scrapy. Then Redis: Redis is an in-memory database that acts as a cache, and we'll talk about why we are using Redis here.

Going a little deeper into Scrapy, because that's the main tool that gets you data from the dark web without much effort, I will go through each step of how Scrapy works. Before that, assume that the spider, downloader, pipeline, scheduler, engine, and middleware are all different Python programs. Starting with the spider: you give your onion URL to the spider. The spider sends the request to the engine; the engine is basically the program that manages every other component in Scrapy. The engine hands it to the scheduler, which is where your multi-threading comes into play; you can tune the concurrency accordingly, I think from 8 up to 32 or 64. It gives each request to a different thread and manages that part. The scheduler hands requests back to the engine, and the engine gives them to the middleware. The middleware is where your login code and proxy code live. By proxy code I mean you write your Tor IP and Privoxy IP, or your SOCKS IP, there. By login code I mean that the majority of websites on the dark web require some kind of registration or login, so you need to create an account first and then put the login details and cookie details for the different websites in this part. Another thing to note is that there will be CAPTCHAs; the majority of dark web sites use text-based CAPTCHAs, so they can be bypassed. You can either use some OCR techniques or third-party services like Death by Captcha, Anti-Captcha, etc., and that code also goes in the middleware.

The middleware then sends the request to the downloader, which fetches your HTML page and sends it back to the spider. You have another function in the spider program where you extract the elements you require from the HTML; obviously you don't need the full HTML page, you need some table data or some rows, so you extract those HTML elements and store them into items. Items in Scrapy are basically a data structure that holds all the information you collect: everything from the HTML elements goes into items, and the items then go into the item pipeline. Here you can store into any database, SQL or NoSQL, or dump directly to any file format like JSON or JL, or do both; Scrapy has functionality to do both simultaneously, dump your data into some format and store it into Elasticsearch or another database. So this is basically how Scrapy works. The reason Scrapy is so useful is that you don't want to waste your time getting the data; you want to focus on analyzing the data rather than collecting it.

Now coming to the human element: human intelligence (HUMINT) is the process of gathering intelligence through interpersonal contact rather than through a tool-based or technical process. By interpersonal contact I mean you go on the dark web and talk directly to the actors or to different users. You do this because you want to know the intent of the actors, why an actor is attacking some organization, and you can also learn many other things: what tools they are using, what new attack techniques they are going to use, even which organization they are going to attack next if you have that kind of relationship. You can think of this as the high-tech equivalent of an FBI agent going undercover and infiltrating a criminal organization; you are doing the same thing, but online. It is risky, it's not an easy task, and not everyone can do it, but done correctly you can identify new attack TTPs and new attack vectors. The second thing you can do is post-attack investigation: if your tool-based method identified that someone is selling data related to your organization, you can activate your human intelligence to go and talk to the actor and see whether the data is real or fake, because many actors do sell fake data.

Moving on, we will discuss one use case of how you can automate your dark web threat hunting. There are many more ways to automate; I will discuss one, and even within this picture you can change how you get the data. Starting with the first thing: obviously, if you want data from the dark web, you need onion links, and you can write a script to collect onion links from different sources, either on the surface web or the dark web. You also need different SOCKS proxies, and you can write a script to gather those. Moving on to the Scrapy setup: this includes all the steps you need to follow to start your scraping, and there are some manual parts to it. You go to the different websites and register accounts, and you also go through each site's HTML architecture, because you are creating different Scrapy scripts for different forums, so you study each forum's structure. Then you put all those things in, your Tor IP and your Privoxy IP, you finish your Scrapy setup, and you start scraping.

The crawler, parser, and analyzer are parts of Scrapy: the crawler gets the HTML data, the parser parses the HTML elements to extract the data, and the analyzer analyzes the data in Scrapy itself, after which you put it into Elasticsearch or into a JSON format. You can put the data into a database and analyze it separately, or analyze it in Scrapy itself; it's up to you. I analyze it in Scrapy itself.
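As an illustration of that parser step, here is a minimal, self-contained sketch: Python's stdlib `html.parser` stands in for Scrapy's `response.css()`/`xpath()` selectors, and the two-column listing table it parses is a hypothetical example of a forum market page.

```python
from html.parser import HTMLParser  # stdlib stand-in for Scrapy's selectors

class ListingParser(HTMLParser):
    """Collects the text of each <td> cell, grouped by table row."""
    def __init__(self):
        super().__init__()
        self.rows = []        # one list of cell strings per <tr>
        self._row = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

def extract_listings(html: str) -> list[dict]:
    """Turn a two-column listing table into item dicts (Scrapy's 'items')."""
    parser = ListingParser()
    parser.feed(html)
    return [{"title": r[0], "price": r[1]} for r in parser.rows if len(r) >= 2]
```

Feeding it `<table><tr><td>SSN</td><td>$1</td></tr></table>` yields one item dict with a title and a price; in the real pipeline those dicts are what flow on to the item pipeline and the database.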

In this case I'm putting the data into a JSON format in Scrapy itself, and after that I do the NLP processing. Like I told you before, there is a lot of text-based data, and you can use different machine learning, NLP, or deep learning techniques to go through it and keep only the data you want, based on your organization's threat model. Once you have collected enough data, you can train an NLP model, put that model into place to extract the data you want, and then send it to Elasticsearch.

The point of using Redis here is that Scrapy collects a lot of data in a short amount of time because of its multi-threading capability. Yes, Scrapy has a duplicate filter, but if your script stops in between, it might not work, and that's why you need some kind of cache or in-memory database to deal with duplicates. You store some kind of unique identifier in Redis, and Scrapy takes what to scrape next from Redis. So this is one use case of how you can utilize different tools to scrape data from the dark web.
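The Redis-backed duplicate check just described boils down to keeping a set of URL fingerprints that survives crawler restarts. A sketch of that logic, with a plain Python set standing in for Redis so the example stays self-contained (in production the fingerprints would go through something like `redis.Redis().sadd("seen", fp)` instead):

```python
import hashlib

class SeenFilter:
    """Crash-safe duplicate filtering, sketched with an in-process set.
    Swap the set for a Redis set so fingerprints outlive the crawler."""
    def __init__(self):
        self._seen = set()

    def fingerprint(self, url: str) -> str:
        # Stable unique identifier for a page, like Scrapy's request fingerprint
        return hashlib.sha1(url.encode("utf-8")).hexdigest()

    def is_new(self, url: str) -> bool:
        fp = self.fingerprint(url)
        if fp in self._seen:
            return False
        self._seen.add(fp)
        return True
```

The crawler asks `is_new(url)` before scheduling a request; anything already fingerprinted is skipped, which is exactly what keeps a restarted crawl from re-scraping the same threads.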

You can think of different use cases for how to do this. Now, moving on to all the steps you need to take, let's talk about the threat hunting, or threat intelligence, life cycle. This covers every step, from your threat modeling to your report generation. Starting with direction: you identify different dark web forums, acquire access, and create accounts. With collection, you start your scraper, collect data, and process the HTML elements. Then comes analysis, where you use NLP techniques to analyze the data.
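A first cut at that analysis step can be as simple as keyword matching against the threat model; this is a stand-in for the trained classifier described earlier, and the watchlist terms below are hypothetical:

```python
import re

# Hypothetical watchlist drawn from an organization's threat model
WATCHLIST = ["rdp", "database", "exploit", "acme corp"]

def relevant_posts(posts: list[str], watchlist: list[str] = WATCHLIST) -> list[str]:
    """Keep only posts mentioning a watchlist term (case-insensitive)."""
    pattern = re.compile("|".join(map(re.escape, watchlist)), re.IGNORECASE)
    return [post for post in posts if pattern.search(post)]
```

Running this over scraped posts drops the noise and keeps anything mentioning the organization's assets; once enough matches are labeled, they become training data for a proper NLP model.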

Dissemination means you visualize the results or create reports, present them in different dashboards, and then get feedback from your managers. That is the crux of how you use these steps to collect and analyze data from the dark web. Going a little deeper, starting with threat modeling: you've heard the term a lot in this presentation, and by threat modeling I mean you define your critical assets; you figure out which things in your organization an attacker can attack or wants. It can be anything: different products, different datasets. You figure those out, and then you decide which websites to target or focus on while researching on the dark web. For example, you can use the Pyramid of Pain to prioritize your targets: if you want to focus only on IOCs like IP addresses, hash values, and domain names, you target the kinds of websites that carry those; if you want to target TTPs, you target the kinds of forums where actors discuss different TTPs.

Moving on to data collection and processing: you should collect data from both the surface web and the dark web. I've already talked about the dark web, so let's talk about the surface web: many actors do discuss attacks on the surface web as well, on sites like Pastebin, Twitter, and Reddit, and nowadays different actors are talking over Telegram. So you can use those places to get data too, and then combine the data from the surface web and the dark web to get more intelligence out of it.

Moving on to data analysis: you can apply many NLP, ML, and DL techniques here; you can do social network analysis, classification, and clustering. By social network analysis I mean analyzing different users or actors across forums if you have that requirement; you can do classification of different forums, and clustering of the different kinds of products people sell on the dark web. I'll say a little about MITRE ATT&CK here: MITRE ATT&CK is a knowledge base of adversary TTPs drawn from real-world observations to date, and if you are getting that rich amount of data from the dark web, you can map those TTPs to MITRE ATT&CK, and it generates really good reports.

The last part is operational security, or OpSec. First, what does OpSec mean? OpSec is the practice of hiding yourself online, of disassociating your online self from your real self. It means you hide information like your name, your organization's name, or anything else that can correlate back to you and your organization. If you are researching on the dark web, and not just the dark web but anywhere on the surface or clear web, you don't want your organization's name or your own name to come out, so you follow specific steps to hide those things and avoid compromising your operation or your research. That is what makes it hard: you always have to keep these steps in mind while doing this kind of research. I will discuss a few steps for maintaining OpSec; there are many others.

Starting with the first: you are obviously using some kind of system, a VM, a lab, or a cloud system, for your tool-based dark web monitoring or your manual monitoring, and you should never store personal data or data related to your organization on that system. Another thing, as I already mentioned: always use Tor over some kind of SOCKS proxy, or preferably over a VPN, for that added encryption. And as I said, this is the high-tech equivalent of an FBI agent going undercover: an undercover agent has a persona and a backstory, and likewise you should have different personas and different backstories for the different websites, forums, or marketplaces you access on the dark web. If you do this for a long time, the personas have to stay straight in your head, so you should always take extensive notes.

The last thing you want is to mix personas and blow your cover. Another thing to note: you should always change time zones. By that I mean, if you are in the U.S. and you are accessing a Russian forum, you should change your time zone to Russia, just for that added layer of protection, or added peace of mind. Along the same lines, there are many forums in different languages, Russian forums, German forums, Arabic forums, so if you are doing this kind of research, you need to learn the slang and language skills to access and understand those forums.

So that was it. We talked a little about the dark web and what it means; we talked about different forums and marketplaces; we talked about what threat hunting means, why you should hunt on the dark web, and what you gain or lose depending on whether you hunt there. We discussed Scrapy, and why and how you can use it to get data from the dark web. We discussed human intelligence and how you can use humans to bolster your tool-based hunting, and we walked through one use case of automating dark web hunting. We also discussed the threat intelligence life cycle and how everything maps to its steps, and we covered operational security and how and why you should practice OpSec while doing dark web monitoring or research.

If you took away even one thing from this talk, I would suggest you go back and follow these steps: first figure out your assets and your organization's requirements, then create the hunting pipeline like I talked about. Start your Scrapy crawlers, start getting data into Elasticsearch, or into any other database and then move it into Elasticsearch, and then search for your company's name or your company's products in Kibana. If you find anything, analyze those results: in which forum is your data, or your company's name or products, being discussed, and what are the actors saying? Do this on a monthly basis, then create reports and deliver them to your team, and you will see good results if you do it correctly. Yes, dark web threat hunting is hard, but it's worth the effort: you don't get the intelligence anywhere else that you get from the dark web.
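That Kibana search can also be run programmatically as an Elasticsearch query. A minimal sketch of a query builder; the `post_text` field name and the company/product values are hypothetical, and the resulting dict is what you would hand to the Elasticsearch client's search call:

```python
def company_query(company: str, products: list[str]) -> dict:
    """Build an Elasticsearch bool query matching any watched phrase."""
    phrases = [company] + products
    return {
        "query": {
            "bool": {
                # Match any one of the watched phrases in the scraped post text
                "should": [{"match_phrase": {"post_text": p}} for p in phrases],
                "minimum_should_match": 1,
            }
        }
    }
```

Running `company_query("Acme Corp", ["WidgetOS"])` against the index of scraped posts surfaces any forum thread mentioning the company or its products, which is exactly the monthly check described above.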

You should always keep operational security in mind. Again, as I told you before, look at more than one resource: look at resources on the surface web as well as the dark web, because combining them gets you more intelligence. And yes, all of this takes a lot of effort, a team, and a lot of resources; you cannot do this alone, so you should always have some kind of team doing it. And if you have that rich amount of data, you can use MITRE ATT&CK to get good reports out of it.

These are some of the resources you can follow if you want to get started with dark web research or dark web hunting. Companies like Recorded Future, IntSights, CrowdStrike, and Digital Shadows release whitepapers and blogs regularly, so you can go there and read through them if you want to learn more. With that, I hope you all liked my presentation. If you want to talk about this stuff more, you can hit me up on Twitter or LinkedIn. Thank you.
