
An investigation into the state of web-based Crypto mining - Robert Len

BSides Cape Town · 37:53 · 32 views · Published 2023-09
About this talk
BSides Cape Town 2022 Conference, Track 2: An investigation into the state of web-based Crypto mining - Robert Len
Transcript (English)

All right, hello everybody, and thank you for joining me for the pre-tea shift. I'm going to do my very best to keep you all awake, but I offer absolutely no guarantees. So before we kick off, the perfunctory "who am I": my name is Robert Len, I work for Mobius Binary and I live in Cape Town. I like random facts, retro video games and beating the system, and I've had a cue from my director to mention that Mobius Binary is hiring, if anybody is interested. I'm actually a returning speaker: I spoke at BSides in 2016 on a wildly unrelated topic, and coming back, it is still absolutely as nerve-wracking being up here. I'm still filled with an overwhelming sense of imposter syndrome, so here we go. But from 2016 until now I still have my beard, so in an ever-changing world, that has remained constant.

This feels a bit like a dating website, where I'm telling you all about myself: I like video games and long walks on the beach and piña coladas and all of that. Since we're on confessions, you've heard some of my personal ones, so let's talk about some confessions and caveats regarding this presentation. First up, the data in this presentation was part of academic research, so please don't hold that against me; if anything, put it down to me.

The presentation is not going to cover all of the research, thankfully for you and for me, but there will be snippets of it. And although the title has got to do with crypto, I am not actually a crypto enthusiast. The idea for this talk, like all good and bad ideas, came out over a couple of beers: is this possible? Can we get an idea of the world of cryptojacking? It's going to sound a little academic-y, because it is, but I'll do my best not to make it sound too much that way. While I will share the results, it's really more about the process, and hopefully that process can help somebody else with a similar project or a similar idea.

Unfortunately the "state" in the title is not the current state as of 2022; it is a little bit older, and we can thank COVID for that, because I would really have liked to present this back when I actually did the research. But it is entirely repeatable, so if anybody else wants to, please go ahead. It's also not an in-depth talk about crypto, not at all: if anything it's an inch deep and a mile wide, not the other way around. So when we're talking about the prevalence of crypto mining and cryptojacking, what exactly are we talking about in the context of this talk?

It is a monetization process in which the computational resources of a website's visitor are used to mine cryptocurrency for the owner of the website, or for somebody else. So it can be an attack, but it can also be legitimate, which is why I call it a process. With anything like this there are always a couple of ethical questions, and the first is: how did this crypto mining code get onto the website? There are a few ways it could have got there. Somebody could have breached the site and embedded it in the code; the webmaster could have put it there without asking for any consent; or the webmaster could have put it there and asked for consent: "Do you mind me mining crypto while you visit my site?"

Consent is a tricky one, especially here, because it's really difficult to ascertain whether the end user even knows what they're consenting to, even on sites where the webmaster openly puts it there, tells you he's doing it and asks for your consent. A couple of studies have been done around this. BleepingComputer ran one which revealed that an enormous percentage of users didn't actually mind their resources being used to mine a bit of cryptocurrency in the background, as long as those pesky ads weren't being displayed. But once again, do they really know what they're consenting to?

The Pirate Bay torrent search engine, which I'm sure nobody here has ever used, was caught running crypto mining code quite a while back without even amending its privacy policy. After that came out, they put a cheeky caveat on the site, essentially asking: "Do you want ads to display, or do you mind us using a few CPU cycles to mine?" So the ethical questions are there, and if anybody really wants to talk about them we can chat a little later. But back to the actual talk: what are we looking at? It really is how prevalent browser-based crypto mining is on the internet, which mining variants are preferred for cryptojacking, which internet service providers are hosting it and which countries they're in, which currencies are used and, most importantly, the process by which I got to all of the above. And of course it fits the BSides 2022 mantra: it's all done from home.

So the intended result is a data set containing varied, and hopefully accurate and relevant, crypto mining and cryptojacking data; a repeatable process that anybody else can follow to continue enriching such a data set; and a data set that can be used for a visual representation.

Now, some of the technologies I used. When I was making the slide I thought I must have used many more, but actually it's pretty simple, pretty basic stuff: Bash, Python, the Python pandas library, Maltego, Jupyter notebooks, Neo4j, VirusTotal, and the last icon is the MaxMind geolocation database, used to map IP addresses to geographic locations.

So when we talk about "they" in the context of this talk, what are we actually talking about? "They" are browser-based crypto mining scripts. We could go really deep on these scripts, and there's a lot of depth to them, but that's not the intention of this talk. In simplified terms, it's mining code embedded in a website's source code that makes use of the visitor's CPU during the visit. So now that we know roughly what we're looking for, embedded mining code in websites, how do we find all the different variants and all the different crypto mining scripts? How do we even know what we're looking for?

The last thing I wanted to do was a large study trying to find every single one of these scripts out there; I think it would have taken me forever. Luckily I came across a really useful list called the NoCoin adblock list, if anybody's heard of it. It's commonly used in a lot of browser extensions and plugins, even on phones. You install one of these plugins, and when you visit a site with one of the embedded crypto mining scripts, it pops up and says, "Hey, this site has a script that's going to mine cryptocurrency off your browser while you visit." These extensions are powered by a wonderful little text file known as the NoCoin list, which you can look at via the link at the bottom of the slide. For reference, that's what it looks like. I know it's really difficult to see, but it is just a text file with lists and lists of different mining variants, and it plugs into the extensions to warn you when you hit a site that uses them.

Fantastic: now I've got a list of these mining variants and I know what I'm looking for. But here I got hit with the real big problem, because that was the easy part: how on earth am I going to find them across the internet? This part really started bothering me, and I thought, "Oh God, what have I done?" I started looking at my options and checking out some vendors. Verisign and PIR will give you DNS zone files for the .com, .net and .org domains; you do have to fill out a couple of forms to prove that it's for academic research. So that was the option I was going to go with. Then I thought: okay, now I've got to crawl all these websites, download all the HTML (the landing pages at a minimum), and then match all that source code against my entries from the NoCoin list. And that's horrible.

So I'm thinking, "Oh my God, what have I done?" I've got a total of 128 million domains to crawl. I couldn't even start figuring out the bandwidth required, the storage space, or the computing power to sift through all of it. At the time I did the research, the NoCoin list had 587 variants, so I'd be looking for 587 variants in the source code of 128 million pages. How am I going to do all of this, and how am I going to pay for it?
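For a sense of why the crawl-and-match approach is so heavy, here is a minimal Python sketch of the matching step alone. It is not the code from the research: it assumes the NoCoin list uses Adblock Plus filter syntax (for example `||coinhive.com^$third-party`), reduces each rule to a plain substring, and ignores wildcard and regex rules. The function names are hypothetical.

```python
def parse_nocoin(text):
    """Reduce Adblock-style filter rules to plain substrings to search for.

    Assumes rules such as '||coinhive.com^$third-party' or '/coinhive.min.js';
    lines starting with '!' or '[' are comments/headers, '##' lines are
    element-hiding rules. Wildcard and regex rules are not handled here.
    """
    needles = set()
    for line in text.splitlines():
        rule = line.strip()
        if not rule or rule.startswith(("!", "[")) or "##" in rule:
            continue
        rule = rule.split("$", 1)[0]          # drop options such as $third-party
        rule = rule.lstrip("|").rstrip("^|")  # drop the '||' anchor and '^' separator
        if rule:
            needles.add(rule)
    return sorted(needles)


def match_variants(html, needles):
    """Return every rule whose text appears (case-insensitively) in a page's HTML."""
    page = html.lower()
    return [needle for needle in needles if needle.lower() in page]
```

Run naively, `match_variants` would be called once per downloaded page, checking all 587 rules against each of the 128 million landing pages, before even counting the cost of fetching them.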

And most importantly, I'm just freaking lazy, man; I don't really want to do this. So, poking around, I came across an amazing resource that really saved my life; I don't know how many of you have used it: PublicWWW (publicwww.com). That's me in the middle there, thinking I'm going to be stuck manually crawling sites, and then I see PublicWWW coming along and I'm happy. Most importantly, it was supervisor-approved from a research perspective, so I was given the go-ahead to use it, which really made my day. To give some context on the site itself, it has a source code index of 504 million websites, which dwarfs the 128 million I would have been able to do. It has some other really interesting uses too: you can search source code for specific technologies, sites with the same analytics ID, WordPress theme names, references to code or comments, even sites that mention your name. And no, I do not work for them and I have no affiliate deals, none of that; it's just really, really useful.

Before we get too involved in the process, a quick note on Coinhive, because it comes up quite a lot. Coinhive really was the crypto miner's, or in this case the cryptojacker's, best friend, because it provided services for embedding mining scripts into sites you own, or sites you don't, and as people visit and their resources get used, you get paid. Their slogan was literally "Monetize your business with your users' CPU power", so there was no hiding it. Also, Monero and Coinhive were a perfect marriage: Coinhive was built for Monero, one of the most useful currencies for browser-based mining because it is ASIC-resistant, so ordinary CPUs can mine it competitively. Once again, we could go really deep on this topic, but that's not what I'd like to do, at least not right now; the two of them were a perfect match.

The way Coinhive worked is that you'd sign up and get a token, which you include in the API calls. When a user comes to your site, the script loads, connects to Coinhive and authenticates with your token to receive input for hashing. Once a valid hash is found it's committed to the Coinhive pool, and eventually they pay you 70% of that reward and pocket 30%, sort of like the house's cut at a casino poker table. They also had a couple of other interesting services: a captcha service, which was quickly shut down, and a shortlink service, both of which added delays on pages so that mining could take place in the background.

But back to the actual research. I've paid about fifty dollars for PublicWWW API access, and I've got my list of variants from NoCoin. That's literally how I'm starting off: nothing but a list of NoCoin variants and this access. My for loop simply takes every single entry in the NoCoin list and queries it, and what I get back is a CSV file for each of those variants.
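A minimal sketch of that per-entry loop, assuming PublicWWW returns one `site,rank` line per match (the talk only says each result carries a rank after the comma). The export URL template and its `export`/`key` parameters are assumptions to confirm against PublicWWW's own API documentation; the helper names are hypothetical.

```python
import csv
import io
from urllib.parse import quote
from urllib.request import urlopen

# ASSUMPTION: the shape of PublicWWW's CSV export URL; confirm it against the
# API documentation for your account before relying on it.
EXPORT_URL = "https://publicwww.com/websites/{query}/?export=csv&key={key}"


def export_url(entry, key):
    """Build the export URL for one NoCoin entry, searched as an exact phrase."""
    return EXPORT_URL.format(query=quote(f'"{entry}"', safe=""), key=key)


def parse_results(text, entry):
    """Turn 'site,rank' lines into rows tagged with the variant that was searched."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) >= 2 and record[1].strip().isdigit():
            rows.append({"url": record[0].strip(), "rank": int(record[1]), "miner": entry})
    return rows


def fetch_all(entries, key):
    """One request per NoCoin entry, like the for loop described in the talk."""
    results = {}
    for entry in entries:
        with urlopen(export_url(entry, key)) as response:  # network call
            results[entry] = parse_results(response.read().decode("utf-8"), entry)
    return results
```

Keeping the parsing separate from the fetching means the saved CSV files can be re-parsed later without spending API credits again.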

In theory there could have been 587 files, since that's how many entries there were in the NoCoin list, and this is an example of one. It gives you all the sites that have that string in their source code, plus a rank after the comma, like an Alexa rank. This is where I start plugging everything into Jupyter and Python pandas. From that initial kick-off I found 27,981 instances of crypto mining scripts, on 25,204 unique sites. Of the 587 entries in the file, 305 unique scripts were found, so 285 I didn't come across at all. The top 10, which I refer to quite a lot, account for 76% of what I found, so a small number of variants make up the bulk of it. And compared with all the problems I was worrying about around crawling and downloading, this took 20 hours over five days and resulted in 10 MB of raw data, so it was an absolute pleasure.

Looking at some of these sites with crypto mining, I thought I'd match them against their Alexa rankings and see whether any popular sites we might know have crypto mining scripts in them without us realizing. Not so much, but bizarrely enough the Wiz products and wizmarketing.com were pretty highly ranked on Alexa, and a top-thousand ranking is very impressive, and they were hosting crypto mining scripts. I had a quick look through these again, since it's been a while, and farmeasy.in, number seven, is still running and still doing its thing. As for the most prevalent miners, no surprise: Coinhive and AuthedMine, which together make up 38% of the entire population of miners.
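The headline counts above map onto a few pandas calls. A minimal sketch, assuming the per-variant exports have been concatenated into one DataFrame (for example `pd.DataFrame(rows)`) with hypothetical `url` and `miner` columns, and assuming the "top 10" share is measured against all instances found:

```python
import pandas as pd


def summarise(df, top_n=10):
    """Headline numbers from one row per (site, variant) hit.

    Expects 'url' and 'miner' columns, one row per match.
    """
    per_miner = df["miner"].value_counts()
    return {
        "instances": len(df),
        "unique_sites": df["url"].nunique(),
        "unique_variants": df["miner"].nunique(),
        # share of all instances contributed by the top_n most common variants
        "top_share": per_miner.head(top_n).sum() / len(df),
        "per_miner": per_miner,
    }
```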

AuthedMine was run by Coinhive as well, and it's the consensual version: if you run a site and want users to consent to having their resources used for mining, you sign up for AuthedMine. Interestingly, neither of them exists anymore; in fact, they shut down as I was finishing up this research. So 38% of the market when you no longer exist is pretty good going. Previous research showed them holding about 75 to 80% of the market, so they were definitely the ones to go to if you were looking for an end-to-end, browser-based crypto mining solution.

Taking the roughly 25,000 unique URLs I had, I thought it was time to look at the IP addresses and see what I could dig out of them. DNS resolution was the obvious choice, and I used massdns for that. Of that list of unique URLs, about 15% were unresolvable, over three thousand in total, which left me with 21,984 valid IP addresses to look at.
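A minimal sketch of post-processing the resolver output in Python, assuming massdns was run with its simple text output (`-o S`), where each answer line reads like `example.com. A 93.184.216.34`. The line format is an assumption to check against your massdns version, and the sketch works on hostnames already extracted from the URLs. CNAME chains are followed so a `www.` alias still resolves to the final A record.

```python
def parse_massdns(lines):
    """Split massdns simple-format output into A-record and CNAME maps.

    ASSUMPTION: lines look like 'example.com. A 93.184.216.34' or
    'www.example.com. CNAME example.com.' (massdns run with '-o S').
    """
    a_records, cnames = {}, {}
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue
        name, rtype, data = parts[0].rstrip(".").lower(), parts[1], parts[2].rstrip(".").lower()
        if rtype == "A":
            a_records.setdefault(name, data)  # keep the first address seen
        elif rtype == "CNAME":
            cnames[name] = data
    return a_records, cnames


def resolve(domain, a_records, cnames, max_hops=8):
    """Follow CNAMEs (up to max_hops) until an A record is found; None if unresolvable."""
    name = domain.rstrip(".").lower()
    for _ in range(max_hops):
        if name in a_records:
            return a_records[name]
        if name not in cnames:
            return None
        name = cnames[name]
    return None


def resolution_report(domains, a_records, cnames):
    """Map each domain to an IP (or None) and return the unresolvable share."""
    ips = {d: resolve(d, a_records, cnames) for d in domains}
    unresolved = [d for d, ip in ips.items() if ip is None]
    return ips, len(unresolved) / len(ips)
```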

I plugged those into the MaxMind GeoLite database and did a simple count of the occurrences of each geographic location. That showed crypto mining servers in 91 different countries. South Africa had 106 IP addresses hosting crypto mining variants, which ranks 25th out of the 91, and 12 of the countries had just one IP address each.
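A minimal sketch of that per-country count, with the lookup passed in as a function so the counting logic can be exercised without the database file. `mmdb_lookup` uses MaxMind's `geoip2` Python package against a GeoLite2-Country `.mmdb` file; the talk does not say which client was used, so treat that part as one possible choice. The `rank` column gives the kind of "25th out of 91" position mentioned above.

```python
import pandas as pd


def mmdb_lookup(path):
    """Country lookups with MaxMind's geoip2 package and a GeoLite2-Country .mmdb file."""
    import geoip2.database  # pip install geoip2
    import geoip2.errors

    reader = geoip2.database.Reader(path)  # left open for the lifetime of the lookup

    def lookup(ip):
        try:
            return reader.country(ip).country.iso_code
        except geoip2.errors.AddressNotFoundError:
            return None

    return lookup


def country_table(ips, lookup):
    """Count IPs per country, with each country's share and its rank by count."""
    countries = pd.Series([lookup(ip) for ip in ips], dtype="object").dropna()
    counts = countries.value_counts()
    return pd.DataFrame({
        "ips": counts,
        "share": counts / counts.sum(),
        "rank": counts.rank(method="min", ascending=False).astype(int),
    })
```

Usage would be along the lines of `country_table(ip_list, mmdb_lookup("GeoLite2-Country.mmdb"))`; the file name here is illustrative.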

Looking at the geolocation counts themselves, the USA tops the top-10 list with 42%, followed, interestingly, by Iran, then Germany, Russia and some of the usual suspects: those are the countries with the most servers hosting crypto mining variants. At this point it's worth a quick checkpoint on the data set. I've got the domains, the miners found on those domains, the IP addresses and the geolocation. My goal now is to take this data set, enrich it a bit more, pull some more data out of it and see where that goes.

The first thing I did was integrate with VirusTotal. Like some of the other services, VirusTotal offers a really useful API for academic research: if you can prove that you're doing academic research, they'll give you free access to their API. With it you can do some interesting things: get domain categorizations, check whether a domain exists in the VirusTotal database at all, see whether any known malware samples are associated with it, and get some DNS and WHOIS information.
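A minimal sketch of one lookup in that loop, using VirusTotal's current v3 REST API (`GET /api/v3/domains/{domain}` with an `x-apikey` header), where a domain report carries a `categories` mapping of vendor to category. The talk does not say which API version was used, and the field names it mentions resemble the older v2 reports, so treat the version choice as an assumption.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen

VT_DOMAIN_URL = "https://www.virustotal.com/api/v3/domains/{domain}"


def domain_request(domain, api_key):
    """Build a GET request for VirusTotal's v3 domain report."""
    return Request(VT_DOMAIN_URL.format(domain=quote(domain, safe="")),
                   headers={"x-apikey": api_key})


def fetch_domain(domain, api_key):
    """Fetch and decode one report (network call; mind your API quota)."""
    with urlopen(domain_request(domain, api_key)) as response:
        return json.load(response)


def categories(report):
    """Vendor -> category mapping from a v3 domain report, or {} if absent."""
    return report.get("data", {}).get("attributes", {}).get("categories", {})
```

Saving each raw report to disk before extracting anything makes it possible to re-run the analysis without repeating thousands of API calls.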

Once again, a simple for loop went through all my domains and hit the VirusTotal endpoint, giving me results like the following. And once again I had a bit of a problem. From my initial loop I had CSV files, loads of them, 587, and I thought that was a lot. Now I had 20,000 nested JSON files, one for each domain. Since I'm working with Python pandas data frames and I'm pretty CSV-centric, that made things a little tricky: how do I get this back in and add it to my data set? With enough messing around in Bash, with enough grep and awk and tr, I was able to chunk it all together into a workable CSV file and import it back into my data frame.

That's an example of what the enriched data frame now looks like. You can see the Bitdefender category, which is one of the domain categorizations; the domain, which is really the URL, or at least its first part; the Forcepoint categorization; the URL; the rank; the miner found; the IP address; and the geolocated country. So that's already a more enriched data set to work with.

Looking at the actual domain categorizations (I know this is really difficult to see), the main one for Forcepoint, unsurprisingly, is potentially unwanted software. Other than that we've got business and economy, newly registered sites, information technology, and compromised websites. That last one piqued my interest, because those are most likely sites that didn't put the miner there on purpose: sites that have been hacked and had mining variants embedded. The Bitdefender categorizations were sparser, with not nearly as much data, but similar types of categories: business, parked, blogs, computers and software, porn, and a couple of others with fewer entries.

Because the two data sets were vastly different in size, I just looked at percentages, and with that I could see definite similarities across both: Bitdefender and Forcepoint both had about 16 to 18% in computers and software or information technology, and sex, travel, and news and media each had similar percentages, between 6 and 9%. So we start getting an idea of what type of sites are actually hosting these mining variants.
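As a swapped-in alternative to the grep/awk/tr pipeline, the per-domain JSON files can be flattened with Python's `json` module and pandas, then compared as percentages because the two vendors cover very different numbers of domains. A minimal sketch: the report keys below are assumed to match VirusTotal's older v2 domain reports and the file layout (`<domain>.json`) is hypothetical, so adjust both to the files you actually saved.

```python
import json
from pathlib import Path

import pandas as pd

# ASSUMPTION: report keys as in VirusTotal's older v2 domain report; adjust to
# whatever your saved JSON actually contains.
CATEGORY_KEYS = {
    "bitdefender": "BitDefender category",
    "forcepoint": "Forcepoint ThreatSeeker category",
}


def reports_to_frame(reports):
    """reports: mapping of domain -> parsed JSON report. One row per domain."""
    rows = []
    for domain, report in reports.items():
        row = {"domain": domain}
        for column, key in CATEGORY_KEYS.items():
            row[column] = report.get(key)  # None when the vendor has no category
        rows.append(row)
    return pd.DataFrame(rows)


def load_reports(directory):
    """Read '<domain>.json' files saved by the lookup loop."""
    paths = sorted(Path(directory).glob("*.json"))
    return reports_to_frame({p.stem: json.loads(p.read_text()) for p in paths})


def category_percentages(df, column):
    """Percentages per category (missing values excluded), comparable across vendors."""
    return (df[column].value_counts(normalize=True) * 100).round(1)
```

The result joins back onto the main data with something like `main.merge(load_reports("vt_reports"), on="domain", how="left")`, where the directory name is illustrative.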

First, apologies for the horrible annotation, but I am fiercely loyal to Microsoft Paint. These are the top 10 endpoints, essentially the top 10 IP addresses with domains that have crypto mining embedded. The first IP has an enormous chunk: over 2,700 domains hosting crypto mining on one single IP address. As for the circles and the two colours, the red ones I'm going to show visually from an IP address perspective and the green ones from a domain categorization perspective, and for this I'm going to use Neo4j.
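A top-10 list like this one can be produced with a pandas group-by. A minimal sketch, assuming one row per (domain, miner, ip) with those hypothetical column names; alongside the domain count it shows how many distinct variants each IP hosts and which one dominates, which is the kind of pattern the graphs make visible.

```python
import pandas as pd


def host_profile(df, n=10):
    """Top n IPs by number of distinct domains, with their variant mix.

    Expects one row per (domain, miner, ip) with those column names.
    """
    profile = df.groupby("ip").agg(
        domains=("domain", "nunique"),
        variants=("miner", "nunique"),
        top_variant=("miner", lambda s: s.value_counts().idxmax()),
    )
    return profile.nlargest(n, "domains")
```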

First up, this is the Neo4j prep. These are the nodes: taking the Python data frame, I turned each entry into a Neo4j node and then wrote the queries to show me visually what it looks like. So what we'll see next is basically the world according to me: what crypto mining, or at least cryptojacking, campaigns look like when you visualize them. This is the first example on that list, case number one, the IP address with over 2,000 domains on it. The red dots are the domains, green is the mining variant and blue is the IP address, so visually this is what that server and its mining look like.

The little boxes are the areas I'll zoom in on to give a bit of context: that's the IP address, and that's the over 2,000 domains hosted on it, and interestingly these are all Iranian. On the other side you can see the mining variant, which in this case was fisacral.com. Continuing to show what crypto mining looks like from a pattern perspective, I took a look at one of the other IP addresses.
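The talk's actual Neo4j queries are not shown, so here is one possible way to turn the data frame into Domain, Miner and IP nodes. A minimal sketch: the labels, relationship types and helper names are hypothetical, and it sends all rows in a single parameterised `UNWIND ... MERGE` query through the official `neo4j` Python driver. A second query pulls out everything attached to one IP for viewing in the Neo4j Browser.

```python
import pandas as pd

# Hypothetical labels and relationship types; the talk's actual queries aren't shown.
LOAD_GRAPH = """
UNWIND $rows AS row
MERGE (d:Domain {name: row.domain})
MERGE (m:Miner {name: row.miner})
MERGE (i:IP {address: row.ip})
MERGE (d)-[:USES]->(m)
MERGE (d)-[:HOSTED_ON]->(i)
"""

# Everything hanging off one IP, e.g. for viewing in the Neo4j Browser.
VIEW_IP = """
MATCH (d:Domain)-[:HOSTED_ON]->(i:IP {address: $ip})
MATCH (d)-[:USES]->(m:Miner)
RETURN d, i, m
"""


def to_rows(df: pd.DataFrame) -> list:
    """Distinct (domain, miner, ip) triples as plain dicts for the UNWIND parameter."""
    return df[["domain", "miner", "ip"]].dropna().drop_duplicates().to_dict("records")


def load_graph(rows, uri, user, password):
    """Send all rows in one parameterised query (needs the neo4j driver and a server)."""
    from neo4j import GraphDatabase  # pip install neo4j

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        with driver.session() as session:
            session.run(LOAD_GRAPH, rows=rows).consume()
```

Using `MERGE` rather than `CREATE` means re-running the load does not duplicate nodes; for tens of thousands of rows, sending them in batches keeps each transaction small.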

This one had an interesting strategy: a single IP address, of course, with 238 domains split perfectly across two mining variants. Zooming in for context, we see the IP address, all the domains, and which variant each of those domains uses.

Here's another pretty picture, and this one is actually a Google CDN endpoint, so the output is what you'd expect: lots of different domains using vastly different variants. Rather than a single operator's server, it's a Google endpoint, and here you can see all those domains and variants tied to it, with the closer screenshot giving the context.

Instead of showing more IP hosting patterns, this next one is from a domain categorization perspective, and I found it particularly interesting. It's a single IP address with 116 domains hosting crypto mining, but every variant on it is AuthedMine, the consensual one. So whoever runs the server and these sites is doing it as ethically as possible, since every visitor has to consent. That makes sense if you look at the domain categorization, at least in the top picture: it's all around shopping, so these sites are less transient and less dodgy, have been around a lot longer, and are validly categorized. So that pattern makes a lot more sense.

On the other side, a domain categorization example from an IP address that's certainly not running anything consensual shows you what that pattern looks like: 92 different domains with all sorts of categorizations and different variants. That's definitely not uncommon, given the transient nature of these domains: most of them are uncategorized, and the ones that are categorized are labelled as potentially unwanted software and pornography.
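The consensual-versus-transient contrast can also be checked numerically rather than visually. A minimal sketch, assuming hypothetical `ip`, `domain`, `miner` and `category` columns and assuming that only the AuthedMine variant counts as opt-in; it reports, per IP, the share of rows using a consensual variant and the share with no category, plus an IP-by-variant cross-tab that would show an even two-variant split like the 238-domain example.

```python
import pandas as pd

# ASSUMPTION: which variant names count as opt-in; AuthedMine was Coinhive's
# consent-based miner.
CONSENSUAL = {"authedmine"}


def consent_profile(df):
    """Per IP: distinct domains, share of consensual variants, share uncategorised.

    Expects 'ip', 'domain', 'miner' and 'category' columns; the shares are
    computed over (domain, miner) rows, not over distinct domains.
    """
    flagged = df.assign(
        consensual=df["miner"].isin(CONSENSUAL),
        uncategorised=df["category"].isna(),
    )
    return flagged.groupby("ip").agg(
        domains=("domain", "nunique"),
        consensual_share=("consensual", "mean"),
        uncategorised_share=("uncategorised", "mean"),
    )


def variant_split(df):
    """IP x variant counts, e.g. to spot one IP split evenly across two variants."""
    return pd.crosstab(df["ip"], df["miner"])
```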

Next, I plugged the IP addresses into Maltego to see what else they've been up to. We know they're running crypto mining variants, but what other dodginess has been going on? Using the VirusTotal API key, I can get two types of data back: detected communicating samples, meaning samples that have communicated with the IP address, and detected downloaded samples. Plugging that in gives some interesting results. This IP address in particular is a particularly naughty one: it was number six on the top-10 list and hosted 111 domains with crypto mining, but it also had about 12 different sample hashes detected against it.
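The talk did this through Maltego; as a scripted alternative, the same count can be taken from saved VirusTotal IP reports. A minimal sketch: the key names and the `sha256` field are assumed to follow the older v2 IP-address report, which used the same "detected communicating/downloaded samples" wording, so verify them against the reports you actually have.

```python
# ASSUMPTION: keys as in VirusTotal's older v2 IP-address report, where each
# list holds dicts with a 'sha256' field; adjust for the report format you have.
SAMPLE_KEYS = ("detected_communicating_samples", "detected_downloaded_samples")


def sample_hashes(report):
    """Unique SHA-256 hashes of detected samples tied to one IP address."""
    hashes = set()
    for key in SAMPLE_KEYS:
        for sample in report.get(key, []):
            if sample.get("sha256"):
                hashes.add(sample["sha256"])
    return hashes


def rank_ips(reports):
    """reports: mapping of IP -> report. Returns (ip, hash_count), most hashes first."""
    counts = [(ip, len(sample_hashes(report))) for ip, report in reports.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)
```

Counting unique hashes rather than list entries avoids double-counting a sample that appears as both communicating and downloaded.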

So whoever was running that server was up to a lot more than just hosting crypto mining; they had all sorts of other malicious campaigns going on. I think this is an area with potential for more research: what's going on with these IP addresses, and which ones are diversifying their campaigns.

The last area I'm going to talk about is a particular coin to dig into: JSEcoin. JSEcoin was the third most common variant, and it required a bit more effort outside my main data frame, so I created a new data frame with just the JSEcoin variant and the full HTML from all of those URLs, 1,700 of them. The way the JSEcoin miner loads, the page actually includes the user's account number, so you can start tracking the account holders themselves. You can see the bottom four all have the same account number, 112173. That doesn't tell me who the user is, but it gives some insight into the campaigns: I can see how many different sites each account has working for it and whether there's actually any value in it. So we're going to look at 112173 as an account holder, and a couple of others, and see whether this is actually profitable.

It turns out 112173 had eight domains working for him, whereas 15838 had 228 different domains carrying his user code. So this person has a lot of domains running the embedded code, and I started wondering whether he, or she, is making any money from it. Quite interestingly, JSEcoin, which has also since closed down, offered a developer API key that let you query balances. Before I even got to the balances for 15838, I took a look at every single one of those domains, and every single one was the same landing page: the domain for sale, an embedded Bitcoin exchange link, an advertising banner and the mining script. Not very pretty, but whoever was running this was really trying to diversify their revenue. The balances told the story, though: 15838 had a USD balance of 55 cents, despite those 228 sites with the mining script embedded. Maybe he or she is withdrawing those funds, I'm not sure, but the other one, 112173, had a negative balance, so definitely not making a whole lot of cash doing this.

So I then looked at which of the user accounts I had access to actually had some money in the bank. One of them, account number 9250, had two and a half thousand dollars in its JSEcoin balance, which really got me wondering what site this person was running. It turns out it was a brand, Money Can't Buy, that no longer exists, and it was ranked at over 30 million on Alexa, so not a very popular site, but somehow that balance was pretty high. The conclusion to draw from this is: who knows where the rest of those funds are coming from? It's very unlikely they came from crypto mining.

With that, I'll send everybody off to tea with a couple of conclusions and next steps. The world of cryptojacking is wild and decentralized. There are so many different variants and so many different miners out there that there's a lot of space to explore and research further, especially since Coinhive and JSEcoin are gone: there's plenty of room to rerun this and see what the ecosystem looks like now and who's taking up that space. There are also areas to look at in terms of crypto mining on mobile phones; profitability, which at least for JSEcoin didn't seem to be there; deeper trails around individual IP addresses, what they're hosting and whether they're diversifying into other malware; and specific variants. So, any questions, comments or insults?
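The account-tracking step in the JSEcoin section can be sketched with a regular expression over the saved HTML. Only the fact that the page contains the account number comes from the talk; the loader URL shape below is an assumption for illustration, so check the pattern against real pages. The balance lookup through JSEcoin's developer API is not shown, since that service has closed.

```python
import re
from collections import defaultdict

# ASSUMPTION: the loader URL embeds the account number as the first path
# segment after /load/, e.g. https://load.jsecoin.com/load/12345/site.example/0/0/.
JSE_ACCOUNT = re.compile(r"load\.jsecoin\.com/load/(\d+)/", re.IGNORECASE)


def accounts_by_domain(pages):
    """pages: mapping of domain -> HTML. Returns account number -> set of domains."""
    accounts = defaultdict(set)
    for domain, html in pages.items():
        for account in JSE_ACCOUNT.findall(html):
            accounts[account].add(domain)
    return dict(accounts)


def largest_accounts(accounts, n=5):
    """Accounts with the most domains working for them."""
    sizes = [(account, len(domains)) for account, domains in accounts.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:n]
```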

I could see it from the source code, so I could pull all the different URLs associated with each user account. But yeah, they've closed down too. All right, that's that.

Yeah, yeah.

Well, what was really important to me was the data analysis: having the problem of how to figure this out, how to crawl the internet for a specific concept or term without using Google, and then taking that, putting it into a data frame and manipulating the data. Believe it or not, these are just the highlights of my research; it went into a lot of other areas that I didn't include, which would have knocked everybody out. So yeah, these were the highlights. Yes?

No, not within the limitations of my research. There are lots of things I left out: I couldn't cover WebAssembly, and I couldn't cover obfuscation. This was really the bare minimum, and from here there's potential to research more.

Sure, through the university, through academic access.

No, not really, unfortunately. Yes?

That's really decentralized, and it's at an ISP level. I don't think anybody is actively hunting for them; it's really up to individuals to get those browser extensions and lists to protect themselves. From what I covered and what I found, nobody is really looking at that level. Yes?

Yep.

Exactly, yeah.

Then you miss it. The NoCoin list updates periodically, I think even daily, so this is definitely a point-in-time snapshot. That's the signature-based game of cat and mouse: what you find today you won't find tomorrow.

Well, you wouldn't really be aware of it unless you're running some sort of detector or blocker, which is why a lot of these sites are still out there. The Alexa rank, even though Alexa is gone now too, was the area I was looking at: are these popular sites, sites that we know of? And the answer was really no, besides The Pirate Bay.

From what I found, it was webmasters who had signed up to the consensual service who used the consensual variant; no one seemed to be offering a consensual service with a non-consensual variant. With Coinhive and Monero being by far the most prevalent, you had the option: go with Coinhive and do it non-consensually, or go with AuthedMine, same currency, same rate, but doing it ethically and consensually.

I don't, apart from the JSEcoin one I looked at, where that one user had one banner and one embedded script on every single endpoint. Other than that, unfortunately not. Believe it or not, this research took me in the ballpark of eight or nine months to cover it all, so there are lots of areas to go deeper into if anyone is interested. And if anybody wants all the commands, the pandas code, the Jupyter notebooks, the Neo4j queries, everything is open and free, and I'll happily share it.

All right, thank you.