Insights on Using a Cloud Telescope to Observe Internet-Wide Botnet Propagation Activity

Name: Insights on Using a Cloud Telescope to Observe Internet-Wide Botnet Propagation Activity
Uploaded: 2024-09-04
Duration: 37 min 30 s
Description: This talk presents the Cloud Telescope, a reproducible cloud-native architecture for globally distributed capture of internet background radiation across AWS regions. The speaker describes the system's design, its use of ephemeral Terraform-based infrastructure, and findings from 45 days of monitori

BSides Las Vegas · 2024 · 37:30 · 98 views · Published 2024-09

Speakers

Fabricio Bortoluzzi

Tags

Category: Research

Topic: Cloud IAM Malware Analysis Threat Intel

Research: Empirical Research Technical Deep-dives

Style: Talk

Mentioned in this talk

Tools used

Terraform Wireshark

Platforms

Alpine Linux AWS

Languages

bash

About this talk

This talk presents the Cloud Telescope, a reproducible cloud-native architecture for globally distributed capture of internet background radiation across AWS regions. The speaker describes the system's design, its use of ephemeral Terraform-based infrastructure, and findings from 45 days of monitoring that captured 10 billion packets revealing Mirai and other botnet infection patterns, vulnerable service exploitation, and geographically distributed attack sources.

Original YouTube description

Breaking Ground, Tue, Aug 6, 16:00 - Tue, Aug 6, 16:45 CDT. This presentation introduces the Cloud Telescope: a reproducible and ephemeral cloud-native architecture for globally distributed capture of cybernetic activity. The Cloud Telescope comprises a Terraform infrastructure-as-code architecture currently compatible with Amazon Web Services in their twenty-six commercially available regions. We present the Cloud Telescope's architecture along with the results from three experiments conducted in 2023. For experiment number 2, we were able to describe Mirai infection patterns, the commands that are executed upon infection and the most active countries providing infrastructure for botnet payload propagation. People: Fabricio Bortoluzzi

Transcript [en]

All right, good afternoon, and welcome to the Breaking Ground afternoon session. My name is Fabricio Bortoluzzi and I will be presenting this Cloud Telescope research to you over the next 40 minutes. Thanks for choosing this track among so many awesome tracks out there. I hope the audio is clear; if it is not good enough, please shout, complain, and I will address it. Well, my name is Fabricio and this is a presentation on a cloud architecture called the Cloud Telescope, which allows for observing internet-wide activity. In this case, the presentation today describes malware-spreading activity, botnet-spreading activity, as part of the results we can find with this development and research. This is myself: I am a computer

scientist. I currently work at Noroff University College in Norway, and it's quite nice to be here in Las Vegas. Much warmer, huh? I am a cloud solutions architect, and I teach cloud computing and cyber security to undergraduate students, among other activities. This is part of the research that Barry, Lucas, Carla and I have been doing for the last three years. I have five topics to show you: internet background radiation, the standard network telescope, the Cloud Telescope, the experiments we have been conducting with item number three, and a discussion of the botnets we have been detecting with the method. Not sure if you're familiar with this terminology: internet background

radiation seems like fancy terminology, but it's pretty much an analogy to describe all the often malicious activity we can capture by recording unsolicited packets, either on a domestic router, a standard unfirewalled host, or, in this case, a sensor network deployed within cloud service providers. By analyzing this we can learn about vulnerability scanning activity, worm and botnet propagation, and, more recently, even benign scanners such as Censys, Shodan and many more. Backscatter is one of the most prevalent activities found in internet background radiation, in this wild traffic that you can capture as long as you don't filter it with standard mechanisms such as a

firewall. We can say that this activity arriving at internet-connected hosts is long duration: it happens all the time. It's low intensity: it's not really a huge bandwidth concern. And it has been studied for at least 20 years, mostly in a research center very close to Las Vegas, CAIDA, the Center for Applied Internet Data Analysis, here in the United States, mostly through the network telescope deployed at the University of California San Diego. They are our benchmark, our reference for the work that I have been conducting. Vulnerability scans: you know what this is, but this is also something you can capture if you deploy sensors to listen to the internet background

radiation. You can learn how noisy it is, how frequent it is, and which are the most attacked or most scanned ports, so that you can start to trace some sort of behavioral pattern, not affecting a specific company or a specific web server, but affecting the internet at large, if you will. It's mainly malicious activity, as I said, but the newer internet census tools, especially Shodan and Censys, have also been changing the pattern of this traffic as captured by the method. Worm infection is also something that researchers can learn about by listening to the radiation: we can learn how frequent they are and what the most scanned or infected ports are. This is the type of research that led, not

really to the discovery, but to the quantification of how bad Code Red was at the time, as well as Blaster and Sasser. These are old names for sure. And, let's say in the mid-2010s, Conficker, as a major malware-spreading activity that led to major problems on the internet at the time. Botnet activity has been of growing interest starting around 2015. Botnets, of course, are networks of computers infected with malicious software aiming at remote control of the infected device, ultimately leading to attack orchestration, most of the time against famous targets such as big companies, more recently GitHub and many other companies out there. So radiation, or this internet traffic

pattern, has been changing from worm-centric to botnet-centric, according to our observations and according to the CAIDA telescope, and that led to the following research interest: what if we could listen to this traffic, and what if we could describe the patterns that are effectively used by this botnet infection? What are the most vulnerable devices? Even though there are many devices out there, if we could profile them, what would be the most relevant characteristics? The field is completing 20 years of active research. If you would like to join this field, there are many ways to join it: as a bachelor student, or in master's and doctoral level or PhD level

studies. It's ultimately of interest to scientists, the cyber security industry and, many times, armed forces, because it seems there is a geopolitical influence driving many types of the attacks captured. So everyone could have some sort of interest in learning from the radiation. The network telescope is the standard mechanism. It has existed for nearly 20 years; let's have a look at it. Usually businesses deploy firewalls to protect against the most obvious malicious activity. In this case it's the very opposite: we want to let traffic in on purpose, unfiltered, to that machine. The more IP addresses you can bind to the device, the better, so huge telescopes can listen to, let's say, an entire /8, 16 million

addresses. The newer telescopes only listen to a fraction of this amount, sometimes the equivalent of a class C network, a /24, 200 hosts, something of the sort, and that's related to how scarce IPv4 addresses currently are in comparison to previous times. So that's a machine. It is usually built with FreeBSD or Linux, tcpdump, and IPFW, or the Linux default firewall if you prefer, and tcpdump passively records the incoming traffic. Most of the time it won't answer the traffic back, so it's passive, so to say. Sometimes you can implement a honeypot-style behavior, and then you will capture application-layer messages exchanged with the sensor. So methods vary. Most of the

time researchers only want the incoming traffic so they can profile it; not even answering the TCP three-way handshake is a priority in these studies. The network telescope is passive, it doesn't respond, and traffic is allowed in. It's usually deployed at universities, probably not at companies, mostly by people trying to learn from this cyber threat acquisition mechanism. So far so good? Any questions in real time? Okay, leaving it for later. And the Cloud Telescope: this is our contribution to the field, mine, Barry's, Lucas's and Carla's. The idea is that instead of deploying a single telescope in a single region of the globe, such as at the University of California in San Diego, such as at Rhodes in South Africa, such

as in São Paulo in Brazil, we want to deploy a fleet of sensors that will capture the traffic in a distributed way, so we can learn whether there is some sort of geopolitical influence over the type of traffic that hits the United States versus the traffic that hits Norway or China or any other country. The reason, and I have mostly spoken about it already, is that cloud service providers currently enable us to deploy a distributed fleet that is budget friendly and that allows one to launch worldwide sensors without having to move from one's chair, meaning deploying via software instead of deploying a physical computer, as it's usually done. [Music] The Cloud Telescope is therefore described by this architecture

containing the internet gateway, a router and a forcefully open security group. So instead of taking advantage of AWS's or Google Cloud's default-deny policy, we have to open all ports; we have to allow all traffic in, so to say. Then the sensor is deployed there. For example, in AWS you can cover four different regions of the United States, two in Canada, one in Brazil, and so on and so forth, and that's how the Cloud Telescope can be deployed. Traffic is also recorded in pcap using a daemonized version of tcpdump. Recordings are rotated, and they are also uploaded to a cloud bucket so that recordings are centralized instead of distributed

across the many telescopes out there, which makes later processing of the data easier. As for the common tools and stack that are used: tshark is the de facto standard for dealing with large amounts of pcap traffic. If you just want a sample that contains less than 1 million packets, it should be okay to look at it in Wireshark, for small samples and for learning in a more friendly way. You can also index packets using very interesting new security stacks such as Security Onion, or the well-known Elasticsearch with a Kibana front end, to interact with the data and eventually learn new patterns from this indexing. Radiation looks like this: pretty much a standard

pcap, but in this case, because it's a distributed sensor fleet, you can see, for example, in one, Censys scanning our sensor; in two you can see Shodan, and these quickly appear, so you can even profile how frequently friendly census tools are scanning the sensor fleet. In three you can see some sort of distributed backscatter. It's backscatter because TCP is resetting the connection: we didn't really initiate that connection, but it looks like we did, which is the very definition of internet backscatter, and it's ultimately interesting to learn from this unexpected traffic. Moreover, you can see, let's say, in one, again, public scanners are out there; in

two, many ping requests arriving from sensors in another region of the world; in this case it's a US sensor being queried with ICMP packets coming from Amazon in China. You can see old-style SIP, Asterisk, VoIP attacks using UDP communication on the SIP port, and that's what a sample looks like. The architecture is deployed like this. We were able to capture this traffic during 45 days last year. At the time AWS had 26 commercially available regions. Each one of the regions was provided with 10 sensors, 10 EC2 virtual machines, each one of them tailored to be as small as possible and therefore as cheap as possible as well, so we could

keep the experiment running for the longest amount of time. And because this is ephemeral, in the sense that there is no need to keep the architecture running after the capture, we made use of the cheapest pricing model made available by AWS, which is called the spot pricing model. Using this, one can save up to 70, sometimes 90% of the standard costs related to a sensor, but in return you are making use of the idle capacity of the cloud service provider. Therefore you have to handle an instance termination notice, or notification, meaning that your instance can be destroyed if you no longer win the bid for having that instance allocated to

your account. So we implemented some sort of listener that will launch a new sensor upon a termination request. Terraform is used to describe the architecture, so anyone wanting to reproduce the experiment can do it, and Bash is used for automation. And that's how it's usually deployed.
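To make the termination-notice handling described above more concrete, here is a minimal sketch in Python. It is not the project's actual listener (the talk says Bash is used for automation). It assumes the documented EC2 behaviour in which the instance metadata item `spot/instance-action` returns a small JSON document such as `{"action": "terminate", "time": "2023-08-01T12:02:00Z"}` once an interruption is scheduled, and no document otherwise. The function name `seconds_until_interruption` and the choice to pass the response body in as a string are assumptions made for this example; fetching the document, stopping the capture and relaunching the sensor are left out.

```python
import json
from datetime import datetime, timezone
from typing import Optional

# Actions that mean the instance is about to go away (assumption: any of
# these should trigger "stop the capture and launch a replacement").
INTERRUPTING_ACTIONS = {"stop", "terminate", "hibernate"}


def seconds_until_interruption(body: Optional[str], now: datetime) -> Optional[float]:
    """Return the seconds left before a scheduled interruption, or None.

    `body` is the text returned by the metadata service for the
    spot/instance-action item (None or empty when nothing is scheduled).
    `now` must be a timezone-aware datetime.
    """
    if not body:
        return None
    try:
        notice = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(notice, dict):
        return None
    if notice.get("action") not in INTERRUPTING_ACTIONS:
        return None
    try:
        # The notice carries a UTC timestamp in ISO 8601 form with a "Z" suffix.
        when = datetime.strptime(notice.get("time"), "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        return None
    return (when.replace(tzinfo=timezone.utc) - now).total_seconds()
```

A polling loop could call this every few seconds and, on the first non-None result, flush the current capture file and request a replacement sensor; keeping the parsing separate from the network call makes the decision logic easy to test offline.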

We use Alpine Linux because of the smaller footprint. This is related to the fact that this distribution doesn't implement a full glibc userland library; it implements a much tighter, smaller-footprint intermediate layer, or middleware as we can call it, meaning it can operate on a virtual machine with half a gigabyte of RAM, and that's the main characteristic of the device. Whenever we receive the termination notice, we will stop the capture and launch a new one. But in your minds, imagine you have up to 260 virtual machines, 10 per AWS region in this case, and they are allowing in, capturing and recording unsolicited packets arriving at the sensor. That's the first takeaway of this

methodology. The ephemeral nature means that it exists as a Terraform artifact. The GitHub repository is maintained by Lucas Beiler, and that's where the results I'm going to present to you come from. We deployed 260 sensors in AWS starting in August. We wanted to keep it running for, let's say, six months, but after 45 days of capture, even though we weren't serving any service, 10 billion packets had been captured. We had to stop because this was becoming quite huge to process. It resulted in 200 GB of pcaps that anyone out there can download to do their own studies. There are many patterns there that we only know exist; no one has ever got into them to see what

they really look like. We are still looking for answers on whether there is any sort of geopolitical influence affecting the attack patterns we capture with the telescope. This is an open question. For this experiment, one key characteristic is that for the SSH, Telnet, web and HTTPS ports, Lucas implemented some sort of application-layer responder answering back to attackers. If they wanted to get into the machine they could do so, but we were actually recording their commands upon infection rather than obeying the commands, rather than really exposing a vulnerable shell to the attacker. That's one key characteristic that makes this presentation unique. What does one learn if one launches such an architecture? What can

you learn? What did we learn over 45 days, after capturing 10 billion packets? According to the experiment, and this is quite interesting, 98% of the unsolicited traffic arriving at the sensor fleet was TCP, meaning footprints of denial-of-service attacks or any other form of UDP exploitation or ICMP flooding were seen only to a very small extent; most was TCP. We captured almost 1 million IP sources from all parts of the world, which I will profile for you in a minute, but that's the range of, if you could call it that, the telescope resolution or aperture. That's what we can learn or see by deploying it. Even though I only launched

260 sensors, they were recycling their IP addresses according to AWS policies, and therefore we had 603 IP addresses captured on our side as honeypots. Traffic distribution across the world was fairly even. The baseline here is 4% per region. In this case, Asia Pacific Southeast 3 saw 6% of the radiation and Asia Pacific Southeast 2 saw only 1% of the radiation; that's also the newest AWS region, which could be linked to this fact. As for the most attacking countries, now not looking at it from an AWS perspective but by the country that owns the IP address, the country that is linked to the radiation source or the attacking source, we saw the Netherlands

as the most prevalent country, and there is a curiosity there. Can you imagine the reason why the Netherlands is the most apparent source of this random traffic arriving at the sensor fleet? Sorry? It's coming from many anonymization services, VPN services; it seems that the country has a culture of hosting these services. This is openly available, and that's the pattern: openly available anonymization services, including VPNs. That's our guess. We cannot fully endorse the statement, but that's what it looks like considering the autonomous system number holder, the owner. And the least attacking sources were Taiwan, Thailand, Pakistan, Poland and so on. The most frequently attacked sensors were residing in the United States. This is

slightly biased in the sense that the United States has four AWS regions, four different places; we had 40 sensors in the country. But even if you split by 40, you will still see some sort of average activity hitting the US. And the least attacked sensor fleets were in Germany, Canada and the United Arab Emirates, but it's also fairly even, with a small bias towards the US and India. Now, if I had to tell you in 2024 that the most attacked TCP port is actually the Telnet port, this would be like big news in the '90s, in the early 2000s, but it's still the most prevalent. I would say there was a shift: around 2015 it wasn't the most prevalent

according to the CAIDA telescope, but it has again become the most attacked port by far. Do you have a guess on why Telnet is currently the most prevalent destination port for random attacks in the wild? It's a three-letter word, or keyword. IoT, thanks for the answer, precisely. Meaning the IoT introduced a new generation of low-power, low-cost, fast-time-to-market devices that are not necessarily as secure as modern operating systems, which we see becoming more and more secure. So that's our conclusion as to the why. But as you can see from the patterns that I'm going to show you, they're actually exploiting Linux kernel 2.2 and 2.4, which are highly

related to embedded devices, not really ordinary servers. Followed by GTA, but to a far lesser extent, Minecraft, VNC (pretty interesting to see VNC exploited as a top target) and then the classics: SSH, JDP, DNS over TCP, which is not really the common standard, SMTP and so on. RDP is also there. If you think about UDP exploitation, which is only seen in a small amount in the experiment, it mostly relates to the recent LPS exploitation, or the recent LP vulnerabilities that are actively exploited, and of course the amplification attacks related to Memcached. Redis, which is connected to some of the talks presented earlier today, is also there somewhere, and also DNS and the classic, often

exploited ports. ICMP is mostly ping: 99% of the time echo request, echo reply to a far lesser extent, and other not so popular ICMP types being captured. When it comes to pinging and ping flooding, the most active source was actually China, followed by the United States. So, speaking about profiling the source of the attack, this is what I wanted to get to: the IP sources really belong to companies in the Netherlands whose business is selling anonymization services most of the time. Belgium also shows up there, followed by China and Japan. So that's the profile. In all cases, this is probably tunneled before hitting the sensor using some sort of GRE

encapsulation or any other equivalent protocol for the same purpose. Now, speaking about the malware analysis, let's analyze it together. Even though the font could be small there, let's see if we can decode it together. This is the text version of Wireshark, if you will, tshark. Decoding, we can see that at the transport layer it's TCP. The source port, in this case it shows reversed, so the source port is actually 39,000, mostly irrelevant, but the destination is 23. And then the honeypot records any command that the attacking source wants to inject without actually executing the commands, but acknowledging them, saying yes, you are successful with the command, proceed;

that's how it pretends to be. So it starts with an attempt at running a wget script that we can assume will try to download some sort of payload with the infection commands, command and control, very likely turning the target into a zombie. This also reveals the vulnerable server, often a vulnerable server, serving the payload, so we can also study who is out there, vulnerable, serving payloads for attackers on the internet. Possibly it's just an unprotected website that got hacked, and it is the apparent source of the attack. Many times it asks for some sort of BusyBox exploitation: they hope our honeypot runs BusyBox, which reveals the embedded

nature of the attacker's expectation of the target. We can also see Trivial FTP; this is not really FTP but a variation, trying to download data from somewhere else, then trying to run shell scripts related to the TFTP download, and attempts to execute BusyBox ftpget. From an attacker's perspective, this is an attempt to ensure that they have really infected the machine and turned it into a zombie. We also see curl, and then connection attempts arriving in the next packets. So that's pretty much the way it looks once you deploy the telescope and start learning the attack patterns. Now let's go into some sort of binary analysis, but not in the

sense of dissecting the binary: we want to know which binaries are most frequently linked to the payloads being delivered to the honeypot. The names are not really the most exciting ones; this one is called "i". But you can hash the binary and compare it against publicly available sources, and then you learn exactly what this binary wants to deploy, what its intent is. You see many funny names out there, but notice one pattern reinforcing the IoT behavior: most of the time they are 32-bit binaries, or they carry the x86 suffix. The attacker can name it any way they want, but it reveals some sort of pattern.
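The hash-and-compare step mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the analysis pipeline used in the experiment: the function names `sha256_hex` and `label_payloads`, the choice of SHA-256, and the `known_hashes` lookup table are all hypothetical. In practice the table would be filled from whichever public malware-hash source one trusts; the labels used here are placeholders.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Tuple


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of a payload as a lowercase hex string."""
    return hashlib.sha256(payload).hexdigest()


def label_payloads(
    payloads: Iterable[Tuple[str, bytes]],
    known_hashes: Dict[str, str],
) -> Counter:
    """Count captured payloads per family label.

    `payloads` yields (filename, content) pairs. The filename is ignored on
    purpose, because the attacker can name a binary any way they want; only
    the content hash is looked up. Hashes missing from `known_hashes` are
    counted under the label "unknown".
    """
    counts: Counter = Counter()
    for _name, blob in payloads:
        counts[known_hashes.get(sha256_hex(blob), "unknown")] += 1
    return counts
```

Counting by content hash rather than by filename means that the same binary delivered under several names is still tallied once per delivery under a single label.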

And then we can link the binary with the botnet itself; in this case it's the botnet profiling. The botnets are, I believe, popular names by now, right? Mirai, is it popular? Have you heard of Mirai before? It's probably the most active botnet in the wild. They're not really aiming at a single target, but they are the most prevalent in quantitative terms, and there are many variations out there, including Mozi, detected 10 million times in the experiment, quite frequently, and Sora, among other names. So we can try to learn how many strains are out there, and by learning where they are downloading the payloads from, we can

try to guess whether they belong to the same hacking group or to different hacking groups deploying their zombie nets, or botnets, for later use against targets. In all cases, for the top 10, they carry a 32-bit signature and target the 2.x Linux kernel, often related to embedded

systems. After probing the honeypot, if it's attacking port 80 it won't speak shell commands, it will speak HTTP commands, right? In that case we will see GETs, POSTs and this kind of pattern. In this experiment, six million times Mirai was trying to infect the honeypot from this single source, 185.224 something, hosting the Cutie binary; to a smaller extent, Mirai again and Mozi were the most prevalent. So, according to the experiment, these are the busiest sources serving payloads. If we were to tackle this problem quantitatively, we should start by addressing what's going on with the services: what are the vulnerable services exploited by attackers, so that we should promote

awareness about this kind of outcome? The commands also reveal what kind of vulnerable HTTP-based service could be running on the vulnerable hosts. Many times it's an attempt to get the environment variables; then it depends on the type of technology in use. It could be a Windows server with IIS, a PHP server on Apache, or many other options, but you see there are predictable patterns: a lot of exploitation trying to leak GitHub or Git authentication credentials. This is the kind of cyber threat intelligence acquisition you can get by running the Cloud Telescope. Of course, because this is deployed in AWS, we could expect a lot of health checks coming from the cloud

provider itself towards the web ports 80 and 443; 1 million times that was the case. But to a smaller extent, in other cases, it was these projects. This also reveals that, for example, Palo Alto has a project related to mapping the internet; it's benign, it's not an attack, but it's interesting to quantify. Another is a cloud mapping experiment at pdrlabs.net; this is unknown to me. Censys is inspecting port 80 and signing, because this is totally up to the attackers to rewrite as they want; this is not a standard that must truly be followed. And of course Python browsers or mappers, Go mappers, which are much faster, and others. So this is again something one can learn. Now, if we were to speak

okay, you detected a lot of binaries, we see that it's an IoT-style infection, but what else? We can tell you that in this experiment the majority of attacks are exploiting CVE 2016 216. This is related to embedded CCTV devices, so from an attacker's perspective it's good business to launch infection attempts against CCTV, as they are, generally speaking, the most vulnerable. It is followed by a CMS used mostly in China, ThinkCMF (I only knew about it by trying to reverse engineer what was going on) and also TOTOLINK, which is behind billions of IoT devices, a common middleware used by anyone wanting fast-time-to-market

IoT devices. In all cases the score is 9.8. So what is the insight here? The insight is that the most popularly exploited systems are currently IoT systems, according to the method, and the attackers are exploiting highly critical vulnerabilities, with the goal of increasing the fleet of botnet sensors on behalf of the attacking group they belong to; expanding the zombie fleet, if you will. That's the conclusion. That's what you get currently by analyzing unsolicited traffic arriving at a sensor fleet in a cloud-based experiment. We have many questions there requiring answers, so that's an invitation if you want to help us find better answers in terms of geopolitical influence. Okay, this is botnet

expansion, but is there a political intent behind the attacker? Is it a nation-state-sponsored attack, or is it just an ordinary attempt at business, for money, for profit? Both could be true; we need to investigate. Here you will find the most relevant links. We also have companion publications if you want to learn about this from a more academic research perspective, but it should be interesting for both worlds: the research community, but also the industry, should have an interest in the results we have been finding. And there is of course more to come. That's all, thank you very much for your time. [Applause] Thanks again. We should have three

minutes if you have a question. Does anyone have a question? No? Okay, that's all, thanks.
