
Hello everyone, thanks for coming. This talk is entitled "Gorillas in our midst: the runtime secrets of evasive IoT malware". First, I'd like to introduce myself. My name is Cart Fitzpatrick, and I've had an unusual path into cyber security. I initially did my undergrad in music technology at the Sonic Arts Research Centre, then spent 20 years in the music industry as a recording engineer, touring musician and DJ. During COVID I pivoted to software development, as a lot of people in my industry did, graduated last June, and came straight into a PhD at the Centre for Secure Information Technologies (CSIT) at Queen's, down in Titanic Quarter. My master's dissertation focused on IoT and building an embedded system, and the PhD that came up is on IoT malware and dynamic analysis of IoT malware.

So first up, the IoT threat landscape. There are currently around 15 billion IoT devices, and many of them lack very basic security features: no encryption, hardcoded passwords, little to no patching. IoT malware is also getting smarter, more evasive and more adaptive, and it is now cross-platform, capable of running on everything from ARM to MIPS to x86. That
means that traditional antivirus or endpoint protection isn't lightweight enough for these devices. And think about where these devices are: in our homes, in industry, in our critical infrastructure, in our hospitals. Some of them are just simple sensors with an internet connection, so they have very limited memory, processing power and storage. This gap is being exploited by malicious actors, especially botnets, which is what we're looking at today.

During my PhD I've been using two approaches to malware analysis. The first is static analysis, which involves inspecting the code without actually running it. It's fast, it's safe, and it effectively maps capabilities without requiring a sandbox. However, static analysis won't reveal what the code actually does at runtime, especially if it's obfuscated, packed or encrypted. This is where dynamic analysis comes in: we execute the code in a controlled environment and observe its actual behavior, for example system calls, network activity and runtime opcode traces. This is particularly useful against polymorphic or metamorphic malware, which changes its code structure to evade static detection but always behaves the same way when executed. So I'm going to talk a wee bit about
the Mirai botnet architecture, because GorillaBot, which we're looking at today, is essentially a new variant, an evolution of Mirai. Infection begins when bots scan for new devices with default or weak credentials, usually over Telnet or SSH. Once a device is compromised, its IP address is sent to a report server. The report server coordinates with loader servers to deliver the appropriate malware variant for the device's architecture. The victim then joins the botnet and begins communicating with the C2 (command-and-control) server, which sends the bots instructions: which targets to attack, how to spread further, and so on. The bots can then perform coordinated attacks, and the loop continues: discovery, infection, reporting, control and attack. This enables the botnet to grow rapidly and maintain control.

So, GorillaBot. It's a recent Mirai-based botnet, first seen in September 2024, and in that first month it was responsible for hundreds of thousands of attacks. Through our static and dynamic analysis we'll look at code examples of four aspects of GorillaBot. The first is its environment-aware anti-detection strategies. Next, the multi-architecture payloads downloaded through a custom shell script. Then we'll look at its
adaptive propagation strategies, and finally its XTEA cipher for C2.

First, the environment-aware sandbox evasion. GorillaBot checks TracerPid in /proc/self/status to detect whether it's being debugged, and it checks for Kubernetes pods, that is, whether it's running in a container, by looking at /proc/1/cgroup. You can see both checks here in our static analysis. And this here is from our dynamic system-call run trace, where you can actually see the TracerPid check: the first line opens /proc/self/status, it then reads from fd 3 into a buffer looking for the TracerPid field, and it closes it to allow for the next check, the kubepods one, which is a very similar process. If either of these conditions is met, it terminates immediately, so no payload and no persistence, with this cheeky little message: "File system not found. Exiting." GorillaNet did not like this honeypot. This environment-aware behavior makes GorillaBot evasive, and it also renders basic sandboxing techniques ineffective.

Next, its multi-architecture delivery. GorillaBot keeps Mirai's scan-and-brute-force login, but it changes the installation model slightly. Loaderless staging means fetching a pre-built multi-architecture kit in one shot and self-selecting the correct binary: the bot picks locally, either by reading the CPU type from /proc/cpuinfo or simply by cycling through the binaries, trial and error, until the correct one runs. This cuts out the loader/reporter dependency and makes propagation easier.

Now, its adaptive propagation strategies. Like Mirai, GorillaBot initially brute-forces default credentials over Telnet or SSH, but it has multiple payload-execution strategies. It begins with fileless infection, piping the payload directly into the shell using wget; if that isn't successful, or wget fails, it adapts and falls back to alternative tools such as TFTP, ftpget or curl. We also see a hard-coded reference to BusyBox there, which is shipped to ensure compatibility with severely resource-constrained devices. Finally, if the fileless approach fails, GorillaBot downloads the payload to disk and executes it in the background. Here, again in our dynamic system-call run trace, you can see the binary, in the last process, drop a new init script, and then wget pulls a remote script and executes it.
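The staged delivery just described, picking the binary for the local architecture and then falling back through transfer tools until one works, can be sketched from the analyst's side in a few lines of Python. To be clear, the tool order, the architecture table and the filenames here are my own illustrative assumptions, not strings recovered from the sample:

```python
import shutil

# Order mirrors the fallback chain described above: wget first,
# then alternative transfer tools if it is missing or fails.
DOWNLOADERS = ["wget", "curl", "tftp", "ftpget"]

# Hypothetical architecture-to-binary table for the "pre-built kit" model.
ARCH_BINARIES = {"x86_64": "bot.x86", "armv7l": "bot.arm",
                 "mips": "bot.mips", "mipsel": "bot.mipsel"}

def first_available(tools, which=shutil.which):
    """Return the first tool present on the system, mimicking the
    try-one-then-fall-back behavior of the delivery script."""
    for tool in tools:
        if which(tool):
            return tool
    return None

def pick_binary(machine, table=ARCH_BINARIES):
    """Select the payload matching the CPU architecture; the real bot reads
    /proc/cpuinfo or trial-executes each binary until one runs."""
    return table.get(machine)
```

The `which` parameter is injectable, so the fallback logic can be exercised without touching the host, e.g. `first_available(["wget", "curl"], which=lambda t: t == "curl")` returns `"curl"`.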
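Circling back to the sandbox checks for a moment: the TracerPid and kubepods probes are easy to reproduce as an analyst. This is a minimal re-implementation of the logic described above, not GorillaBot's own code, and the set of container markers searched for is my assumption:

```python
def tracer_pid(status_text: str) -> int:
    """Parse the TracerPid field from /proc/self/status content.
    A nonzero value means a debugger or tracer is attached."""
    for line in status_text.splitlines():
        if line.startswith("TracerPid:"):
            return int(line.split()[1])
    return 0

def looks_containerized(cgroup_text: str) -> bool:
    """Check /proc/1/cgroup content for container markers such as kubepods."""
    return any(m in cgroup_text for m in ("kubepods", "docker", "lxc"))

def should_bail(status_text: str, cgroup_text: str) -> bool:
    """GorillaBot-style decision: exit immediately if traced or containerized."""
    return tracer_pid(status_text) != 0 or looks_containerized(cgroup_text)
```

On a live Linux system you would feed these the contents of `/proc/self/status` and `/proc/1/cgroup`; separating the parsing from the file I/O keeps the logic testable.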
After infection, GorillaBot performs local-area network discovery, using multicast DNS (mDNS) to identify nearby devices such as smart TVs, cameras and other IoT equipment on the network. Any suitable candidate is then subjected to the same infection and propagation process. The botnet also issues lightweight probes to game and voice-service protocols, you can see TeamSpeak there, to build a list of DDoS reflectors. And when fetching payloads over HTTP, it spoofs a Windows Chrome user agent to make these requests appear like normal traffic, which is another of its detection-evasion measures.

Finally, just briefly, its XTEA cipher for C2. We can see from our static analysis that GorillaBot masks its C2 endpoints with an XTEA-style cipher string, but dynamic analysis helps us see what's actually going on. After decryption, it learns its outward-facing IP via a UDP poke to 8.8.8.8, and then it opens a TCP listener on port 38242. This reinforces that dynamic analysis will show things static analysis can miss, especially if the code is noisy, obfuscated or encrypted: dynamic analysis reveals the true runtime behavior of the malware.

This leads me on to my research. I recently had a paper published, entitled Lightweight AI
based malware detection using opcode and system call analysis across IoT architectures, by myself and my supervisors Dr. Carlin and Dr. McLaughlin over at CSIT. In it we compare static opcode analysis techniques with dynamic opcode and system-call analysis. The work moves beyond static inspection: our dynamic analysis pipeline captures the opcodes and system calls of the malware binaries and learns these behaviors across various architectures.

So, how the dynamic analysis pipeline works. Our malware samples are collected from various IoT architectures; for this work we used the four main ones we had samples for: ARM, x86, MIPS and MIPSel. The binaries are executed in isolated virtual environments for the dynamic run traces, and the static and dynamic opcodes and system calls are captured within a QEMU environment. The data is then used to train random forest models, evaluated using cross-validation, with the synthetic minority oversampling technique (SMOTE) to address class imbalance. One of the issues we ran into is that there are many more malware samples than benign samples, a ratio of maybe ten to one, and using SMOTE prevents the model from overfitting towards the majority class. The outputs were then labeled as malicious or benign and fed into a
random forest model. So, a wee bit about the results. In case you can't read it, static analysis is in red and dynamic analysis is in green. Dynamic opcode analysis achieved 100% accuracy across multiple architectures. The static models performed slightly less favorably, especially on x86 and MIPS. ARM binaries showed equally high performance under both methods, but in general dynamic analysis outperformed static analysis, suggesting that runtime behavior is a better indicator of malicious intent, and again reinforcing the benefit of dynamic execution traces in detecting malware.

So the next question is: can our model successfully identify GorillaBot as malware? Thankfully, the answer is yes. We took the GorillaBot sample we had performed static and dynamic analysis on and did a runtime opcode trace. The duration we chose for all of the traces was three minutes, a trade-off between the amount of data we could capture and the computational expense. In total there were around 60,000 samples to analyze, and if you think about running each for three minutes in a sandbox: the VM has to start up, run the malware, shut down, and then restore from a previous snapshot to prevent any further infection or compromise of the system. At 60,000 samples, that was going to take about three months of continuous analysis, so for this analysis we selected only around two and a half thousand samples from each architecture. We're currently developing an automated pipeline that splits the work across multiple machines so we can analyze a lot more simultaneously. So we ran our GorillaBot trace and fed it into the saved random forest model, and you can see some of the metrics here: it predicted GorillaBot as malware with 100% confidence.
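Returning briefly to the XTEA-style cipher mentioned earlier: XTEA itself is a well-documented 64-bit block cipher, and a round-trip sketch of the standard algorithm looks like the following. This is the textbook cipher, not GorillaBot's exact string-decoding routine:

```python
DELTA = 0x9E3779B9
MASK = 0xFFFFFFFF  # keep arithmetic in 32 bits, as in the C reference code

def xtea_encrypt_block(v0, v1, key, rounds=32):
    """Encrypt one 64-bit block (two uint32 halves) with a 128-bit key
    given as four uint32 words, following the standard XTEA reference."""
    s = 0
    for _ in range(rounds):
        v0 = (v0 + ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
        s = (s + DELTA) & MASK
        v1 = (v1 + ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
    return v0, v1

def xtea_decrypt_block(v0, v1, key, rounds=32):
    """Invert xtea_encrypt_block by running the rounds backwards."""
    s = (DELTA * rounds) & MASK
    for _ in range(rounds):
        v1 = (v1 - ((((v0 << 4) ^ (v0 >> 5)) + v0) ^ (s + key[(s >> 11) & 3]))) & MASK
        s = (s - DELTA) & MASK
        v0 = (v0 - ((((v1 << 4) ^ (v1 >> 5)) + v1) ^ (s + key[s & 3]))) & MASK
    return v0, v1
```

Recovering the actual key and byte ordering a sample uses is exactly the kind of detail the dynamic trace gives you for free.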
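On the training side, the SMOTE step mentioned earlier can be illustrated with a toy pure-Python version of its core idea: synthesize new minority-class samples by interpolating between a minority point and one of its nearest minority-class neighbors. This is a simplified sketch for intuition, not the implementation used in the pipeline:

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples from a list of minority-class points
    (tuples of floats) by interpolating each chosen point toward one of
    its k nearest minority-class neighbors, the core idea of SMOTE."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest neighbors of a within the minority class (excluding a)
        nbrs = sorted((p for p in minority if p is not a),
                      key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(nbrs)
        lam = rng.random()  # interpolation factor in [0, 1)
        out.append(tuple(ai + lam * (bi - ai) for ai, bi in zip(a, b)))
    return out
```

With the roughly ten-to-one imbalance described above, you would oversample the benign class this way until the two classes are roughly level before training the random forest.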
That prediction further confirms our dynamic analysis pipeline as a predictor of unseen malware.

Just a wee bit about future work. The idea next is to optimize these models for embedded deployment. As I said before, a lot of these devices are severely resource-constrained, so to run these models on something like an Arduino or an ESP32 we first need to prune the models to reduce their size: pruning removes some of the nodes without much reduction in the accuracy of the predictions. We can also save a lot of memory by using quantization, from 32-bit floating point down to 8-bit integers, which reduces the memory requirements by a factor of four. These models will then be deployed using TinyML inference engines such as TensorFlow Lite. The idea, eventually, is to implement some kind of cascading classification pipeline: an edge device for fast on-device detection, escalating to an intermediate device, a fog server, or the cloud for advanced analysis. Hopefully this will enable quicker zero-day detection of unseen malware.

Thank you very much for listening. There's my LinkedIn if anybody wants to get in touch.
[applause] Does anyone have any questions? Okay, thank you.