
All right. Hello, everybody. Our next speaker is Kirill Boychenko, and his session has a very cool name, very on brand for BSides this year: "Slaying the Dragons: A Security Professional's Guide to Malicious Packages." A reminder: Kirill would love to get questions, if you can submit them through Slido. You've got to go to the Slido website and type in BSidesSF, or use bsidesf.org/ or slq&a. Over to you, Kirill.

Hi everyone. My name is Kirill Boychenko. I am a senior threat intelligence analyst at Socket, where we protect open source and the software supply chain at scale. We're meeting at a movie theater, and this means that you guys have comfortable chairs, which I tried. They are really comfortable, but I also get very bright light and I cannot see anything. So I will try to understand your responses to what I'm saying, but I may not see them very clearly. Thank you.

So, today's presentation, and thank you very much for your interest and for coming to this talk: a security professional's guide to malicious packages and, of course, slaying the dragons. Since "Here Be Dragons" is this year's BSides San Francisco theme, I would like to introduce you to a fire-breathing animal. Before you get worried, because I would if I were in your shoes, this presentation will not be about graphic design. There are way
more qualified people than me in that domain, and you can clearly see that from what I did to this picture. But the dragon dog, as you can see, likes many different things. I think I like them too, all of them except for sitting on a high perch. That's not really something that I do often. But let's get to some shared interests with the dragon dog, which is defending open source.

The way that we build our software applications today is that we don't start from zero. We take different building blocks, different pieces. We can think about this as gluing something together from the necessary parts.
And that's the beauty of open source. We don't have to be experts in every single feature of our software application. Here on this slide, you can see an example of a software application somewhere in the center and then many different dependencies or libraries that are included, or that our application depends on. And this image is not an exaggeration. This is very common for open source and for software development today. And that's not a bad thing. That's a good thing. That's why we don't have to be experts on every single thing. We can borrow, we can plug and play, we can build upon. And to mention some numbers,
the average application today has a lot of open source in it. From 70 to 90% open source in a software application is a lot. We have over 400 million code repositories on GitHub, with over 100 million users on GitHub. And there are massive ecosystems for programming languages. We have open source ecosystems like npm for JavaScript and TypeScript. We have PyPI for Python. And just these two ecosystems accumulate millions of packages and have billions of downloads. Even smaller ecosystems like RubyGems will still have millions of downloads and over 170,000 packages. So you can
see that these are big data sets. These are big ecosystems, and the bad guys know that too. They understand that, the way applications are built today, the bar for publishing a package, a new library, to one of the major ecosystems is not very high. It's pretty easy for us to publish a package on those ecosystems. And so the bad guys are trying to infiltrate these major ecosystems with malicious packages.

"Then" meant that we would build custom applications. We would spend weeks or months building our software applications. We would start from scratch, and the value was in this custom
and optimized code. "Now" means that we assemble from multiple different building blocks, and we also use AI-driven automation to help us code, to suggest, to provide boilerplate code. And the bad guys know that. They know how we're assembling the applications, and they are trying to use it to their advantage.

So this is the BLUF, or bottom line up front, for our time together today. These are some of the things that our threat research team at Socket, which I'm part of, sees on a daily basis. This is not an exhaustive list of different attack techniques and
methodologies. Rather, this is something that we see quite a lot in the different malicious campaigns that we identify and track. I will walk you through examples for each of these. We often see these techniques and methodologies combined, so the bad guys will do X, Y, Z: all of them, or some of them. We see code repository abuse. That's basically when the bad guys are infiltrating ecosystems with malicious packages. We see a lot of typosquatting, which is creating deceptive, similarly named packages that appear benign but are actually malicious. And typosquatting is a problem
not for just one ecosystem such as npm. We see it across many different ecosystems. We see it across major ecosystems. We certainly see it in all the ecosystems that we protect at Socket. So we will see it in npm and PyPI, in RubyGems, in Go, in NuGet. So this is definitely a ubiquitous problem for open source and software supply chains. Obfuscation comes up a lot as a red flag in the malicious packages that we detect. It does make analysis hard. It doesn't make it impossible, but it's something that we keep an eye on. Multi-stage malware: we see that a malicious package will have first and
second stage payloads, which may prolong the malicious campaign or just be designed this way to change payloads. We see that the bad guys use automation and AI. I'll provide you with some examples of that. And then legitimate tools and services. Not a shocker there, but of course the bad guys, the threat actors, will use legitimate tools and services that we use for ethical research, but they will use them for things like gaining an initial foothold, establishing persistence, and exfiltration.

Let's start with an example from the Go ecosystem. This example, something that we discovered this year, was based on code
repository abuse. The bad guy found a way to abuse a mechanism within the Go Module Mirror. The Go Module Mirror is the standard for how to consume, distribute, and proxy Go ecosystem modules, Go ecosystem packages. What the bad guy did is, back in 2021, they published a GitHub repository and they cloned a widely used module called BoltDB. They made a clone of it and typosquatted it. They just added a few letters to the name. The Go Module Mirror takes GitHub repositories and caches them, especially
those that belong to the Go ecosystem. At that point, the Go Module Mirror cached that malicious package. Later the bad guy changed their code in the GitHub repo so it was no longer malicious. They pointed it to the legitimate version of BoltDB, so the repository looked clean. If you were to manually inspect the repository, you wouldn't find any problems. But the Go Module Mirror had already cached the malicious version. So if you were to pull directly from the official Go Module Mirror, you would be pulling that malicious version. So here we saw a number of things that the bad guy did. They abused this caching mechanism of the Go
Module Mirror. And this mechanism, we need to remember, is built for convenience. It's built for efficiency. The Go Module Mirror and its caching mechanism are designed this way to be helpful for us, but the bad guy found a way to leverage them for their nefarious purposes. And yes, they did typosquatting. They found a module that was very popular. Also, the legitimate BoltDB module was archived. And we as developers, when we would like to pull a module, a dependency, a library, would often go for something
that is not archived but looks actively maintained. So that would also be potentially confusing to developers looking to install the BoltDB package, and others may just be confused because of the lookalike package names.

This package also used an obfuscation method that is interesting, but not that complex. The malicious BoltDB package had dozens of files in it, and two different files were designed to work together, to interact, so to speak, to assemble a backdoor that was included there by the threat actor. One file was called db.go, and it was designed to initiate the backdoor transmission
and communication process for this backdoor. Another file, called cursor.go, had some random-looking numbers in it. You can see at the top just some numbers. But then there was programmatically included logic to transform certain things. For example, fives would become dots, and sixes and sevens would be completely removed. And so we would get an IP address with a port number, which was the C2 for the backdoor that the threat actor embedded in the malicious package. So, not anything earth-shattering, but it did help obfuscate this package and bypass certain detections by the Go Module Mirror and potentially by some
signature-based scanners.

Here's another example of typosquatting. This is from another malicious campaign that we identified. Here you can see packages named hypert, and go, pun intended, figure which hypert is the one that you actually want. You've got to ask yourself a question: do you feel lucky? Which one is the one that you actually want? They look so similar. This is not about graphic design, guys, right? But I added a little bit here. So one of them is legitimate, the others are not, but they look so similar. This is not that easy. It's easy to mistype. It's easy to get a module that will not
only be unhelpful but will actually be malicious.

Here's another example. This one is from the npm ecosystem. Last year, towards the end of October, right in time for Halloween, our AI scanner detected a large number of npm packages being published in rapid-fire succession, and we found that the packages were malicious. There was an interesting twist to it. The command and control (C2) mechanism that was designed for these malicious packages relied on Ethereum smart contracts. That gave the threat actor a decentralized C2 mechanism and also the ability to change their payload. It added resiliency to their C2
mechanism. We as security professionals know how to block suspicious IPs and domains. That's something that we're well equipped to do. But with this decentralized C2, it was quite difficult to see that there is a C2 mechanism included. Also, this malicious campaign targeted multiple operating systems: Windows, Linux, and macOS. And as we were looking into this malicious campaign, we were able to find the threat actor who was behind it. They did a number of things. Going back to that BLUF, bottom line up front,
on different attack methodologies and techniques: they did typosquatting, and they relied on postinstall scripts, which run automatically after a package is installed, especially in the npm ecosystem. They also used automation for a number of things. They used automation for obfuscating their packages, and they had many packages. They also used automation to mass-publish packages to the npm ecosystem. When we found the campaign, there were already dozens of packages coming in, and then about 280 of them within a very short time period. The reason why we found the threat actor is that they were advertising their techniques on an underground dark web forum. They created videos on how
they did that. They wrote some articles. They made some posts. And they were very excited about what they achieved with infiltrating the npm ecosystem. They provided this screenshot. On the left here, you can see it's a screenshot from ChatGPT. Instead of doing something on their own, they were recommending that you just create typo variants of the most popular packages using ChatGPT. You provide it with express, one of the widely used npm packages, and then just ask: give me typosquatted versions of it. And so they created a long list of typo variants like this, and then they registered some
packages with those names that were not yet taken. And you can see on the right-hand side some legitimate packages, most of them related to the crypto theme, and then typosquatted versions. You can see there's a letter missing or added here and there. This may seem like not the highest sophistication, but it works. It works in npm, it works in other ecosystems, it just works a lot. Typosquatting is such a ubiquitous problem.

We also found that authors get typosquatted. That was just a hypothesis for us, so I created a script that would pull
prolific maintainers on npm and their names and then create typosquatted versions of their names, and there were some hits that we found from that exercise. The one that you're looking at was the most interesting. So, the prolific npm maintainer sindresorhus: the legitimate one is on the left and the malicious one on the right. Will the real sindresorhus please stand up? The problem that we're going to have here is that the impersonator included an infostealer in the package that they published. And you can see that the threat actor used a very similar profile design, the same picture, and even the package that they published that
they called chalk-node is typosquatting a very popular package called chalk, used for terminal coloring. You can see the legitimate chalk package on the left, and you can see the file explorer with a directory and the files beneath. And then on the right you can see the malicious chalk-node package with a few more files there. One of the files specifically, index.esm.js, had infostealer functionality: accessing files on the victim's system and exfiltrating sensitive information. So, pretty classic infostealer functionality.

Here's another campaign that we identified and track, and it's also in the npm ecosystem. This is an example of a multi-stage payload that I wanted to show you. We
constantly identify North Korean attackers infiltrating the npm ecosystem. They are doing this as part of their Contagious Interview operation. They're trying to infect developers under the pretense of testing, debugging, or interviewing candidates or developers for various job vacancies. Here the multi-stage payload consisted of a number of different payloads. There was obfuscation. The threat actors also used automation for obfuscation in this case. And here's the first payload, which was an infostealer and malware loader. It was stealing sensitive information from the browser storage of some popular browsers like Chrome, Brave, and Firefox. It was also stealing Solana wallet credentials. And I should say that we
see theft of crypto assets as a common thread. There is a lot of targeting of crypto assets in the malicious campaigns that we find in various ecosystems, not just npm. Then it also had logic to steal macOS keychain credentials. And for the second stage, it would download the InvisibleFerret backdoor. So it would start by doing all this bad stuff as an infostealer on the infected system, but then it would add more bad stuff: this additional backdoor, which would be installed for persistence and for additional malicious activities. Here you can see the URL in the deobfuscated code, and it would rename
itself and unarchive itself and do a few things to make detection more complex, like that renaming. And it was obfuscated. You're looking at the deobfuscated code, with some comments, here.

As for obfuscation, the bad guys used a very popular JavaScript obfuscator tool. On the left-hand side you can see obfuscated JavaScript. It's just one very long line. And there's nothing good about this line. You're looking at it like, this line is not going to do anything good on my system. And then on the right-hand side you can see deobfuscated code, with some
of the code that you've seen on previous slides from the same malicious campaign. The bad guys can use the obfuscator as a one-off, or they can use it as part of automated obfuscation. We also know that there are multiple free deobfuscators for this very obfuscator, and some of them are online. You're looking at one right now. So it seems like an easy thing for the bad guys to do, but it works. It works to bypass more basic signature-based scanners, and obfuscation is a common theme that we see in malicious campaigns.

This next example is also from the npm
ecosystem, and I wanted to show it to you today because it relies on legitimate services. In this case the bad guy used Gmail for exfiltration. The target was Solana private keys, to drain user wallets. You can see some search results for the two packages. On the top you can see async-mutex, which is a legitimate package, and on the bottom you see a lookalike async-mutex that is a malicious package from this campaign, the one that used Gmail for exfiltration. You can also see the number of downloads, which sometimes can be helpful, and I'll explain why only sometimes. Here we can see that the legitimate async-mutex package on top has 1
million weekly downloads and 93 million total downloads, and on the bottom we see the malicious lookalike, which at the time I made the screenshot had zero weekly downloads and only 240 downloads overall. So downloads can tell us whether a package is in heavy use or not, but they won't always tell us whether a package is legitimate. The reason is that download counts can be inflated in npm. We've seen those cases. But this is one of the warning signals for us as threat hunters and as developers, to understand: what's the history of this package? What's the versioning
of this package? Why the sudden spike? Can it be explained why all of a sudden there are many downloads for this package? We know that the bad guys can inflate those numbers, so they should be taken with a grain of salt.

Then we just Googled the malicious async-mutex lookalike package, and we got results from Google's AI Overview, and the results were not helpful at all. It just gave the stamp of approval that this package is safe, that we can install it. But the package is malicious and shouldn't be installed. The problem with AI-generated results like this is that
LLMs sometimes tend to be overconfident and overeager to help us with what we're asking. Here Google's AI Overview is trying to be helpful, pulling information about this package from the README of the legitimate package. But we're asking about the malicious one, and we get a result on how to install it, with information from the legitimate one. So, not helpful at all. And this goes back to the beginning of my presentation, on AI and the use of AI. We will come to this again.

So, back to this Gmail exfiltration. The way the bad guy did this is they hardcoded Gmail accounts and Gmail addresses
for the exfiltration mechanism, but they also hardcoded the password. That was really strange. You just don't see this a lot in libraries. Why would someone hardcode their email address and their password? Well, for this reason: to exfiltrate sensitive information from infected systems. In a similar attack, the threat actors went one step further. In addition to providing their Gmail addresses and hardcoded password, they were also draining 98% of the funds. They would leave 2%, likely for transaction fees. They would use Gmail for exfiltration, and they would also transfer funds to a provided Solana wallet
address.

I wanted to show you this example because it is from yet another ecosystem: the Java ecosystem and the Maven Central repository. The bad guy here cloned a widely used package called XZ for Java. After cloning that package, they included backdoor code in it. They also used AI and automation to generate their malicious code and their malicious script. And they also did a few things that we see a lot: first publishing a benign version, and then in a later iteration, a later release, including malicious functionality. Here's the code. In all the code that you've seen in this presentation before, all the
comments are mine. But these are not mine. These are the threat actor's. And I believe they are not even their own comments. It's more likely AI-generated code, AI-generated obfuscation that the threat actor used to help them generate this malicious code. It's obfuscated with basic array obfuscation, but again, that works to bypass some basic scanners. And yeah, of course there's 1337 for LEET. Well, almost; too many digits, I think. But it has to be bad if it's LEET, 1337, I guess. And very helpful comments, very AI-like suggestions in this code.

Another example, from the Python Package Index, or
PyPI. Here the threat actor typosquatted the widely used browser-cookie3 package. They just added a letter S. What's interesting about this package is that they included an unexplained binary called client.exe. So we decompiled it, got back to the original Python code, and found some modules that it would use. It would screenshot the victim's screen. It would steal passwords. It would access the user's camera, and then it would exfiltrate sensitive data to a Discord webhook. We see the bad guys using things that we can also use legitimately, right? Like a Sentry instance, which is used for error
tracking but will be used for exfiltration. A Discord webhook would be used for exfiltration in this case. Or Gmail, in that other example, perhaps because SMTP traffic may be totally normal for that environment.

At last, this is an example from RubyGems, and once again it's about threat actors using legitimate services and tools. We find that threat actors will use out-of-band application security testing, or OAST, something that we use for security research and for testing purposes. Threat actors will use it for things like exfiltration. Or, as in the case of this malicious campaign, they use it for initial reconnaissance, as an initial recon
tool. They use it to profile victims. They got information about the system and about the user, and they would exfiltrate it to an OAST-related domain, oastify.com, to position their attack better later. That could be, for example, to pivot, to understand the privilege level, or to understand the environment that installed this package.

So, we're circling back to where we started. I gave you some examples of threat hunting opportunities, things to look out for, which we do in our work: scripts that will run automatically, like postinstall scripts in npm. Does it always mean that we're dealing with a malicious package? No, it doesn't. But it may be a
signal for us. And the more signals, the merrier. Hardcoded network calls, some unexplained stuff going to endpoints that it shouldn't be going to, should raise some alarm bells. Dropping executables that are unexplained, not described in any way, something that can later infect the system: we see those packages too.

And for defenses, the list can be very, very long. I didn't want to overwhelm you. But what we need to know is that we are already at the point in time when we develop our applications by relying on external libraries, which is a good thing. That's why open source is such a wonderful thing, and that's why we want
to protect it: because we can build and share and build upon work that has been done and include certain features. We don't have to reinvent the bicycle. But there is no way that we can vet all of those dependencies and libraries manually. There are just too many of them. And there's a lot that may be happening inside that code. So we need to be able to scan the code. We need to be able to vet it. We need to be able to approve it. We should have an allow list for our critical infrastructure. We should have an allow list for production apps. We should
be able to restrict packages that are deemed malicious. And with that, I would like to thank you for your interest and for attending today's session. Since I think this session is being recorded, I also want to thank my team at Socket. I want to thank Kush, Olivia. I want to thank Sarah, and especially Philip and Feras. But most importantly, thanks to you guys for coming to this talk today. I appreciate it.

Thanks, Kirill. A couple of questions for you. So, first one: since malicious packages don't have CVE-style standardization, how are you unifying your malicious package research with the broader community?

Yeah, that's a really good question. I should have said that all
the packages that I presented to you today, all of the malicious packages, we report to the respective package registries, because we believe that after we analyze them, and as we publish our findings, we should also work with the package registries so they can analyze the malicious packages too, and they can block them and stop them. We believe that collectively we can stop the compromise. So we're sharing with the community as well. You can see the way we standardize hunting for malicious behavior. It's open information. It's on our website. We have more than 70 indicators for how we reach a verdict and
determine that there are certain malicious indicators or signals in packages. We also have a number of free tools that any developer can use. They're very easy to install, just a few clicks, and the information is there. We're pretty open about it.

Another question for you. How prevalent are these malicious threats? Why don't package managers do more to prevent these malicious campaigns?

Excellent question. I think that, on the one hand, you want to encourage innovation and keep open source open, so we can take advantage of all those wonderful tools that different developers build. And package registries, my understanding is, are trying to encourage this
innovation and the open-source nature of this work. At the same time, those are massive data sets, like we were talking about at the beginning of the presentation. We have millions of packages in a given ecosystem and billions of downloads, and then there are updates and new versions. This is not a trivial task. We know that security teams at the various ecosystems are doing their best. It may not be good enough for us. We certainly need to trust the code that we allow into our systems. So they're doing some work. It's not good enough. We're trying to do better from our perspective.

Why doesn't npm do more to address
abuse of their platform? Fake download stats, the ability to claim namespaces without validation, trusting metadata from package.json for author and collaborators. These are all things that npm could fix.

I agree with you. The answer would be: I don't know why.

Somebody else said they do. It's just slow.

Okay, perhaps that's a good response. I should take it for when I get a question like this. Yeah, I don't know why. I'm not with Node.js and I'm not on the npm security team, but I agree with your sentiment.

Is dependency confusion still a concern?

Yes, it is. We found that the Black Basta ransomware gang was actively
testing this theory of dependency confusion, which is finding package names that companies use internally and then publishing something under those names to a public registry, in hopes that at some point the company's tooling will go and fetch that package from the public ecosystem, and there will be bad code in it, the malicious stuff. So yeah, still a thing.

Final one for you, Kirill. How well does the takedown process for known malicious packages go?

We have very good feedback from PyPI and their security team. Whenever we publish our research, prior to the publication, when we do our research and discovery,
we send takedown requests to PyPI, and they respond very promptly, very professionally. The same goes for the Go Module Mirror. As for npm, they take time. That's probably to the point of one of the people who asked the question: they do it, but they do it slowly. We see those packages get suspended, but it's not as fast as we would like to see.

Awesome. That's it. Thank you so much, Kirill.

Thank you.
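
The digit substitution described in the BoltDB example (fives become dots, sixes and sevens are removed) can be sketched in a few lines of Python. This is only an illustration of the idea, not the threat actor's code: the function name `decode_c2`, the sample string, and the choice to leave the colon and port unencoded are all assumptions, and the address comes from the 192.0.2.0/24 documentation range.

```python
# Hypothetical sketch of the digit-substitution trick described in the talk.
# Assumptions (not taken from the real package): the function name, the sample
# string, and leaving the ":port" suffix unencoded.

# '5' becomes '.', while '6' and '7' are padding characters that are dropped.
_DECODE_TABLE = str.maketrans({"5": ".", "6": None, "7": None})


def decode_c2(encoded: str) -> str:
    """Turn a random-looking digit string back into an 'ip:port' string."""
    return encoded.translate(_DECODE_TABLE)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address, used here as a stand-in.
    sample = "169275065275106:8080"
    print(decode_c2(sample))  # prints 192.0.2.10:8080
```

A scanner that only looks for literal IP addresses or hostnames would not match the encoded string, which is one reason a trick this simple can still get past basic signature-based checks.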