
Let's all welcome Paul McCarty with, uh, Pandemir: How the Chinese CCP manipulates npm. Thank you. >> Uh, is it still morning? No, it's not. Good afternoon, CRA. How's it going? >> We ready to rock and roll? >> Yeah. Right on. I like this. Um, just a boring little subject here today: how to manipulate the world's largest software registry. My name is Paul McCarty. Many of you probably know me from SecureStack, my startup that I started here in 2017. Right now I'm the head of research for Safety, which is a Canadian software supply chain company. I'm always, you know, working on little things on the side as well. And so I got some side
projects going on, do some advisory. Um, but I've been in IT for a long, long time: 32 years this year. I know, the baby... I've used this joke so many times. Glenn's heard this joke like 14,000 times, but I know I don't look it, but I've been in the IT industry for a long, long time. I've worked for some really large organizations, but I've also worked for a lot of small startups, and that's good because that gives you a kind of flavor across the whole breadth of the industry. So, this is going to be a very compact 20 minutes. This is like a 50-minute talk that I've cut down by like more than
half to deliver here in 20 minutes. I want to leave lots of time at the end to have spirited conversations about what I'm going to talk about today. If I say something and you don't understand it, hold up your hand and stop me, because I don't want you to hear something that you don't understand. I'm going to try not to use acronyms, but sometimes I do. Um, yeah, that's where my, whatever, qualifications end. Who's heard of npm? Who's heard of npm in the last month, thinking what a [ __ ] show this is? There we go. I wish I could take a picture then. That's
the best part right there. npm's been having a tough month, right? Um, but here's the thing: I've been bitching about npm for years, talking about what a problem it is, and, I mean, it sucks for all of us, you know, what's going on, but the reality is I feel a bit vindicated in being the squeaky wheel. I am an American in Australia, so I have to be careful about coming across as, you know, too much of the guy complaining, right? I've been doing some math lately. 134 new packages are deployed to npm every single day. 134,000 updates happen to the registry every day. The vast majority of those 134,000
are in fact new versions of existing packages, right? But a small percentage would be deletions and other kinds of weird things as well. This registry dwarfs all other registries, right? Like, it's bigger than Maven by several factors. It's bigger than PyPI by several factors. The only thing that would even come close is maybe Docker Hub, and that's kind of a different thing entirely. Um, so it's really busy and there's eight global mirrors, right? Um, those eight global mirrors, where do we think they are? I'm going off script already. Who wants to guess what percentage of the global npm mirrors are in China? >> Anybody? >> 25.
>> How much? >> 25. >> 25. All right, let's keep that there, Duke. We'll keep that there. 25. All right, here we go. The first one here, I'm going to do my Chris Rock where I walk back and forth on stage and drop funny things. The first one is the registry. This is the main core registry. Who owns npm? GitHub. Who owns GitHub? Microsoft. Um, this core registry is actually what feeds out to all the other npm mirrors, right? So, this is the original. Um, and that original historically stored its metadata in something called CouchDB. Who's worked with Couch before? Yeah, it's a [ __ ] show, right?
So, the guy that wrote this... so here's the weird thing about npm. npm is three things all at once. It's a local client that you run as a package manager on your machine, right? It is also a registry that stores millions of packages. It is also a SaaS platform that you log into and you can create role-based access and all kinds of other stuff, right? Well, the guy that originally wrote npm, Isaac Schlueter, used CouchDB cuz, you know, [ __ ] it, why not? It's free. I might as well use it, right? Um, and unfortunately, we've been stuck with it for 15 years since then. Until February of this year: they just replaced Couch with Postgres, but
because they can't break the ecosystem, they put a Couch compatibility layer over the top of it. Um, ironically, even though that's owned by Microsoft, where do they store their tarballs? >> S3. That's right. There's another talk I gave at BSides Melbourne last year, or the year before that, talking about some funny stuff I did in there. This one right here, there's nothing really special about it. It's just a small commercial mirror based out of the UK. Refinery.dev. They just basically do DevOps services. The next one is the newest of the big dogs. This one's in Russia, hence the RU. I haven't done anything with this one. I think there's probably some cool things that we can
look at in terms of this one, but we're not going to talk about it today. Cool. We're going to talk about... You ready for it? Duke, you said 25%. Five of the eight global mirrors are in China. Um, five. That is, just in case you're not good at math, 62.5%. Right. Um, good guess, mate. Um, the reality too is, each one of these mirrors is its own thing, right? It's its own tech stack. They can choose to use Couch, they can choose to use something else. Each one of them is its own thing, and the reality is that each one of these kind
of, you can ask it about its statistics and it will tell you. And I was looking at npmmirror, no, I was looking at CNPMJS the other day, and it was reporting that it had 300,000 events in the day. So we have a disparity in the number of updates that are happening inside of these. I haven't really looked into that yet, but there's more to do. Or if you guys want to help me in my journey, um, I'm open to it. Why do we give a [ __ ]? Well, we care because, one, npm is the world's largest software registry, but it's also the most insecure. And the last month has
kind of proved that to us again and again and again. I'm not going to go through all the reasons it's insecure because I've done other talks about that. I just did a talk at DEF CON where I talk about this. Um, but some of the things I just want to highlight: one is that npm still allows install scripts, so pre- and post-install scripts. Anytime you install an npm package, right before it installs the content, it can do something, and right after it installs the content, it can do something, right? And that's just perfect for attackers. It's RCE by design. Next, coffee. Next, the ease of access. You can sign up to npm without doing anything really.
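The install-script point made a moment ago can be checked mechanically. The following is a minimal sketch, not anything shown in the talk: it assumes you already have a package's `package.json` as text, and it only looks for the three lifecycle hooks npm runs around an install (`preinstall`, `install`, `postinstall`). The function name and the sample manifest are hypothetical.

```python
import json

# Lifecycle hooks that npm runs automatically around `npm install`.
INSTALL_HOOKS = ("preinstall", "install", "postinstall")


def find_install_scripts(package_json_text):
    """Return the install-time lifecycle scripts declared in a package.json."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts") or {}
    return {hook: scripts[hook] for hook in INSTALL_HOOKS if hook in scripts}


if __name__ == "__main__":
    # Hypothetical manifest, for illustration only.
    sample = json.dumps({
        "name": "example-package",
        "version": "1.0.0",
        "scripts": {"postinstall": "node setup.js", "test": "jest"},
    })
    print(find_install_scripts(sample))
```

A non-empty result does not mean a package is malicious; it only flags code that will run at install time and deserves a look.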
There's no real validation, right, except for an email address. And you can be publishing packages in like 2 minutes. I've tested it. I know. Um, the reality is that ease of access makes it really attractive to bad guys. And here's the really crappy part: JavaScript payloads basically bypass most of the security tools that we use, both the infosec ones, EDR and antivirus and stuff like that, but also the appsec tools, SAST and SCA. They're not built, you know, to identify malicious packages. Um, so I talk about this a lot in my talk at DEF CON. If you want to, there's a link. All the slides from my DEF CON talk are
there. You can hit the QR code as well. But um, why is npm... Oh, wait. Did you... Oh, sorry. Sorry. This is where you raise your hand AND YOU SAY, "PAUL, GO BACKWARDS." SORRY, my bad. Five, four, three. Even for the slow picture takers, you should be good. All right, cool. Um, Sonatype did a study last month. I should have asked this too as well. They found that 98.5% of all the malicious packages that they've seen are hosted or served by npm. PyPI is less than 1%. Now PyPI is gaining, but it's not going to be in the same category ever. npm is in its own world of maliciousness, right? So
98.5% of the malicious packages that Sonatype looked at were in npm. Now you cross that with, or you add to that, the fact that JavaScript... this is actually data from GitHub. What GitHub did here in the study is they said they wanted to track how many dependencies things had, and they found that the average JavaScript project on GitHub had 10 first-level direct dependencies and 683 indirect. It turns out the number now, I just heard the number now, it's over a thousand, right? I heard Feross from Socket talking about it's over a thousand now. I'm not surprised. So if you take the fact that this language is more promiscuous and uses more dependencies
than any other language and you cross that with the fact that most of the malicious packages are there already, it just increases the attack surface exponentially. Now we want to talk about today why and how npm packages are removed from the registry, because this is what I basically found out. So basically there's several ways that this can happen. How are we doing for time here? Okay. Um, a security researcher like me can identify a malicious package and then I can disclose it, right? Which is what I do. A lot of other people at other companies don't do this because they're just in it for the marketing, but whatever. Um, if I disclose it to
OSV, um, I'm not going to take any prisoners today. Um, if I disclose it to OSV, typically, historically, it would take 3 to 4 days for that to kind of ripple out to an npm deletion. So OSV, if anybody's not familiar with it, go to osv.dev. It's an amazing website. It's a collaboration between Google and the Open Source Security Foundation (OpenSSF), which is part of the Linux Foundation. Um, they have what's right now the world's best malicious package database. I'm the lead, but anyhow, uh, flex. Um, so that typically took three to four days, right? But historically, if I disclosed directly, I [ __ ] you not if I
disclosed directly to npm it would take weeks. Famously I had one, express.exp, which I did a whole blog post about, and I got a lot of people from GitHub reaching out to me about this. It took five and a half months. I told them in Jan, sorry, in December of 2024 that this thing was malicious and they didn't remove it until May. That's five and a half months where a malicious payload can be pwning people, right? All right. So, that's if you disclose it. Another way that you can get this into the system is OSV has an automatic scanner that can identify things, and again that typically gets into GitHub and then removed from npm within 3 to 4
days. That all has changed recently because of what's been happening over the last month. Now the time from when I disclose to when something's removed from npm is about two hours, three hours, right? One of them recently took a day, but most of the ones I've done in the last couple weeks, uh, week and a half, I guess, are in the hours. So that's amazing. That's one good thing to come out of this. Uh, beyond that, um, the last way I want to talk about is npm's own internal team. They have a team that identifies malicious packages, and then what they do is they remove it from npm but they
don't really do anything except they make this really terrible GHSA. GHSA is GitHub Security Advisory. It's another database. It's owned by GitHub. It's [ __ ]. Um, but they will remove it from npm and then, uh, they don't talk about it. They do it really quietly. And that last scenario is the one that I really, really was interested in, because I wanted to find out: can I get access to packages that npm has removed that maybe not many people have seen, right? Um, and that's where we're going with today's conversation, if you didn't pick up what I was putting down there. Um, this is what it looks like when a
package... when npm identifies that every single version of a package is malicious, which is what most of them, 90-plus percent of them, are like. What they do is they remove every single malicious version and they replace it with one version which ends in this 0.0.1-security, right? So that's a placeholder. They call that a placeholder. They do that for a number of reasons. The main reason, which is kind of weird, is they do that so that nobody can ever use that name again. So they basically don't want to let somebody use that name again, right? And there's a reason why, but you have to understand the ecosystem a little bit to understand why. Um, alternatively, um,
another way for you to remove, uh, something from npm is for an npm author to remove it themselves. And this is something that's very unique like to the last maybe three or four months. I used to see it every once in a while, but it's increased dramatically. What'll happen is a researcher like me will identify that a package is malicious. We will start the process, typically go to OSV and create an advisory there, and you know what the bad guy does? They remove it themselves. Why do they remove it themselves? Because it gives them this unintended benefit which is it breaks npm's automation, right? It usually doesn't even go all the way to the GHSA advisory
and it definitely allows them to use that name again. So if they remove it, that name, like express.exp, is still usable in the future. Does that make sense? All right, cool. Um, so bad guys now are removing their own packages. Um, and the other reason they do that is because they don't want a lot of people looking at it and figuring out what the payload does. This has increased dramatically over the last 2 weeks. Um, this is what happens when somebody removes their own package. Basically, you just get a 404 when you go to the website. However, if you hit the API, you get this one. Uh, yeah, is not in this registry. Oh, unpublished.
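The two removal outcomes described here (the `0.0.1-security` placeholder left by npm, and the "unpublished" response after an author removes a package) can be told apart from the registry's metadata document. A minimal sketch follows, under stated assumptions: the field names `versions` and `time.unpublished` reflect my understanding of the registry's JSON and are not taken from the talk, and the function name is hypothetical.

```python
SECURITY_PLACEHOLDER = "0.0.1-security"


def classify_metadata(document):
    """Classify a parsed registry metadata document; pass None for an HTTP 404."""
    if document is None:
        return "not-found"
    # Assumption: an author-unpublished package carries a time.unpublished entry.
    if "unpublished" in (document.get("time") or {}):
        return "unpublished"
    versions = set(document.get("versions") or {})
    # The placeholder case: every real version is gone, one stub version remains.
    if versions == {SECURITY_PLACEHOLDER}:
        return "security-placeholder"
    if not versions:
        return "no-versions"
    return "available"


if __name__ == "__main__":
    print(classify_metadata({"versions": {"0.0.1-security": {}}}))
    print(classify_metadata({"time": {"unpublished": {"name": "someone"}}}))
```

The distinction matters for the argument in the talk: only the placeholder case blocks the name from being reused.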
That's what I'm looking for. You get an "unpublished". I just noticed, I figured out something else there as well: there's a different message if they made it private. Yeah. Um, sorry, sharing. Um, so those are the ways that you can get stuff removed from npm. Now what does China have to do with this? Well, there's my Gemini one shot. One shot. Uh, son. All right. So has anybody heard about this thing where the Chinese government has created this rule that says if you find a vulnerability in China, you cannot disclose it? You have to disclose it to them first. We've all heard of this. Yeah. Or many of us have heard of this.
This is called the RMSV, or the Regulations on the Management of Network Product Security Vulnerabilities. I don't know that much about it, but effectively it controls, you know, what you can do about vulnerability data, right? Um, and disclosures. The Atlantic Council wrote this amazing white paper about this called Sleight of Hand: How China Weaponizes Software Vulnerabilities. I highly recommend you go and read this. It's open. You don't have to sign in. There's no paywall, no nothing. Um, and I've been talking to one of the authors of this, Dakota Cary from SentinelOne, and he's already seen a bunch of my data. Um,
you know, we're assuming that what I'm about to show you has something to do with this, but we have no proof of that. I just want to say this right now. I have no proof of the fact that the Chinese CCP has done this directly. But basically the reason this is important: this is the Chinese vulnerability database. So this is basically like if you were to go to NVD and go through a SaaS platform. These are my English translations of these. This finding classification down here, if you click on that dropdown, this allows you to do file hashes and IP addresses, right? And URLs and packages and a couple other things. So, what that
tells me right now is that they have the capacity for someone to disclose to them these malicious components. Does that make sense? Including packages. So, they give a [ __ ], right? Um, by the way, you have to be a Chinese citizen and you have to have a phone number to be able to get access to this. So, somehow, um, the guys that wrote that article got access to that. Um, but unlike other real vulnerability databases... the whole purpose of a vulnerability database is to take that data in and share it with us, right? So we can protect ourselves. Well, guess what? That doesn't do it, right? They don't
actually share the vulnerabilities they find. So calling it a vulnerability database is kind of [ __ ]. Really, what it is, it's just a collection mechanism to then funnel it. And that's exactly what they talk about in this article. They basically just funnel this stuff to the different departments inside of the Chinese government and they use it in some parts for offensive security work. Here's the TL;DR. I figured out that Chinese npm mirrors have been modified to store deleted malicious packages. Why is that a big deal? Well, I'm going to go into some detail here in my remaining few minutes. Um, so basically, of the five Chinese mirrors, three of them
consistently will serve you up packages that are malicious and have already been deleted from the npm global mirroring system. One of those is the Tencent server. Now Tencent... like I said earlier, each one of these is its own tech stack. This happens to be using something called Verdaccio, which is an npm mechanism. It's not a Couch implementation. It's hosted in one of Alibaba's like 18,000 CDNs. This CDN is called the Alibaba Cloud CDN. Um, what I found was that when npm marks a package as malicious, this mirror will often continue to serve the malicious versions of the packages. Why do we care about that? Well, so this is what happens when npm marks a package as malicious.
Remember how they replace it with that placeholder, right? So that's what the main registry is saying. By the way, this is a Python script that I wrote. It basically just goes and checks the availability of a particular package in each one of those eight global mirrors. So that's a correct response. The next two are two Chinese npm mirrors, and for some reason they say they don't know anything about that package at all. They do not have the placeholder package. Okay, that's weird, but we'll circle back. Then the rest of the global npm mirrors, they answer the right way, and then Tencent gives me the malicious version
right, so I can then go and download that version and do stuff with it. Now, similar but different: npmmirror.com and then CNPM. In a second here, I'm going to show you these two. One of them, this one, um, npmmirror, is running in Alibaba's global traffic manager CDN. It's a full implementation of Apache CouchDB, unlike, um, Tencent. Um, it's got a WAF in front of it. It's got some other stuff going on. I found that this consistently does not respect the package deletion. And so basically the way this distributed database is supposed to work is that when the core registry sends a deletion event to all the mirrors, they're supposed to follow that event. Well, in this case, only
these two don't. Now the Tencent one doesn't do this. Only this one and this one do this. So only two of the five still retain the deleted package. And you can see here that this one, the CNPM, that's a totally different implementation, different tech stack, yada yada, doing the same thing. So let's see what that looks like. So first I'm going to ask for a package that the author has deleted themselves, right? So this is what I'm doing. I'm asking for a package. I know the author has deleted this. So the first response, from the registry, the main registry, is: yeah, I have no idea about that. Never
heard of that package. Then these two go, oh yeah, here you go. So, they're basically serving up packages that have been deleted from the npm registry. Um, and it turns out, when I was doing research for this, Truffle Security identified this back in 2022. But from their perspective, they were looking at this from like, how do I get secrets and credentials out of this? So basically what they did is, if they identified that somebody had put credentials inside of an npm package and then gone and deleted that version of it, which is very common... You're like, "Oh [ __ ] I put AWS credentials in here. I need to delete it." Right? What they found is
they can go to these Chinese servers and they can ask for the versions of those packages that have been deleted everywhere else and they can get the secrets. And that's how they were using them. Well, guess what? I don't give a [ __ ] about that because Lucas has got that covered. I give a [ __ ] about it from the perspective of: how can I get access to deleted npm packages? Sorry, malicious deleted npm packages. And all that shows you is that you can successfully download these things. This is the one time it worked. Usually I have a bunch of pictures in the... I'm going to share with you guys a GitHub repo. Typically, this takes I
don't know, somewhere between three and 20 times to make work, because you're basically trying to wiggle your way to like one machine behind a load balancer group. In this case, it was a one-shot. Why would I want to do this? I've been using this to collect npm malware for a year and a half now. Um, and what's great about that is that a lot of that npm malware, roughly 25% of it, basically was deleted by npm. So what that means is that only the npm researchers have seen that, and the author, of course, obviously the author has seen it, and what that means is almost nobody else has understood it or analyzed it. So I got a lot of
this deleted stuff and I analyzed it, and what am I looking for? I'm looking for IoCs. I'm looking for IP addresses. I'm looking for C2. I'm looking for all these things that other threat intel teams haven't found. And I built a bunch of rules to identify and detect those same things in other packages. So basically, by using the access and the downloads that I pulled of these packages that other people hadn't seen, I was pulling these IoCs out. And this is why I've consistently been able to report on packages that other people haven't been able to report on, right? It's because I'm looking at stuff, um, that they weren't looking at until this week.
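The IoC hunting described above can be illustrated with a few regular expressions. This is a minimal sketch and not the speaker's rule set: it assumes the package source has already been unpacked and read into a string, it covers only IPv4 addresses, http(s) URLs, and SHA-256 hex strings, and the function name and sample payload (which uses documentation-reserved addresses) are hypothetical.

```python
import re

# Dotted-quad IPv4 with each octet limited to 0-255.
IPV4 = re.compile(
    r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"
)
# http(s) URLs, stopping at whitespace, quotes, angle brackets, or ")".
URL = re.compile(r"https?://[^\s'\"<>)]+", re.IGNORECASE)
# 64 hex characters, the textual form of a SHA-256 digest.
SHA256 = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract_iocs(source):
    """Return de-duplicated, sorted candidate indicators found in source text."""
    return {
        "ipv4": sorted(set(IPV4.findall(source))),
        "urls": sorted(set(URL.findall(source))),
        "sha256": sorted(set(SHA256.findall(source))),
    }


if __name__ == "__main__":
    sample = 'fetch("http://203.0.113.7/payload"); const c2 = "198.51.100.23";'
    print(extract_iocs(sample))
```

Matches are only candidates: a four-part version string such as `1.2.3.4` also looks like an IPv4 address, so anything extracted this way still needs review before it goes into a threat intel feed.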
I have dire news to report. Um, then what I did is I took all those IoCs from all these things that nobody else had seen, or almost nobody else had seen, and I built a CTI portal. I basically built a relational database, and that allowed me to track threat actors, npm authors, GitHub accounts connected to npm packages, and this allowed me to threat hunt across these large groups. When Socket was finding like one or two packages, I was taking that same group and I was finding 40 or 50 packages. Why? Because I was using author and I
was using these contributors and all these other things that were in my database to connect these threat actors. And let me tell you, a lot of their opsec is bad. So if you just look, you're going to find stuff. And finally, you ready for this? I sold it. I made some loot. I bought a car. I've taken my wife and kids on a ski holiday. Um, my Chris Rock is coming out. Um, so I go into a lot more detail about the way that I'm working with CTI, threat intel coming out of software supply chain attacks. Um, I talked about it in Berlin at the FIRST CTI conference. That's a QR code to that. There's a link
directly to... that's actually a video. How am I doing for time? Yeah, sweet. That's a video. Um, I went into a lot of detail about how you can capture IoCs, um, from these software. And just to be clear, the reason this is awesome is because for the first time you can collect things out of software-type attacks that the infosec teams can consume natively. They can bring them into their SIEMs, they can bring them into their logging solutions, right? These are things that you know how to do things with: IP addresses, domains, file hashes and what have you. And on that note, this is just some of the things that I found. This is from the
express.exp. I found domain names tied to this. I used those domain names and other, you know, threat intel platforms to connect this to a larger group. Um, and this was obviously a Russian criminal group, not a state actor, but basically all those other packages, the file hashes, I used each one of those things to pivot and find in some cases other parts of the overall attack surface. And this is the repo. My Python script is in there. Now, I do want to report that unfortunately it appears like this is mostly not working now. It's been up and down and up and down for like the last month or so. I would report to my friends,
I'd be like, ah [ __ ] it's not working, and then it would start working later on that day. I'd be like, oh, it's back. But the reality is that npm and GitHub are actively addressing this, and so, um, I think the Chinese mirrors are part of that overall piece of work. Um, there's my contact details. Feel free to reach out to me, and I would love to take some questions in the remaining two minutes. Good? Cool. Sick. Thank you.
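The cross-mirror check described in the talk can be sketched roughly as follows. This is not the speaker's script, only an illustration under stated assumptions: each registry is assumed to serve a JSON metadata document at `<base>/<package>` with a `versions` object, repeated requests stand in for the "three to 20 tries" needed to reach a particular backend behind a load balancer, and the function names and any mirror URLs you pass in are placeholders.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request


def fetch_versions(base_url, package, attempts=1, delay=0.0,
                   opener=urllib.request.urlopen):
    """Return every version a registry reports for `package` across attempts."""
    # Scoped names such as @scope/pkg need the slash percent-encoded.
    url = f"{base_url.rstrip('/')}/{urllib.parse.quote(package, safe='@')}"
    seen = set()
    for attempt in range(attempts):
        try:
            with opener(url, timeout=10) as response:
                document = json.load(response)
            # Assumption: versions are the keys of the "versions" object.
            seen |= set(document.get("versions") or {})
        except urllib.error.HTTPError as error:
            if error.code != 404:  # a 404 just means "not known here"
                raise
        if attempt + 1 < attempts:
            time.sleep(delay)
    return seen


def versions_missing_upstream(reference, mirrors):
    """Map each mirror to the versions it serves that the reference lacks."""
    reference = reference or set()
    return {
        name: sorted(versions - reference)
        for name, versions in mirrors.items()
        if versions and versions - reference
    }
```

In use, `reference` would come from `fetch_versions("https://registry.npmjs.org", name)` and `mirrors` from the same call against each mirror's base URL; any mirror that appears in the result is still listing versions the core registry has dropped. The `opener` parameter exists only so the fetch can be exercised without network access.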
>> to try and get downloads from traing through. >> No, my suspicion, and this is all it is, Glenn, my suspicion is that they configure... because you manually have to go and adjust those. They don't do that by default, right? You have to go and tell them don't delete these things, cuz both Verdaccio and CouchDB are built to take those signals in. My assumption, talking to Dakota and some of the other people that I've talked to, my suspicion is this is the cloud providers' way of kind of proactively saying, "Hey, here's some malicious stuff the Chinese government might give a [ __ ] about.
Let's just not delete it in case they want it." And who knows, it might be more formal than that, right? It might not be them proactively doing it. Might be. I'm never going to know. All I can tell you is that the Chinese mirrors did it consistently for years when none of the other ones did it. None. And I've hit these APIs millions, literally millions of times. So like none of the other global mirrors did it and only the Chinese mirrors did it. Is that a good answer? >> It's the best one I got. >> It's either [ __ ] or malicious. >> Yeah. It's it's it's either it's either them being proactively [ __ ] bad or just
overtly malicious. Yeah. What else? Yeah mate. >> Thank you for the amazing presentation. Um, I've been observing the Chinese internet a lot and I found the malicious activities are so systematic, and it's pushed by the government in a very aggressive way and used for nefarious purposes. It's very different even from the Western offensive cyber operations. Yeah. >> Um, and the scale is also massive. It's implemented in a very irresponsible way. So, um, I wonder why would the, um, we keep the supply chain hygiene, keep the development community hygiene from any Chinese actors? >> Why would, why shouldn't we hide our
development teams from Chinese actors? Is >> uh keep maintain a hygiene from >> Oh hygiene. >> Yes. Oh, so you're asking why don't developers do a better job of >> uh basically excluding >> I mean I've been asking that question for 25 years homie and I don't have an answer. >> I mean it's complicated like >> we're out. >> Are we out of time? >> We're Yeah, we're out of time but I bet Paul's more than willing to >> Yeah. I'm going to go outside. You can ask me any questions you want to. >> Cool. >> Thank you very much. >> Thank you guys. I appreciate it. Talk.