BSidesSF 2026 - Who Watches the NPM Watchers? (Paul McCarty)

Name: BSidesSF 2026 - Who Watches the NPM Watchers? (Paul McCarty)
Uploaded: 2026-05-12
Duration: 37 min 42 s
Description: Who Watches the NPM Watchers? Paul McCarty This research examines who monitors the NPM ecosystem for malicious packages and security issues by using custom canary tokens, called “canary packages” strategically embedded in published packages to track and analyze security vendors' detection capabili

BSidesSF | 37:42 | 46 views | Published 2026-05

Mentioned in this talk

Service

open-source malware

Concepts

Canary Packages

About this talk

Who Watches the NPM Watchers? Paul McCarty This research examines who monitors the NPM ecosystem for malicious packages and security issues by using custom canary tokens, called “canary packages” strategically embedded in published packages to track and analyze security vendors' detection capabilities, blind spots, and scanning behaviors. https://bsidessf2026.sched.com/event/7670c0b5ce65d66720f66d89d2b64285

Transcript [en]

So, let's start with our first talk for today. We have Paul McCarty and he will be talking about who watches the NPM watchers. Uh for this talk we have the last 10 minutes reserved for questions. So, if you have any questions, make sure that uh you post them in Slido, which you can find at besides.com/ Q&A. Yeah. Let's start with a round of applause. Uh good day, everybody. We're here to talk about who watches the NPM watchers. I'm actually particularly fond of this artwork that I did cuz I'm a big Watchmen fan, but anyhow, my name is Paul McCarty. Um I'm one of the two founders of OpenSourceMalware, which is the

world's largest database of open-source malicious bad things. Um I've had a couple of other startups. Um I've also worked for a number of startups throughout the years. I've worked for some big organizations as well, including Boeing and JPL and some others. Uh did some work in the 2000s inside the Beltway as well. Um So, who am I? Uh who's come across my work before on LinkedIn? Anybody? Oh, sick. Oh, yeah, bro. I see like 10 hands, yo. Um So, yeah, I'm well known for being a kind of NPM security researcher. I do a lot of stuff in GitHub as well. Um I kind of came across this because when I was building SecureStack many

years ago, um we ran into a number of issues where uh some of our customers downloaded malicious packages and uh were compromised and there weren't really any ways to kind of protect them from these things. So, this was before OSV and other things uh existed and kind of tracked malicious packages. Um so, I do a lot of research in that space. Like I said, I'm one of the two co-founders of OpenSourceMalware. And I had this research question in the back of my head. I was saying to myself, "Who is scanning these NPM packages? And who, if anyone, is watching those organizations that are scanning those NPM packages? What are they looking for? What's their

intent?" So on and so forth. Now, NPM is the world's largest software registry. It holds almost 4 million uh packages. Uh 140,000 updates are made to NPM every day. 3,800 new packages are added every day. 2 years ago, that number was less than 1,000. So, in the space of a couple years, um the number of new packages has more than tripled. Um And that's created kind of this crazy, you know, perfect storm of badness. Now, here's the thing with NPM. Thank you. Um I usually use another word there, but I don't know what the tolerance level is here. So, anyhow, NPM is owned by Microsoft, right? So, my frustration is that NPM continues

to be a [ __ ] show, and yet at the same time it's owned by, you know, one of the world's largest organizations, highly capitalized. Um and that's a problem. We need to call that out. So, like I said, there's a lot of stuff going on in NPM. Um 20 billion, yes, B, billion weekly downloads of NPM packages. Um And unfortunately, every one of those packages is a potential vector for attack. Um and there's a number of reasons for that, and I've talked about this in some of my other talks, so I'm not going to go into a lot of detail about why NPM in particular is uh particularly bad. But just to just to

highlight a couple things right here, right now. One of the things that bothers me the most about NPM is that much of the metadata surrounding NPM packages can be faked. It basically just accepts whatever you tell it. So, for example, if I tell it I'm Linus Torvalds, guess what? It's going to post that as Linus Torvalds. Now, luckily, NPM matches a verified email address to an NPM user. Now, that doesn't stop bad guys cuz they just create new accounts, but the point is at least that's something there, right? You can pivot on that. PyPI doesn't do that, but that's another story. The dependency chains, I think we all kind of come across this before, the

transitive nature of JavaScript. JavaScript was a language built with no batteries included. And what that means is that basically you have to import everything from other libraries. And those libraries might be 10 lines of code, they might be 100 lines of code, but the point is that by design that's how JavaScript was built. It was built so that there wasn't that much kind of included inside of it and you had to bring all these things in. And so, what that's done is that's actually created an ecosystem where JavaScript packages have way more transitive dependencies or third-party dependencies than any other language on the face of the planet.
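To make the term concrete, here is a minimal sketch of how a transitive dependency count is derived from a dependency graph. The graph and every package name in it are invented for illustration; this is not data from the talk and not a real npm lookup.

```python
from collections import deque

# Hypothetical, hand-written dependency graph: package -> direct dependencies.
# These names are illustrative only and do not describe real npm packages.
GRAPH = {
    "app": ["left", "right"],
    "left": ["tiny-a", "tiny-b"],
    "right": ["tiny-b", "tiny-c"],
    "tiny-a": [],
    "tiny-b": ["tiny-d"],
    "tiny-c": [],
    "tiny-d": [],
}


def transitive_dependencies(graph, root):
    """Return every package reachable from root, excluding root itself."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        name = queue.popleft()
        if name in seen:
            continue  # already visited, also guards against cycles
        seen.add(name)
        queue.extend(graph.get(name, []))
    seen.discard(root)  # a cycle back to root should not count root
    return seen


deps = transitive_dependencies(GRAPH, "app")
print(len(GRAPH["app"]), "direct,", len(deps), "transitive")
```

Even in this toy graph, two direct dependencies expand to six packages that would end up installed, which is the effect the talk describes at a much larger scale.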

Um on the order of 10x above the number two. So, uh Python has, sorry, not Python, uh PHP has, I think, roughly 68 or 70 transitive dependencies per package. Uh I want to say that NPM JavaScript right now is around 680, 700 transitive uh dependencies per package. I mean, that's on the order of 10x, right? Um and that creates a problem. Um and if you look at uh a really active package like debug, debug is like the fourth or fifth most downloaded package in NPM, it's got 380 million downloads itself and it's controlled by like a small group of people, right? And that small group of people have email addresses and can be phished. And that's

why we see so many of these kind of account takeover attacks inside of NPM. Now, who knows how many NPM mirrors, I want to say this is interactive. Who knows how many NPM mirrors there are globally? The big ones, not the little tiny ones, the big ones. A hundred? No way. All right. Oh, you've seen this before though, Alan, so I'm not going to give this to you. He's cheating. Um he saw this at Global AppSec. So, the first one is here in California. This is the main registry. Um the next one is a small one in the UK. The next one's a relatively new one in

Russia. And then guess what? The rest of them are in China. That's right. Five of the eight major NPM mirrors are in China. This will come back to haunt us. Trust me. All right. So, why did I want to know who is scanning NPM packages? I've actually done a fair bit of like red teaming for software supply chain stuff. That kind of, you know, was my thing for a while. I don't do it much anymore cuz I've got OpenSourceMalware to work on, but the reality is that I was one of the few people that was doing like kind of red teaming inside of that space. And when I was doing that, I realized that

there are a lot of gaps in what people are scanning for. And I just wanted to kind of understand that better in more depth. I wanted to you know, understand what security vendors were scanning what, how they were doing it. I wanted to understand is NPM scanning NPM itself. And finally, are criminals scanning NPM? Are they learning from it? I don't know. Let's find out. So, I came up with this idea. It's called a canary package. Again, very proud of this art. I'm uh if if this whole security researcher thing doesn't work out for me, I'm going to go into graphic design. I know you guys agree with me. Um the idea of a canary package is basically a package

that has a canary token, at least a single canary token inside of it, and it's built to either automatically deploy that or allow someone to hit that canary token. Who's heard of canary tokens? Yeah, there we go. That's right. Um Thinkst Canary's got a great platform. They offer free canary tokens. Um I reached out to Haroon and his team and they gave me access to their uh, enterprise console and so about 60% of the canaries that I used in this, uh, research came from Thinkst Canary. And by embedding those inside of NPM packages, we can detect and fingerprint a lot of what's going on. Uh, big shout out to Haroon and his team. Um, I love Canary, um, Thinkst and

I go and get my green shirt every year at RSA. It's one of the main reasons I go to the floor. Now, what kind of Canary packages was I using? Here's a subset, but these are the eight kind of most common ones that I used. The first is a JavaScript payload. And when I use a JavaScript payload, what it is, basically, it's a little bit of JavaScript that runs when the, uh, NPM package is installed or imported in some cases. And it does three things. The kind of commonly, who's got a bug bounty background? Anybody here got a bug bounty background? All right, yeah, yeah. So, we know what the three things you're allowed to do

before you get in trouble, right? One is, uh, find out what the host name is, the public IP, um, and that's about it, right? Um, and this is kind of what we've agreed on so we can prove that you've actually had an impact and you've actually run that package on the quote compromised source. Now, the reality is that a lot of this data comes in anyhow from user agent strings and other things. So, I didn't have to go and build a lot of like payloads. It kind of was built into a lot of what the canaries did. But, um, the second is a binary, uh, so basically an EXE that I deployed in an NPM package. I did not

set it up to automatically run. I just put it as a package inside of the package. So, somebody would have to go and explicitly run that EXE. Word and Excel files, um, you know, same kind of thing. Those are not automatically set to run. Basically, they just are sitting there inside of the package. AWS credentials, um, uh, Thinkst Canary has a great, um, set of canaries for those. DNS, um, and local credential files. So, basically, .env files and some of these other kind of files that are very commonly actually found in real npm packages accidentally because developers accidentally do a git add . and it shows up in the package. PDFs and web images. Now, specifically with

web images, I like to embed those in the readme. So, basically if anybody opens up the readme, it's going to hit the web image, it's going to hit the endpoint, and I'm going to get a trigger. That make sense? Cool. So, what did I do next? Well, I went and vibe coded myself up a pew pew map, right? I've got three kids and my kids were like, "Dad, build a pew pew map." And I was like, "Okay." Except the problem is I don't have like choo choo, I don't actually have the pew pews going across the map. Um, but what you see here is a screenshot of a live one. I've actually got it open here and if we

have time, I'll kind of play around with it at the end of the talk, but basically what I do is I'm capturing the kind of data coming in from these canary packages and applying them to this map so I can understand where and who these different triggers and events are coming from. Now, the other thing I want to say, I very specifically built this so that it was only going to trigger basically for organizations like the people you're going to see here in a second. I was not targeting random Joe Schmo at home running npm install. I went through a lot of work to kind of make it so that people would not do

that. Unfortunately, a couple times it happened um, and uh, luckily, you know, the only thing that I got was their user agent string, but um, what's the first thing that I learned? Well first npm's doing a pretty crappy job of scanning npm itself. Is anybody surprised by that? Anybody brave enough to say they were surprised by that? All right, yeah. npm's not doing a great job of it. Um, in fact, they do have an internal uh, Microsoft service that's scanning npm packages. It finds a very very very small number of them. Um unfortunately, NPM basically um I'm going to choose my words carefully here. NPM is basically uh leaning heavily on the

researcher community like me and a bunch of other people to find this stuff and turn it in. And we typically turn it in to OSV or to GHSA, the GitHub Security Advisory database, right? So, they're not doing a lot of the work themselves. Now, remember, this is Microsoft, right? Microsoft owns NPM, right? It's like the second, depending on which week we're talking about, it's the second largest company on the face of the planet, highly capitalized. And yet, for whatever reason they've decided they're not going to spend the time to actually do that themselves. AWS actually expends a lot of energy scanning the cloud. Microsoft cloud Azure also spends a lot of time scanning

the cloud. Now, we'll talk in a minute here about the differences between those two and about what they do with that data that they find. And finally, I found that there are security companies scanning uh NPM, but uh they have their own agendas and um those agendas basically uh define what they look for um and gaps therein. Let's just drill into some of the cloud providers really quickly here. Um so, we've got Google. Um Google scans uh out of their main data center in Council Bluffs, Iowa. Um anybody from Iowa here? Yeah, right on. This guy. He's proud as. Me. Uh Detroit. Um Uh the other thing is that OSV actually um uses Google as well to do their

scanning for OSV. Does anybody here uh do any submissions for OSV? Yeah, we got one guy up there. Sweet. Um OSV also uses uh an automated scanner that runs out of Google uh which is called the OSV scanner. Very uh very uniquely named. Microsoft scans out of a couple different IPs out of Virginia. AWS scans from multiple places across the US and occasionally from India. Alibaba scans from a whole lot of places in China. We'll talk about that in more depth here in a second. Tencent. Volcano Engine. Volcano Engine is actually the cloud component of ByteDance. Right? The company that owns TikTok. There's a lot of scanning coming out of China.
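Attributing a canary trigger to a cloud provider comes down to matching the source IP against that provider's address ranges. The following is a minimal sketch of that lookup only. The provider labels are placeholders and the CIDR blocks are the RFC 5737 documentation ranges, not real provider ranges, and the talk does not say which data source was actually used for attribution.

```python
import ipaddress

# Placeholder labels and CIDR blocks (RFC 5737 documentation ranges).
# Real attribution would need each provider's published ranges or ASN data.
PROVIDER_RANGES = {
    "provider-a": ["192.0.2.0/24"],
    "provider-b": ["198.51.100.0/24"],
    "provider-c": ["203.0.113.0/25"],
}

# Parse the CIDR strings once so each lookup is a simple membership test.
NETWORKS = [
    (ipaddress.ip_network(cidr), name)
    for name, cidrs in PROVIDER_RANGES.items()
    for cidr in cidrs
]


def attribute(ip):
    """Return the provider label whose range contains ip, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for network, name in NETWORKS:
        if addr in network:
            return name
    return "unknown"


for source_ip in ["192.0.2.10", "198.51.100.1", "203.0.113.200"]:
    print(source_ip, attribute(source_ip))
```

The last address falls outside the `/25` block on purpose, to show that anything not covered by a known range stays labeled `unknown` rather than being guessed.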

So this talk is like part two of a three-part talk about the kind of NPM ecosystem and what's happening in it. And at the end of this I've got a link to my first talk. In my first talk I talked about why China spends so much time scanning NPM. So like I said, at the end of this there's going to be a link and I hope that you go and check it out. Let's drill in here for a second to AWS. So AWS um scans basically every new NPM package that ever comes out. And they typically do that pretty quickly, in less than 5 minutes. They typically do that from

several different IPs um and scanning locations. I don't know why they do it from multiple places but they do. I don't know if it's different teams kind of you know running kind of duplicate kind of processes. But here's the great thing about AWS. There's a team at AWS that actually submits a ton of malicious package submissions to OSV. And I got to call that team out because Chai Tran and his team do a great job of doing that. So if anybody is in the audience from AWS if you ever get a chance to talk to that team, tell them that Paul shouted them out in this this talk because I really do appreciate the hard work that they go

through to do that. Next, let's talk about some security vendors. ReversingLabs. ReversingLabs has been scanning NPM for a long time. But they care about one thing and only one thing. AWS credentials. They don't care about nothing else. They don't care about EXEs, they don't care about HTTP requests. Uh at least from a dynamic kind of active scanning perspective, they only really care about um uh AWS credentials. Something else that I noticed is that they scan... So basically when I do my canary packages, um I try to keep the number of those versions down. I try to keep the imprint very, very low. But what will happen is if one of my canary packages has one

version, ReversingLabs will basically scan it almost every day. Now, here's the thing. Nothing has changed in that. The payload has not changed in that. Why do they keep scanning it? So I reached out to ReversingLabs and I said, "Hey, why do you keep scanning these? You're You're spending a lot of time and energy and compute scanning stuff that you don't have to scan." Um and they wrote back to me and they said, "We're testing various product capabilities." Which is probably true, but also probably that they've got it built and you know, it's just harder to change it and modify it and and make it more uh effective than it is to just burn extra compute. But they're

basically scanning many of the packages dozens of times without any change in the package. So it's just a lot of uh unnecessary compute. They do three to four scans uh daily on stuff that hasn't changed. Um but again, they're only doing this looking for AWS credentials. Kaspersky. Now listen, I have a love-hate relationship with Kaspersky. Um I think that their research team is really A+ rock solid. I know uh at least one of them personally. Um also, they're in Russia and they've been sanctioned, right? The US government's no longer allowed to buy Kaspersky and there is genuine evidence to show that Kaspersky has been acting with the Russian government um in a number of ways, all of

them are bad. Um including giving data about Kaspersky customers to the Russian government. Um they scan uh most packages. Actually, they scan almost all packages typically within 10 or 12 minutes. Um they scan from three locations all in Russia. Um although it looks like they might have spun up a new uh Swiss uh scanning location recently. I just saw that coming in the last few days. Um and they only scan one time per release which is smart, right? They're not spending a bunch of compute on stuff they don't need to do. Kaspersky is only really interested in uh Word and Excel files. They will occasionally scan uh for uh kind of JavaScript payloads, but nothing

else. So, they really are interested in Word and Excel. Um and I've tested the Word and Excel canary packages really really well. They come back with a user agent string and you can actually discern some interesting information about the client that is requesting that package from the user agent string, right? Um like I said, they do very very targeted scanning, only one hit per version. Um Oh, yeah. Yeah, that's right. Um some of their IPs are actually in Shodan, uh and so these IPs in Shodan um and uh VirusTotal for that matter um have been associated with mass recon scanning. So, they're basically reusing the same assets for both outbound scanning of NPM as well as outbound mass

scanning of WordPress and uh Joomla and other kind of websites looking for... I guess they're effectively kind of doing like mass Nuclei scans or something along those lines. Um, which I thought was really really interesting that they're doing that. They're all doing that from the same IP. And because they keep using the same IP, that means we can fingerprint those IPs, right? We can associate these user agent strings and other metadata that we find about these scans to those IPs. And that's something that I'm using in my own continuing research. And like I said, this is the second of three. There's an arc here to this NPM research. Um, and we'll be hearing

more about this in the third, the final installment. China. Now, China's not a security team, obviously. So, I just wanted to like there's a lot of scanning going on in China. Um, and I just wanted to kind of wrap it all up and bubble it all up as one kind of talking point. Um, and really what that comes down to is that they're scanning every single NPM package for everything all the time from like five to eight different places. They're just mass scanning it from all kinds of different places. And many of those IPs they're using have been in existence for a long time. In fact, the user agent string for one of those IPs includes,

for whatever reason, a bunch of information about the client which points out that it's CentOS 7. Do we got any Linux people here that know the CentOS backstory? CentOS died a number of years ago, right? But, CentOS 7 came out in the mid 2010s, and this version clearly says in it CentOS 7 2018. And this kind of tracks for China, right? They use old stuff, and they don't update it, and they just reuse it. And their opsec, they're not really kind of thinking about opsec when they do that, because you're learning a lot from that. You can tie that long-lasting long-living IP to that user agent and some of the other metadata you can pull.
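Tying a long-lived IP to its user agent can be sketched as a simple grouping over canary hits. The records, IPs (RFC 5737 documentation addresses), and user agent strings below are all made up for illustration; the talk does not show its actual log format or tooling.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical canary hits: (ISO timestamp, source IP, user agent string).
HITS = [
    ("2025-01-03T10:00:00", "192.0.2.10", "example-fetcher/1.0 (el7; 2018)"),
    ("2025-06-20T08:30:00", "192.0.2.10", "example-fetcher/1.0 (el7; 2018)"),
    ("2025-06-21T09:00:00", "198.51.100.7", "other-client/2.3"),
]


def fingerprint(hits):
    """Group hits by source IP, tracking user agents and first/last sighting."""
    profiles = defaultdict(
        lambda: {"agents": set(), "first": None, "last": None, "count": 0}
    )
    for timestamp, ip, agent in hits:
        seen = datetime.fromisoformat(timestamp)
        profile = profiles[ip]
        profile["agents"].add(agent)
        profile["count"] += 1
        profile["first"] = seen if profile["first"] is None else min(profile["first"], seen)
        profile["last"] = seen if profile["last"] is None else max(profile["last"], seen)
    return dict(profiles)


for ip, profile in fingerprint(HITS).items():
    lifetime_days = (profile["last"] - profile["first"]).days
    print(ip, profile["count"], "hits over", lifetime_days, "days", sorted(profile["agents"]))
```

An IP that shows one unchanged user agent across a long lifetime is the kind of stable fingerprint described above; an IP whose agent set keeps growing would suggest shared or rotating infrastructure instead.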

But uh they are basically, I don't want to say coordinating, but it feels like coordination amongst all the major cloud providers in China and some of the universities. So, there's a lot of cloud providers in China. There's Aliyun, which is Alibaba. There is Tencent. There's Huawei. There is the ByteDance uh Volcano Engine. All of those organizations are all scanning npm all the time. And finally, Russia. Uh again, like China, I just wanted to kind of bubble this up. I'm not going to pretend that I have a lot of like uh insight into the specific groups inside of Russia that are doing the scanning. But basically, they use a number of IPs

coming from ASNs inside of Russia. Um uh typically in Moscow, occasionally from Tula. Um those scans all originate from an ISP called VimpelCom. Um and VimpelCom originally was a Dutch company that um was uh bought out in 2023 and it now is entirely owned by Russian interests. All these scans come from VimpelCom. Now, they're a little bit slower than the Chinese and the Americans. All the American cloud providers and the Chinese cloud providers, they're all hitting the stuff within 10 minutes of it coming out, right? The Russians, they're a little bit more laid-back, right? They're typically hitting it between 2 hours and 10 hours. Sometimes it takes a day for them to scan new npm packages.
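The "within 10 minutes" versus "2 to 10 hours" comparison is a time-to-first-scan measurement: for each package version and scanner, the delay between the publish time and the first canary trigger. A minimal sketch follows, with invented package names, scanner labels, and timestamps standing in for real data.

```python
import statistics
from datetime import datetime

# Hypothetical publish times for canary package versions.
PUBLISHED_AT = {
    "canary-a@1.0.0": "2025-03-01T12:00:00",
    "canary-b@1.0.0": "2025-03-02T09:00:00",
}

# Hypothetical canary triggers: (package version, scanner label, ISO timestamp).
TRIGGERS = [
    ("canary-a@1.0.0", "scanner-x", "2025-03-01T12:04:00"),
    ("canary-a@1.0.0", "scanner-x", "2025-03-02T12:04:00"),  # repeat scan, ignored
    ("canary-a@1.0.0", "scanner-y", "2025-03-01T15:30:00"),
    ("canary-b@1.0.0", "scanner-y", "2025-03-02T19:00:00"),
]


def first_hit_delays(published_at, triggers):
    """Map (package, scanner) to minutes between publish and the first trigger."""
    delays = {}
    for package, scanner, timestamp in triggers:
        published = datetime.fromisoformat(published_at[package])
        minutes = (datetime.fromisoformat(timestamp) - published).total_seconds() / 60
        key = (package, scanner)
        delays[key] = min(minutes, delays.get(key, minutes))
    return delays


def median_delay_by_scanner(delays):
    """Collapse per-package first-hit delays into one median per scanner."""
    by_scanner = {}
    for (_, scanner), minutes in delays.items():
        by_scanner.setdefault(scanner, []).append(minutes)
    return {scanner: statistics.median(values) for scanner, values in by_scanner.items()}


print(median_delay_by_scanner(first_hit_delays(PUBLISHED_AT, TRIGGERS)))
```

Using the median rather than the mean keeps one very late scan from hiding a scanner that is normally fast.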

Now, here's the thing is you can look at those kind of those user agent strings and a couple of the other kind of metadata bits, and you can kind of start to figure out some things. And some of it looks like this is just kind of standard ISP type stuff. So, you can imagine this in the same kind of way that you would think about why is AWS scanning npm? Why is you know, Azure scanning npm? In the same way, this ISP in Russia is also scanning NPM. But then when you look at some of the other bits and pieces you think yeah, that's that's a little bit more automated in a way that makes me

think that's coming from something else. Again, I don't have any like, you know, smoking gun here about who all those actors are, but there's a lot of scanning coming out of Russia. Now, I want to talk a second here. I'm going to go sideways a little bit into credentials embedded in NPM. Unfortunately, when people publish NPM packages, it's really easy to do NPM publish and depending on how you've set up NPM, you sometimes suck in files that you probably shouldn't or you definitely shouldn't into the NPM publish action and then it goes up to the package and then you'd have no idea. I've done separate research around this and the reality is that right now all

around the world people are using NPM and publishing stuff they shouldn't. So as an example of this, I found last year a company in the Middle East, they specialize in doing visas for Indians that want to travel to Saudi Arabia for the Hajj, right? This is the yearly pilgrimage called the Hajj. You need to apply for a special visa to do that. Well, guess what? Inside the NPM packages was a complete database in the application and the credentials for that database and even worse passport photos, a database of passport photos and passport scan lines from passports, right? Now, that's probably the most egregious example of this that I found but I found lots and lots of other stuff

that's almost as bad and I've got a friend in Australia. He works for TruffleHog. Do we have any TruffleHog people in the audience? No, okay. So my mate in Australia, he spends a lot of time scanning NPM and GitHub for embedded credentials and he's made a whole like uh side business about hitting up the bug bounty programs and submitting those. But the reality is that NPM is owned by Microsoft and effectively NPM is now kind of just sat in underneath GitHub. Now GitHub has a secret scanning program, right? And in 2023, GitHub made this big deal about the fact that, starting in 2023, they were going to

start scanning everything in NPM using the secret scanning program that's existing in GitHub. Do you think they actually did that? Anybody want to guess? They did not do that. They did not do that. So this post here which was sent from a friend inside of GitHub, they don't even realize that they're not doing it. They think because they made the announcement that they're doing it, but they're not. So the reality is that in addition to the fact that we've got all this malicious stuff in NPM, right? The people who are scanning for, you know, because they want to protect themselves. There's also a lot of organizations that are scanning NPM for credentials. Right? So here's the thing is that in this

moment we have to think about maybe the intent behind some of these organizations scanning NPM isn't necessarily just to protect the organization from malicious packages. Yes, ma'am. Maybe there's a little bit more to it. Really quickly cuz I've only got 10 minutes here. I want to talk about static analysis versus dynamic sandboxing. The reality is most of the data that I have in the database here comes from organizations that are using dynamic sandboxing. In other words, they pull an NPM package, they run it, and it does the thing. If it's got a postinstall or preinstall script, it does its thing, right? Now fewer and fewer organizations are doing that because it's more effective to

statically analyze much of the NPM space. Well, basically all of the packages. So, basically everybody's moving to that. So, for example, you know, Ox Security and Socket and Endor and all these companies are not in this data set. Why? Cuz they're not dynamically sandboxing and pulling these packages. Or if they are doing it, they're doing that inside of a VM that is blocking internet access. So, they run it inside of a closed loop, right? And that's smart. But the problem with that is that if you're not pulling the dynamic thing, what happens if there's a second stage loader? What happens if there's a third stage loader? You never see that. You never get to see the rest of the kill chain cuz all

you've done is you've observed statically that it's pulling a package or, sorry, pulling a payload from like a URL or an IP address, right? But unless you actually go and pull that thing, you don't know what the next step is. And I'm not saying none of those organizations do that. I'm just saying a lot of them do not. All right. So, what are some of the insights that I've learned along the way? Uh most security companies are using static analysis to scan NPM packages. They are not uh dynamically running them inside of a uh sandbox. The organizations that we talked about today on the screen, they are doing that. So, basically most of

those organizations uh AWS, Azure, all the Chinese companies, they're all running them in in a sandbox. Um uh again, I wanted to call out the AWS team that is scanning NPM and submitting that stuff to OSV. Big shout out to them. And the reality is, like I said earlier, security companies have agendas. They're just missing things. So, for example, Kaspersky spends all its time looking at Word and Excel files and ReversingLabs spends all its time looking at AWS credentials. So, is that Does that mean that ReversingLabs and some of these other organizations are not looking for these other things? I can't say that for sure, but it looks like to me right now, based on the data, they're not. And that

means there's gaps. So, that means there's some key takeaways here, right? First is that the watchers can be watched to detect patterns, and that's really what the third part of this arc is going to be, which hopefully, by the way, you'll be hearing uh this uh summer at Hacker Summer Camp, if I get lucky. Um if not, it'll come after there somewhere. But, um basically, what I'm doing now is I'm using this data to understand patterns inside of these organizations to figure out what the gaps are that you can get payloads through, right? And then I use that data to help, you know, organizations protect themselves from those gaps. And those coverage gaps absolutely do

exist. Uh a lot of vendors are using resources they don't need to. ReversingLabs, like I said, is using a lot of extra compute that they shouldn't be. Um this whole trust but verify, here's the reality is that we need to spend a lot more time looking at the packages that are being downloaded and see what they do, but we don't. Our development teams don't. Um and that's a problem. Security implications really quickly here. Uh scanners can be tracked. I fingerprinted all these organizations. I understand what they do and what they're looking for. Uh I understand that uh the adversaries uh you know, know how to evade uh detection based on the same things that I'm

seeing, right? And the fact that there's a limited coverage set for many of these organizations, if they're statically scanning the code, maybe they're not seeing the whole kill chain, and that's a problem for all of us. And finally, that resource waste. So, that brings us to the end of my talk. I left lots of time here for questions. What you see on the screen here is QR codes for three of my recent talks that all kind of deal with this area that I've been talking about today. So, the first one is my talk last year at DEF CON Adversary Village. Big shout-out to the Adversary Village team. Um that

one talked about malicious packages and how they bypass existing security tools. Um that link takes you to a YouTube video. The second one is my first talk from Berlin CTI uh last year, which talked about how you can get uh threat intel out of the software supply chain attacks. And finally, uh the first one of this three-part arc, which was called Panda Mirror, uh which I thought was a great name. Um uh how the Chinese uh CCP manipulates NPM to hoard malware. That's from BSides Canberra, which by the way is the largest BSides on the face of the planet. Little Canberra, Australia, is the largest BSides. It got 3,800

people last year. So, anyhow, that's it for me. My name is Paul McCarty. Um opensourcemalware.com. I'm also known as 6mile. So, I'd love to take any questions now if anybody's got some. All right. And we do. So, again, uh go to Slido uh to enter your questions. We got some right here. So, let's start with uh The first one asks, "So, Kaspersky scans NPM libraries for Word and Excel. How often are people using NPM to distribute these types of files?" Good question. Um if I were to speculate, and I can only speculate, right? If I were to speculate, basically Kaspersky's thinking here is that they want to capture accidental exposure like I was talking about where people

accidentally put files that they shouldn't inside of NPM packages, which happens all the time. You know, my whole story about that Visa company. So, it's only speculation, but that's what I've got. All right. Oh, a few more just popped up. All right. What are your thoughts on SCA companies such as, I'm not going to name-drop, that are looking for package malware? Yeah. Ooh, this is a spicy one. I want to be very careful here. I think there's two types of software risk, right? The first is accidental vulnerabilities. So, basically, developers accidentally leave vulnerabilities inside of source code, and that's what SCA is there

to address, right? There's a second risk, which is the intentionally malicious. And the reality is that SCA is not built to detect the latter as much as it is the former. And yet, somehow we all kind of assume that SCA companies can cover that off. And the reality is, you know, without calling anybody out, they don't. I'm sorry, they just don't. So, SCA typically doesn't have good malware detection built into it. Which is something that we want to change in my company, but this is not a sales thing, so. Next one. All right. Are there alternatives to NPM to consider or tips to safely use NPM libraries?

Oh man, that's a whole 'nother talk, yo. All right, well, first and foremost, NPM is a lot of things, right? NPM is a CLI that runs on your machine. It's also a CDN. It's a registry and it's a SaaS application. All those things, that's all NPM, right? So, like, how do you protect all that [ __ ]? That's difficult, right? The reality is that NPM has made some changes lately in terms of trusted publishing and requiring MFA now for every push, sorry, every publish event. Those are great things, and that protects us all from these kind of large debug- and chalk-style high-impact ATOs. But there are some tools out there like

PNPM, for example, that you can replace NPM with on your local machine, and that does a lot of things that are more secure by default. But, guess what? It uses the NPM registry, right? So, there's only so much that the CLI, the local package manager binary, can protect you from. Well, it actually is not a binary, but anyhow, you're still ultimately using the same registry. And the problem is really the registry. I mean, all the things together are a problem, but the registry is the biggest of all those problems. So, good question. All right. Oh, another. All right, what are your thoughts on proactive prevention tooling such as

PNPM or Socket's developer firewall? Okay, so I just talked about PNPM, so I'll leave that one alone. The firewalls, like, initially I was a really big fan of the firewalls, but guess what? The firewalls are all just wrappers, right? They basically wrap around NPM, they wrap around pip. And like any wrapper, you can bypass them really easily. So now imagine you're a developer, you've got one of these software firewalls on your machine and it's not giving you something that you want. You're like, "Damn it, I want this package. I want it." So what do you do? You go around this really simplistic wrapper mechanism. And that's what people are doing. So I think that the idea behind these

software firewalls is great, if you can enforce their use and you can make them reactive enough so that when somebody says, "Hey, you know, something is a false positive or whatever," you address that. So I guess my answer is I think in theory they're a great thing, and I genuinely believe that. But right now, you know, it's kind of like the bulletproof vest that only has a little protection right here and the rest of it ain't protected, so you don't want to get shot. That was a weird Detroit thing, but anyhow. 6 Mile, 6 Mile, man. Sean, come on, get it right. All right. How do you distinguish AWS as a cloud service provider versus

AWS customer scanning or similar for Chinese cloud providers versus like gray or scanning? Great question. Oh my gosh, this is a great question. You're right. I kind of lumped them all together here today. Basically what I've done is put a bunch of effort into the data set to separate out those organizations that I think are individual researchers using AWS resources or using Chinese cloud resources. So that's a great question. And more of that's going to come in the third version of this, but the reality is in the data set I have a ton of examples of individual developers using AWS and using, well, mostly just

using AWS to do scanning, and I've excluded them from this kind of overall high-level view, but you're absolutely right. People are doing that and they stand out like a sore thumb, right? Because a bunch of the metadata, user agent strings, stuff like that, those are all things that you can track, especially at scale. Finding those anomalies in the midst and then tracking them, that's the important thing for distinguishing between the cloud company and the customers. Good question. I think that's it. So I'm going to pass it over to Sonia.
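
[Editor's sketch] The last answer describes spotting individual cloud customers by the metadata they leave behind, such as user agent strings that look anomalous next to a provider's bulk traffic. A minimal Python illustration of that idea follows. The record layout, the sample values, the `rare_user_agents` helper, and the 10% threshold are all assumptions made for illustration; the talk does not describe the speaker's actual data set or tooling.

```python
from collections import Counter, defaultdict

# Hypothetical canary callback records as (provider, user_agent) pairs.
# Every value here is made up for illustration.
HITS = (
    [("aws", "vendor-scanner/2.1")] * 18
    + [("aws", "curl/8.4.0"), ("aws", "python-requests/2.31.0")]
    + [("alibaba", "bulk-mirror/1.0")] * 5
)


def rare_user_agents(hits, max_share=0.1):
    """Group hits by provider and return the user agents whose share of
    that provider's traffic is at or below max_share (assumed threshold)."""
    by_provider = defaultdict(Counter)
    for provider, user_agent in hits:
        by_provider[provider][user_agent] += 1

    outliers = {}
    for provider, counts in by_provider.items():
        total = sum(counts.values())
        rare = sorted(ua for ua, n in counts.items() if n / total <= max_share)
        if rare:
            outliers[provider] = rare
    return outliers


if __name__ == "__main__":
    for provider, agents in rare_user_agents(HITS).items():
        print(provider, agents)
```

In this toy data the two one-off AWS clients are flagged while the bulk scanner traffic is not. A real analysis would presumably combine more signals than the user agent alone, for example source ranges and timing.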

Sonia's going to show me the one minute thing here in a second.

Thank you so much for it. Thank you. Thank you so much for it.

Thank you so much. Thank you guys. Appreciate it.
