
Today we have Luke Marshall and he's going to walk us through why he is still finding secrets in your code. So, thank you Luke. >> CHEERS.
>> Hey everyone, thanks for coming. Today we're going to be talking about why I'm still finding secrets in your code. You're probably thinking it's 2025, there's no way developers and engineers are still leaking secrets. But in 2024, GitGuardian, an organization that specializes in finding secrets across different open source ecosystems, found over 23 million exposed credentials inside GitHub repos. So 4.6% of public repositories had exposed credentials in them. It's estimated that 31% of data breaches happen because of stolen credentials, so there's real impact there. And it takes around 292 days to identify and remediate breaches involving these credentials. The interesting thing for me here is that of the credentials they found in 2022, 70% are still active today. So not many of them are being revoked, and a lot of them are still out there in the wild. But what are secrets? In this talk we're talking about things like API credentials, usernames and passwords, database credentials. All of the services that you see up here are ones whose secrets I found as part of this research. One thing I wanted to highlight, though, is that not all secrets are built the same. Some are just built different, you know. AWS keys, database credentials, GitHub PATs are all
high-impact secrets. But a lot of the stuff that I find isn't that high impact, and some of it is meant to be publicly shared. So why should you care? Well, it affects some pretty big organizations. And the thing I found most interesting is that the bigger an organization is, the more likely it is to be affected, because they just don't know what they have out there in the wild. We'll talk a little about the ecosystems I targeted, but it affects everyone: large organizations, small organizations, individuals. There's no real rhyme or reason to it. So, the results. I targeted the npm and PyPI registries for this research. If you don't know them, PyPI is the Python Package Index and npm is the Node package manager. Basically, each is a spot to put a whole heap of code and leak a whole heap of secrets. In total I scanned 4.2 million packages in my initial research. I set out just curious to figure out what was in these ecosystems, so I decided I'd scan every single package. On npm there are about 3.5 million packages, and on PyPI there are around 700,000 at the moment. So 4.2 million in total, and I just scanned the latest version of each. Initially I wasn't thinking I would find anything, because these registries are out in the open and a lot of people are looking at them. But this initial research uncovered a lot of secrets for some big organizations, and that's when I decided to focus on scanning every single version for bug bounty programs, domains that had security programs, things like that. In total I found around 10,000 verified secrets, which is a fair chunk, and it gave me 50-plus submissions across bug bounty programs, self-hosted security programs, VDPs, and so on. It earned me around $25k in bounties, which was nice. And these
are the priority ratings of some of the bugs that I found. When we look at the actual services, you can see, as I mentioned before, that not all secrets are built the same. We have Infura and Alchemy keys, Twitter consumer keys, Etherscan keys; these are all low-impact secrets, essentially just noise in the chain. But the next three that you see, MongoDB credentials, GitHub credentials, AWS credentials, are all pretty high-impact secrets. And the thing I learned along the way is that it's obviously bad when your secrets are exposed publicly, but it's even worse when they have overly broad permissions. When you're exposing root AWS keys, or GitHub PATs that just have access to everything, that's where things turn really bad. So, in total, 10,000 verified secrets across 4.2 million packages and 20 million versions. This is how it was split across the different file extensions I scanned. JSON, JavaScript, and Python made up about 80% of all the extensions I found these secrets in. On the side there you can see PHP, YAML, Java, TypeScript, properties files, things like that. But a lot of it was just the generic JavaScript and Python file extensions, because those were the main registries I was scanning. A quick shout-out to a secret I found in a nice .npmrc file, you can see it up there. This one is interesting because not only was it a pretty funny file extension, but this GitHub PAT actually gave me maintainer-level access to a very well-known security tool. That was access to their public repos, and it would have given me the ability to push malicious code to those repos and then approve it using this token, essentially poisoning the supply chain for every organization using that tool. So that was an interesting report to put through. Now that we've talked about where the secrets are and the impact they have, I want
to talk a little about why I think this is happening. I thought about this a lot and came up with this perfect storm of things that have to happen for exposed credentials to go undiscovered. First there's usability: how easy it is for me, as an engineer, to push code to these registries. With npm and PyPI it's really easy, literally a CLI command, and you can do it from CI/CD. That means that even if you have CI/CD set up to scan for secrets, engineers can bypass it by just publishing from their own machine. It's very difficult to track what's actually on these registries, and we'll get back to that in a minute. Another thing is visibility. How visible is this ecosystem? How widely is it used? But also, what methods exist for me to map out what's exactly in these packages? Because a big part of it is that you can scan all these packages, but if you can't easily relate them to an organization, there's not much use doing it. You need a way to link a package to an organization, to a domain, things like that. And where the main issues arise is when you move from a local instance of your code, or a private repository, and push to these public registries: npm, PyPI, GitHub, AWS ECR, and so on. This is where you might be skipping the CI/CD pipelines that scan for secrets, or you might not be using them at all. You're bypassing all of that, and the usability of npm makes this really difficult to track. Inside npm or PyPI, you can't easily determine your organization's footprint, because you can't search by domain. You can search by namespace and things like that, but if your dev is just publishing a non-namespaced package, it's going to be quite difficult to track. I have a nice little trick for that later on that
we can chat about. But let's talk about a traditional deployment workflow. Say you have a GitHub repository set up with some PR workflows and some deployment workflows. This is what a traditional deployment would look like. From a PR, you might have a PR workflow that runs secret scanning, linting, tests, things like that. Then you have a build, deploy, publish step, where you publish to maybe GitHub, PyPI, Docker, AWS ECR, npm. And you would think, okay, in the PR we're checking for secrets, so if there are no secrets inside the PR, then anything that goes into main is going to be fine, right? You can also have an even cooler traditional deployment workflow, something like this, with a pre-commit hook at the start. A pre-commit hook checks your commit for secrets before it gets sent to the git stream. TruffleHog, which is a tool that I used for this research, actually offers that as well, and it's a really nice way to stop secrets from entering the git stream. It does require additional setup on each developer's machine, though, and it's not as automated as an action that does pretty much the same thing. But do we really need this? GitHub recently implemented push protection, which is basically native secret scanning for any pushes to GitHub repositories. They'll do the hard work for you: they'll try to figure out if there are secrets inside the commit you're pushing to main or wherever, and they'll let you know if you've messed up. This is what it looks like in the CLI; in the GUI it looks exactly the same, and they actually make you acknowledge it. I will say it only covers a certain set of services that they partner with, but all the big ones, like AWS keys, GitHub PATs, different database credentials
and things like that are in there. And this is awesome. GitHub recognized that a lot of secrets were going into GitHub and something needed to change; they needed to look after their users, so they built this. It's not the first time an organization has done that. Hugging Face implemented something similar. If you're not familiar with Hugging Face, it's pretty much GitHub for AI: ML models, datasets, things like that. At the end of 2023, a lot of their users were publishing secrets, API credentials and so on, to their repositories. They identified the issue and decided they needed to do something about it. So they implemented something similar to GitHub push protection, where if you push keys to these repositories, you'll get an email letting you know: hey, you've pushed your keys here, you need to fix this up. Which is great, except it's an email, so it's reactive. The keys are already there, and if someone's looking, they're probably going to be found. But it's better than nothing. And what does npm do about this? Well, they do actually scan for secrets, surprise, surprise, but just for npm tokens. Which is good, because npm tokens are the keys to the kingdom for your npm registry: if you have access to them, you can push code to those packages. And they revoke them immediately. I tested this, and as soon as you publish a valid npm token in a package to the public registry, they'll revoke it immediately, which is awesome. But remember when we talked about how CI/CD pipelines can be used to deploy packages? Do we really need an npm token at the end of the day, if we can just use a GitHub PAT? Oh, and this is what the email looks like; you sort of just get told off for publishing it. But yeah, we don't really
need an npm token. We can just have a GitHub PAT. And I discovered an instance of this in one of Malaysia's largest restaurant management software companies. They're a SaaS company, and I consistently saw their PAT coming into my logging server every day. Every new version they were pumping out had their GitHub PAT hidden away in this massive JavaScript file, like 2,000 lines long, with no rhyme or reason I could figure out for why it was there. They had no security program or anything like that, but after a while I decided I would responsibly disclose to them, because the PAT had access to their entire GitHub organization and, in turn, their npm registry. That meant I could essentially have compromised their entire supply chain with this PAT and pushed malicious code out to all their customers. I think it was 36 different packages and 10,000 different customers. So it would have been very bad for them.
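As an aside on triaging a leaked PAT like this one: the GitHub API reports a classic token's granted scopes in the `X-OAuth-Scopes` response header on any authenticated call, and `GET /user` is read-only, so it stays on the "bare minimum to prove validity" side of things. A minimal sketch of that check; the endpoint and header are real, while the helper names are just illustrative:

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com/user"

def parse_scopes(header_value):
    """Split the comma-separated X-OAuth-Scopes header into a list."""
    if not header_value:
        return []
    return [s.strip() for s in header_value.split(",") if s.strip()]

def check_pat(token):
    """Return (login, scopes) for a classic GitHub PAT without mutating anything."""
    req = urllib.request.Request(
        GITHUB_API,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        # The header lists what the token can touch, e.g. "repo, read:org".
        scopes = parse_scopes(resp.headers.get("X-OAuth-Scopes", ""))
    return body.get("login"), scopes
```

A token that comes back with `repo` or `write:packages` is the kind of finding worth escalating, as in the story above.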
And I chatted with them a bit after I disclosed, trying to work out how this PAT entered the npm package, because to me it looked like it was just getting pushed up from the GitHub repo and then published, and that's how it was leaking. But after a while of chatting with them, they let me know that they did have secret scanning, similar to this. In their PR workflow they had secret scanning on any pushes that went to different branches, so they were checking for any secrets landing in their GitHub repositories. Then they had a deployment workflow that would build, deploy, and publish to the npm registry. And they figured out it was in this deployment workflow that secrets were being injected into their packages before they were pushed up to the npm registry. So the GitHub repo looked clean, it was fine; it was the package on npm itself that had the secrets in it. So, what can you do about it? We've talked about different ways to detect secrets in CI/CD pipelines, but they don't fully work, because you're not looking at the end product. You should be looking at what's actually being published to these public registries and trying to figure out if any secrets are being leaked there. There may not be; the previous example was probably a fault of their CI/CD pipelines, but you can never be sure. So you can take three different approaches here. An endpoint focus: pre-commit hooks using TruffleHog, Gitleaks, things like that. A CI/CD focus: GitHub Actions, GitLab CI; just putting secret scanners inside your CI/CD pipelines is a pretty nice way to tackle this. And then you can also rely on the platforms themselves, like GitHub or Hugging Face, to figure out that you have secrets being leaked
and then remediate them for you. But this is not a foolproof plan. In my opinion, you should be looking at the registry itself, because it just doesn't lie. Although you may have GitHub repositories linked to npm packages, PyPI packages, and the other things you're pushing out, you can't be certain that nothing is entering your packages that shouldn't. So you need a way to monitor the package registries where you're pushing this code, and to figure out not only whether you have any secrets there, but also your footprint inside these registries. One way I did that for this research was implementing a continuous monitoring system. Basically, I leveraged the RSS feeds for npm and PyPI. Every minute I'd pull the feeds, which carry the created and updated package names, then resolve the maintainers and authors and send that to AWS SQS (Simple Queue Service), which was a nice durable way for me to track state. It was separate from my main package scanner container, so I wasn't getting bottlenecked anywhere. Then the package scanner would just take the package names off SQS, scan each package for secrets, and log the results. And arguably this is where the largest share of the secrets in this research came from, because what you'll find is that a lot of people will publish a package, figure out it has secrets in it, and then just push a new version without the secret. The secret is still in the previous version, but I've already caught it. It works pretty well, and I've actually got it up on my GitHub, so if you want, you can use it to continuously scan your organization and work out what's being published. And it's really cheap to run: I spend about $1.50
a month in AWS, which is nice, plus about $12 for the VPS to run both containers on. But that's continuous monitoring. What's a way you can clean house and figure out your existing footprint inside these registries? This is a main issue with npm: you don't have a good way to query what your organization might have in these registries, because you can't query by maintainer domain or anything like that. So one way you can actually do this is to replicate the npm registry into a database yourself. I did this using a custom script; I later found out you can do it via their CouchDB instance, but that was a little buggy when I tried it. Essentially, what you want is a more queryable registry, so you can figure out what your devs are publishing, who the maintainers are on different packages, things like that. You can see an example like this with PayPal: there are a lot of weird little packages in there that aren't scoped to the PayPal namespace, which I'm sure they maybe don't know about. For a lot of the organizations I talked to, their first reaction was, oh, we didn't even know these packages existed, or we didn't know our developers were publishing these packages to the public registries. So you can utilize this, and once you do, you can scan every package you find for your organization, maybe for bug bounty scopes you have, and then work on remediating. And then, if I were you, I would implement continuous monitoring. It costs next to nothing, and at the end of the day you're able to really drill down into what's inside the registries and what your developers are publishing. That's also up on my GitHub, so you
can just go there and grab it if you want; play around with it, change it, modify it. As we finish up, the key takeaways: secret scanning is very difficult to do well, because there are lots of different sources, lots of ways to bypass it, and it's hard to detect. Although it's simple to start, it's hard to do well. So I think you have to take a multi-layered approach: implement secret scanning in your CI/CD pipelines, and also monitor the packages entering npm and PyPI, plus whatever other ecosystems your developers are publishing to, because it's tough to map out exactly where your devs are publishing unless you have great access controls. But yeah, that's the end of the talk. THANKS FOR COMING.
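The continuous monitoring pipeline described in the talk (registry RSS feed, then a queue, then a scanner) can be sketched roughly like this. The PyPI feed URL is real; the queue is simplified to an in-process `queue.Queue` rather than SQS, and `scan_package` is a placeholder for whatever scanner (e.g. TruffleHog) you plug in:

```python
import queue
import urllib.request
import xml.etree.ElementTree as ET

# Real PyPI feed of recently updated packages; npm would need its own source.
PYPI_FEED = "https://pypi.org/rss/updates.xml"

def parse_feed(xml_text):
    """Extract package names from RSS item titles shaped like '<name> <version>'."""
    root = ET.fromstring(xml_text)
    names = []
    for title in root.iter("title"):
        parts = (title.text or "").split()
        if len(parts) == 2:  # items look like "requests 2.32.3"; skip channel title
            names.append(parts[0])
    return names

def poll_once(q):
    """One polling pass: fetch the feed and enqueue any package names found."""
    with urllib.request.urlopen(PYPI_FEED) as resp:
        for name in parse_feed(resp.read().decode()):
            q.put(name)

def scan_package(name):
    """Placeholder: download and unpack the package, then run a secret scanner.

    e.g. subprocess.run(["trufflehog", "filesystem", unpacked_dir])
    """
    print(f"scanning {name}")

def drain(q):
    """Worker side of the queue: scan everything that has been enqueued."""
    while not q.empty():
        scan_package(q.get())
```

In the production setup described above, `poll_once` and `drain` run in separate containers with SQS between them, so the poller is never blocked by slow scans.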
>> EXCELLENT. SO, WE HAVE PLENTY OF TIME FOR QUESTIONS.
>> Great talk, awesome work. Do you think that, rather than scanning the registries themselves, it's perhaps best to have a step where you scan the artifacts that you've produced? Because the idea is that you don't want the secrets to be published in the first place, right? So maybe taking that secret scanning and not just scanning the source code, but, before you zip or tar up the thing you're going to publish to npm, scanning the output of that step first, before you upload it. Do you think that makes sense?
>> Yeah, for sure, I think that's a great way to do it. The thing I would be concerned about is whether there are bypasses of that. npm is really easy to use, and it's very hard to block developers from publishing to these public registries, or you might not want to be that blocker. But that's definitely a good thought, and it's one way to catch it before you zip it up and chuck it onto the public registry.
>> Cool. Thank you.
>> Good job, mate. Have you noticed a decrease or an increase in the total
number of credentials that you're seeing in these large registries?
>> I would say, as we move toward more AI, MCP, and agentic code generation, there has been a little uptick. I've noticed a lot more being published and committed by different agents, like Cursor. But overall it seems pretty steady; I haven't noticed any massive decrease, or any other organizations scanning them either.
>> When you're trying to claim a bug bounty for a secret that you found, are you reporting it to companies whose software relies on that package as a dependency, or to someone else?
>> Mainly the organization that the maintainer or author belongs to. At that point I'm trying to validate whether that token is just an individual's GitHub PAT that they accidentally leaked, or whether it belongs to an organization and has access to different repositories. So there is a bit of validation you need to do.
>> So what percentage of findings actually have bug bounty programs? Is that rare, or somewhat common?
>> I've had a lot of NAs and out-of-scopes, because these aren't your typical bug bounty findings. But probably about 30% of the findings have bug bounty programs or the like; the other 70% have no security program at all, so you can't really do anything with those keys.
>> Thank you.
>> Thanks very much, great talk. I used to work with a lot of teams implementing things like secret scanning, and one of the difficult things in the industry is verification. Did you have any friction with that? There aren't always endpoints to do it, and were you looking like an attacker by trying the credentials?
>> Yeah, great question. So I use TruffleHog, an open source secret scanning tool, and the beauty of it is that it actually offers verification. It'll hit an innocuous API endpoint; for AWS it's GetCallerIdentity, and it returns that, which is generally non-invasive. That's what they try to do, just to validate the keys. But no, I haven't had any pushback from any organization saying I shouldn't be using these keys, or had it flagged as malicious. I always err on the side of caution, where I'll do the bare minimum to prove that I have access to these keys and that they're valid. I won't go doing anything like deleting things.
>> Yeah, thank you.
>> Have you found that there's a fair amount of disparity in results between TruffleHog and Gitleaks?
>> Good question. I haven't actually used Gitleaks before.
>> I would highly recommend running both, because, although I'm pretty sure the findings have similar upstreams, every single time I've done this for other ecosystems I've found additional results with Gitleaks, and I don't know why.
>> Okay, cool. I'll implement that for sure. Thank you.
>> We have time for one more if anybody's keen. No? Awesome. Thank you very much, Luke.
>> Thank you.
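A minimal sketch of the "run both scanners" suggestion from that last exchange: normalize each tool's JSON output into simple dicts, then merge on a (path, secret) fingerprint so overlapping detections are reported once. The field names here are illustrative normalizations, not either tool's exact output schema:

```python
def fingerprint(finding):
    """Stable identity for a finding: where it was and what the secret was."""
    return (finding["path"], finding["secret"])

def merge_findings(trufflehog_findings, gitleaks_findings):
    """Merge two scanners' normalized findings, de-duplicating overlaps
    and recording which tool(s) reported each one."""
    merged = {}
    for source, findings in (("trufflehog", trufflehog_findings),
                             ("gitleaks", gitleaks_findings)):
        for f in findings:
            entry = merged.setdefault(fingerprint(f), {**f, "sources": []})
            entry["sources"].append(source)
    return list(merged.values())
```

Findings with only one entry in `sources` are the "additional results" mentioned above: detections one scanner made that the other missed.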