
Where Did I Put My Keys? Preventing Data Leaks at Scale with Automation

BSides Canberra 2025 · 42:52 · Published 2025-12
Speaker: Joshua Padman
Category: Technical
Style: Talk
About this talk
Joshua Padman shares Red Hat's experience detecting, mitigating, and preventing secrets leaks across a complex ecosystem of development and deployment systems. The talk covers practical automation strategies, organizational practices that balance security with developer freedom, and proposes industry-wide improvements—from standardized key formats to API-driven remediation—alongside the new Secret Scanning SIG initiative.
Transcript [en]

The next talk is one I'm really looking forward to. I don't know if anyone recognizes Josh, or Wizzy as he likes to be called. He's been on the rego desk all morning and has probably answered about a thousand emails for BSides Canberra, so not only is he presenting this year, he's been a long-term volunteer. Thanks, Josh; round of applause. But right now he's here as a presenter, and he has an amazing talk: Where Did I Put My Keys? Preventing Data Leaks at Scale with Automation. So another big round of applause for Josh, the presenter.

Awesome, cool, all right. So Kylie's done a bit of an introduction. This is me: long-term BSides Canberra attendee and volunteer. You might remember me from roles such as the rego desk or awkward MC; if not, come check me out in the off-track a bit later, but wait until the end of this talk. I've worked at Red Hat for the last eight years, the first five of which I spent in product security, and now I'm in the information security team. It's been a pretty awesome experience to work for one of the largest open source companies in the world, straddling the necessities of security with the open nature of how we work and the company's fairly unique culture.

My most recent role has me focused on incident response and, alongside that, a lot of automation work, and one of the projects I work on is the topic of discussion today. Anyway, enough about me; let's get on to it.

So, what's a presentation without an agenda? No point beating around the bush: my goal is to get you interested and involved in an open source project and a group that I'm part of. To do so, we'll look quickly at what a secret is and why you should care. We'll also look at the different things different companies are doing, including mine, and at why this simply isn't enough. Then we'll look at some future improvements we can make in the industry and ways you might want to get involved. So let's strap in and get going. Fair warning, though: a lot of the images are AI generated. I'm not a prompt engineer and not really an AI person, but these were the least ridiculous options I came up with.

So what is a secret? How we define a secret is pretty simple: any piece of sensitive information that, if exposed, could pose a risk to an individual or organization. The sensitivity of a secret is highly contextual. That's pretty simple, right? That last part is really important: the context of a secret can change the situation entirely. The simplest example I commonly use is test data. Often this data contains a lot of secrets, but it should hopefully be fake data; if the contents of the same production system were leaked, there's a high chance it would actually be sensitive. Today we're mostly going to focus on secrets you would commonly see during development and deployment: API keys, certificates, credentials, session tokens, and more system-style secrets. A lot of the time these are things you see in configuration files, environment files, hardcoded, or even in logs.
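As a minimal illustration of the kind of hardcoded credential a scanner flags (the key below is a made-up placeholder and the prefix is purely illustrative, not any vendor's real format):

```go
package main

import "fmt"

// Hardcoded values like this are exactly what secret scanners look for.
// The key below is a fake placeholder used only for illustration.
const paymentAPIKey = "sk_live_EXAMPLE0000000000000000000000"

func main() {
	// In real code this key would be attached to API requests, which means
	// anyone with read access to the repository or its history can use it.
	fmt.Println("using key:", paymentAPIKey)
}
```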

That's not to say that other types of information aren't important (PII, medical data, all of that), but in our experience where I work, those sorts of leaks thankfully don't really happen.

So where are we finding these secrets? Many organizations have a large number of disparate systems used by various teams. Some organizations have hard and strict rules; others allow a lot more freedom. Here are some of the systems that Red Hat uses for various purposes. We have source code management, both internal and external: GitHub, GitLab. We have people using personal accounts; GitHub has a one-account policy, which is pretty easy to get past. We have code flying everywhere, forks of forks of forks. We have bug tracking systems, customer support systems, internal systems, help desks. We have CI/CD systems, both upstream and downstream, with all of their logs and all the artifacts they create. We have logs in general, container registries, pastebins, and chat platforms, both the blessed ones and the ones that aren't. And now we have AI. Then we have all of these again from all of the vendors and third parties we rely on. Every company has these to varying degrees. At Red Hat, the majority of these are also public and open: we work on OpenShift and Kubernetes bugs openly, Linux issues, kernel issues, all sorts.

Then we also have customers, and a lot of those customers, when they have a bug, will upload a log file or some information about it to one of our bug tracking systems because they feel like going that way rather than through the customer support system. A lot of those systems are open by default. Logs are normally pretty clean, but sometimes there are debug logs, or there's been a failure, and there are sensitive credentials in there. All of these platforms pose their own sort of risk for leaked secrets and data, and for those working on open source, the risk is even higher, with many of them defaulting to open.

Cool. So why care? I've been thinking about how a breach could be a little bit like a game of snakes and ladders. As the attacker tries to move across the board, they want to avoid the snakes: they want to avoid detections and incident response just as much. They want to find the breadcrumbs, the secrets scattered throughout the organization that give them a step up, moving them closer to their targets. Every secret your organization detects first and remediates removes a potential ladder for the attacker, making their life harder. The snakes are your other detections and responses, pulling the attacker back as they try to move forward. Maybe it's a bit of a stretch, but if you leave your secrets lying around the place, you are making life easier for attackers.

Maybe you get lucky, and with that AWS access key you accidentally published publicly, they only end up deploying 50 high-performance instances and mining crypto. That's probably better than a breach of a production system and its data, but it's still going to cost you a lot of money, especially if you don't realize for a few days.

So let's look at some of the research that's been done in the industry. A GitGuardian post shows a 25% increase in hard-coded credentials from 2023 to 2024. That's huge: about 23.8 million new leaks detected by GitGuardian in 2024, up from 19.1 million in 2023.
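As a quick check, those two figures are consistent with the quoted growth rate: (23.8 - 19.1) / 19.1 ≈ 0.25, roughly a 25% year-over-year increase in detected leaks.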

Further, they suggest that 15% of all commit authors have committed a secret. So that's roughly 23 million secrets leaked, and 15% of all commit authors leaking secrets. Those are just crazy numbers. There's also a ReversingLabs article, quoting Thomas Segura from GitGuardian, noting that easily identifiable secrets published to GitHub can be compromised in a matter of seconds, with attackers constantly monitoring GitHub for new pushes. So consider that "gone in 60 seconds", for the sake of the pun. But remediation is slow, with many secrets still active after five days and an average remediation time of 94 days. Those are absolutely crazy numbers: attackers find a secret within seconds and often have a minimum of five days to use it.

Now, if they haven't just used the secret for short-term gain, consider that they might have used the credentials to breach a production system or gain a foothold in your organization. The average cost of a breach in 2024 was $4.8 million US, and IBM's research has further shown that security skill shortages correspond to an average increase in the cost of a breach of $1.78 million US. Red Hat is wholly owned by IBM, but neither my team nor I were involved in any of that research. This paints a pretty dim picture for organizations: secrets are being leaked more than ever, they are being found quicker than ever, and remediation still remains too slow.

Organizations are losing significant money through simple oversight and mistakes.

So why does this matter to Red Hat specifically? I've touched on it a little already, but we're very heavily focused on open source software and we have an upstream-first mentality. My employment contract actually legally protects my interest in upstream projects. Our engineers work publicly, using shared systems to fix, improve, and update the software underlying a lot of our product offerings. The team works across the globe, in different time zones and sometimes on different working days, very much like an upstream community, and they do this by using systems that let them work asynchronously.

We also consume a lot of upstream code into our downstream, and we even host a lot of the CI/CD and testing infrastructure for both upstream and downstream, though we keep them separate for obvious reasons. Our engineers also have a lot of freedom in the technologies and solutions they choose to use, which helps empower them to solve problems efficiently, but we do have strong standards and baselines internally as we start to productize our projects. And finally, we're a very customer-focused company. Obviously, when you're selling open source software, you're selling support, so our customers really do expect us to care about their security, their secrets, and their data.

So what are we doing? For quite a few years we've had an internal system that we refer to as Pound Alert. It's a combination of software we've built to detect leaked secrets and inform both my team and any relevant associates. It has a few main components: a monitor, a pattern server, a scanner, an analyst, and a forwarder. The monitor does just what it suggests: it monitors different services for updates and new pushes of code, including things like GitHub's event stream, Jira, internal LDAP, GitLab, and a few other places. It then creates scan requests that it passes on to the scanner.

The scanner is the real workhorse. It takes a git repository, container image, JSON, or URL and scans it using our custom rule sets and patterns. The core engine for the actual scanning is Gitleaks; we have the LeakTK scanner, which is part of Pound Alert and is more of a convenience wrapper built around Gitleaks that prepares all of the images and everything else for scanning. We contribute upstream to Gitleaks and other projects where we can, and we also share the rules we use. The rules come from the pattern server, which basically takes authentication and dishes out rules depending on what people want to use them for.

For Pound Alert itself, we have a very broad set of rules so that we can catch as much as possible. Whereas if an associate is pulling down the rules for use with git hooks and pre-commit scans, or wants them on their CI/CD system, we can build custom rule sets so they have a smaller number of rules to scan each file with. Once a scan is done, it's passed off to the analyst, which reviews the results and works out who should be notified.

At this stage, for some systems, we also have the ability to immediately mitigate, disabling or quarantining certain credentials, but often it just passes the information on to the associate and to the information security team. That's all done by the forwarder, which is the last step: it sends out emails and Splunk events and calls webhooks to trigger other automation, and it can be fairly easily expanded. As part of our rule set we have tags that let us manage this routing; for an AWS access key and secret key, for example, we might want to notify the infosec team, not just the associate.

We've tried to keep each of these components simple: all of them use standard IO and JSON, just to keep things as simple as possible. We have actually run this scaled out fairly well horizontally, using NATS as a message bus, and to do that we upstreamed some changes to the NATS CLI to allow us to pipe from our tools into the NATS CLI onto the message bus, and back the other way as well.
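As a rough sketch of that pattern (the subject name and the request fields here are hypothetical, not Pound Alert's actual schema), one component can publish JSON scan requests onto a NATS subject while another consumes them:

```go
package main

import (
	"encoding/json"
	"log"
	"time"

	"github.com/nats-io/nats.go"
)

// ScanRequest is a hypothetical request shape; Pound Alert's real schema
// isn't public, so these fields are illustrative only.
type ScanRequest struct {
	Kind   string `json:"kind"`   // e.g. "git", "container", "json", "url"
	Target string `json:"target"` // e.g. a clone URL or an image reference
	Depth  int    `json:"depth"`  // how much git history to scan
}

func main() {
	nc, err := nats.Connect(nats.DefaultURL)
	if err != nil {
		log.Fatal(err)
	}
	defer nc.Drain()

	// A scanner-style component subscribes and hands each request to its engine.
	if _, err := nc.Subscribe("scan.requests", func(m *nats.Msg) {
		var r ScanRequest
		if err := json.Unmarshal(m.Data, &r); err != nil {
			log.Printf("bad request: %v", err)
			return
		}
		log.Printf("would scan %s target %s", r.Kind, r.Target)
	}); err != nil {
		log.Fatal(err)
	}

	// A monitor-style component publishes JSON work items onto the bus.
	payload, _ := json.Marshal(ScanRequest{Kind: "git", Target: "https://example.com/org/repo.git", Depth: 50})
	if err := nc.Publish("scan.requests", payload); err != nil {
		log.Fatal(err)
	}

	nc.Flush()
	time.Sleep(200 * time.Millisecond) // give the async handler a moment to run
}
```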

We also have a few extra bits and pieces that aren't Pound Alert but are separate things, like the pre-commit hooks and software for streamlining the installation of those hooks for associates. We have GCS filters, which basically run as a function-as-a-service and scan all new files going into Google Cloud Storage for secrets, quarantining any that need it, and we have similar things on Jenkins and GitLab.

My favorite part of Pound Alert is looking a little more at the technical side of things. It started as a Python project slightly before I joined. It has the emitters that continually monitor the supported platforms and send off the scan requests.

Alongside the notification part, we have another system that pesters people: if there are credentials or secrets that aren't particularly serious and don't need immediate follow-up, it will escalate with them and their managers and continually remind them to do something about it. We do have some memory issues at the moment with the Python project itself. When it started small, we used to just hold everything in a little custom message bus within the application, and those requests used to be small because they were just a bit of information about each git repo you wanted to scan and its depth.

But as we expanded out to scanning JSON in particular, which we can now do inline without having to expand it onto the file system, those JSON blobs we're scanning end up sitting in the queue. It's not significant, though; it's using something like six gigabytes of RAM after a week, and the LeakTK scanner itself uses a very minimal amount. We host this on OpenShift, because Red Hat. We also use a memory disk, a RAM disk, whatever you want to call it, for all the temporary files: all the git clones, everything that needs to hit a file system, just so we don't have the bottleneck of an actual hard disk.

The LeakTK scanner is a newer part. We used to just branch out to Gitleaks straight off, but when we wrote the LeakTK scanner, which is written in Go, we basically encapsulated Gitleaks inside it, and this saves us a lot of time calling the executable over and over again; instead it sits there in a listen mode. We've also worked with Gitleaks quite a bit to upstream a lot of the improvements, so they go into Gitleaks rather than living in the LeakTK scanner.
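To illustrate the listen-mode idea (this is a stand-in sketch, not the actual LeakTK scanner protocol; the request fields and the scan stub are hypothetical), a long-running process can read newline-delimited JSON requests on stdin and write JSON results to stdout:

```go
package main

import (
	"bufio"
	"encoding/json"
	"log"
	"os"
)

// Request and Finding are illustrative shapes, not the LeakTK scanner's real types.
type Request struct {
	ID     string `json:"id"`
	Kind   string `json:"kind"`
	Target string `json:"target"`
}

type Finding struct {
	RequestID string `json:"request_id"`
	RuleID    string `json:"rule_id"`
	Location  string `json:"location"`
}

// scan is a stub standing in for the embedded Gitleaks engine.
func scan(r Request) []Finding {
	return []Finding{}
}

func main() {
	in := bufio.NewScanner(os.Stdin)
	in.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow large JSON lines
	out := json.NewEncoder(os.Stdout)

	// Stay resident: one process handles many requests, so we avoid paying
	// executable start-up costs for every scan.
	for in.Scan() {
		var req Request
		if err := json.Unmarshal(in.Bytes(), &req); err != nil {
			log.Printf("skipping malformed request: %v", err)
			continue
		}
		if err := out.Encode(scan(req)); err != nil {
			log.Fatal(err)
		}
	}
	if err := in.Err(); err != nil {
		log.Fatal(err)
	}
}
```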

We have eight CPUs assigned, and even though we're scanning 350,000 repos or more a week, plus hundreds of gigabytes of Jira and all sorts of other systems, we find the CPUs are really there for burst traffic and continuous monitoring. When we moved off calling Gitleaks directly, along with a few other changes, we actually managed to double the amount we scanned, or halve the time it took to scan the same amount. We also monitor 11,000 accounts across the organization; I'll speak a little more about the challenges in getting those as we go on.

I mentioned some rules before. The rules, or patterns, are at the core of our scanner, and they're at the core of Gitleaks.

This is a very simple example: a Google Cloud Platform API key. A regex is the core of each of these rules. You can see things like the ID and the description, which are fairly self-explanatory. You've got the regex, followed by a secret group, which is the capture group the secret is pulled out of the regex with. Then we have the tags, which are what I mentioned earlier: this one is of type secret, and there are other types, like phone numbers and things like that. Then we have the alert tags, such as alert repo owner, and there can be alerts for the infosec team and others.

You also have the keywords. Keywords are kind of like a short circuit: if the part you're scanning doesn't have the keyword in it, the scanner won't even try to run the regex. And then we have stop words, which are kind of like a shortcut afterwards: after the regex is run, if any of these stop words appear in the secret, it just continues on; it skips that match and it's not considered a finding.
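As a concrete sketch, a Gitleaks-style rule with that shape might look like the following TOML. The pattern is the commonly published GCP API key format; the tag names and stop words are illustrative, not Red Hat's actual pattern set, and rule-level allowlist syntax can vary between Gitleaks versions:

```toml
[[rules]]
id          = "gcp-api-key"
description = "Google Cloud Platform API key"
regex       = '''(AIza[0-9A-Za-z\-_]{35})'''
secretGroup = 1
keywords    = ["aiza"]
tags        = ["type:secret", "alert:repo-owner"]

  # Stop words short-circuit a match after the regex fires.
  [rules.allowlist]
  stopwords = ["example", "changeme"]
```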

Challenges. We've faced many challenges over the years building the system and getting it to coverage levels we're comfortable with. The culture within Red Hat allows a lot of freedom; it's one of the company's core values, and many associates were contributing to open source software upstream prior to coming to Red Hat. They have their own accounts on various upstream platforms. This culture allows them to really innovate, work quickly, and be part of those upstream communities, but it also means we don't necessarily have simple ways of tying their Red Hat accounts to their personal accounts. So in order to respond quickly, we needed to put systems in place to link all of these accounts together. Through many years of campaigning we've got really good coverage, with those 11,000 accounts, and that's just massive.

This was achieved mostly through associates self-reporting their accounts, which they do by adding them to the internal directory. But it was built on the back of an information security team that has spent many, many years working with engineers and other people within the organization, working with them rather than just demanding things. We've really built up a good rapport, and the engineers are generally quite happy to come along without much pushback. I know of some other organizations that are a lot more strict and have the ability to really demand this sort of information.

And there are some that actually forbid anyone from using their company emails or company accounts with these upstream repositories without explicit permission, going as far as continually scanning all of these platforms for their associates. Obviously, Red Hat operates very differently.

I also mentioned earlier the pre-commit hooks, GitLab CI scanning, and Jenkins and cloud storage scanning. These solutions were created as part of responses to incidents, and we find that a lot of the time our focus is directed by where these incidents are happening. It also helps that teams and their managers are a lot more willing to put these sorts of systems in place after things happen.

That's a bit sucky, but we've had some good wins come about from incidents that probably shouldn't have happened in the first place. We do find that occasionally we get some pushback against scanning, as it's seen to impact performance and slow down processes. Here we've really just had to work with the engineers to make sure we cut the rules down to those specific to where they're working, work with them to remove these roadblocks, and work with leadership across the company to have more of a natural implementation rather than demanding it.

So I've spoken about a few of these systems; here they're broken down into four key areas. We've got detect, which is pretty obvious, and that's where a lot of our effort has gone so far. Our scanner and our patterns are open source; the pattern server and the GitLab CI scanning aren't yet, but they're on their way there. We then have mitigate: the pieces that help us actually resolve the risk, things like automated secret revocation, documentation, and the forwarder. We do have a bunch of hacky scripts to help validate credentials in GitHub, but we're still working towards open sourcing a lot of that automation.

Then we have prevention, which covers tools designed to stop secrets before they get there: the GCS filter stops files from making it into cloud storage, and the pre-commit hooks stop people from actually committing them. And then we have improve, which is where we bundle in all of our upstream contributions, like those to NATS.io and Gitleaks; there have been quite a few, like in-place decompression libraries and things like that, where we've worked with upstream to improve the libraries and then used them. If you want to talk more about any of this, feel free to come up to me later.

So what have we learned at Red Hat? Most of what we've learned through this project isn't technical. When leaks happen, it's normally an accident, and we need to make sure we have respect and empathy when dealing with them. The associate has probably just made a massive mistake that's going to be expensive and have lasting impacts for them, and potentially their career, though hopefully not. We really need to partner with them to help them solve the problem, not just broadcast the blame.

Ultimately we're here to protect the company, and as associates are often using personal accounts, we want to make sure we don't overstep and don't collect too much information, only what's necessary to protect the company. We really need to engage with the associates as people, and often as people under stress, which can sometimes cause defensiveness. We also need to know the why, why things are being leaked, as this really helps us prioritize what's important. And when communicating, we need to ensure we give actionable information, so that when someone receives the notification they realize something has happened, with a good call to action spelling out exactly what they need to do, kept as simple as possible.

And as with all projects in the organization, we need to understand the value and share it with management, the engineers, and everyone else. Here we can really leverage the data: thankfully we're working with data that's relatively quantifiable, and we can use that to our advantage to track the trends and the coverage. We also need to collect enough information to make sure we can attribute actions, so we know the who, what, where, and when (we'll probably never know the why), but we need to balance that against employee privacy.

The impact at Red Hat has been fairly significant over the years. We've seen a lot of improvements. We have much better visibility and understanding of where secrets end up and why they end up there, and this helps direct our future improvements. As mentioned before, some of our sub-projects have been born of incidents. We've seen significant gains in our response rate: our time to mitigation has dropped from days to hours or less, and a lot of that has come from the time spent refining our process for choosing the right person to contact. We're also seeing alerts sent very quickly, some systems within a minute and others within minutes; if you upload a secret to our Jira instance, you'll get an email within 30 seconds.

We have increased security awareness as well; nothing rings home quite like receiving an actionable alert for something you've done. If you think back to the GitGuardian report, 15% of committers have committed a secret, so chances are you or one of your team members has done it and learned the hard way. By working with associates in an empathetic way, not broadcasting blame, the incident ends up being a lot more pleasant for everyone and a broader learning experience for them and their team.

And we've found that a lot of people who have caused incidents end up becoming advocates for information security. We've also seen cost savings, obviously. We use AWS a lot, and AWS do scan, but we still get some things that go through; you can never know exactly how much money you save from a quick response, but you can infer it from past incidents, and we have seen significant cost savings across the organization.

So what are some of the other companies doing? I've mentioned the differences between Red Hat and some other unnamed companies a few times, and now we're going to look at some of the vendors and open source projects. AWS constantly scan for secrets relevant to their platform, quarantining them and notifying the owner.

They also actively monitor for misuse of keys and services on their platform. The other cloud platforms do it too, but in my experience AWS is the best at this one. GitHub have their secret detections as part of Advanced Security, blocking potential secrets before they properly get committed, and GitLab have the same; both of these platforms offer parts of these services for free for public repositories. Then you've got TruffleHog and GitGuardian, two big scanning vendors that provide platforms for scanning, notifications, mitigation, and reporting of leaks. It's pretty obvious this is a big issue that's not gone away, and each of these companies has their strengths and weaknesses, and all of them provide high-quality solutions.

Especially if you don't have the people to implement a secret scanning system yourself, they're definitely worth looking at. At Red Hat, we utilize GitHub and GitLab secret scanning where we can, especially in public repositories, and we also receive alerts from AWS from time to time (well, not a lot, but some). You might have noticed the TruffleHog logo in the middle, a bit bigger and centered; that's there to remind me to call out howtorotate.com, which is an open source repository, I guess, and website. It has a wealth of information on how to respond when you do have a key leak.

It's got information for heaps of different platforms on how to actually rotate your API keys and your secrets, so I recommend checking that out.

So why LeakTK, and why do we want to share it? We were looking at what we'd built, our patterns and our rules, and we wanted to share it the best way we knew how, and that was to open source it. We wanted it to be free and open, and whilst it didn't start in the open, we are progressively opening up the projects. We aren't selling a product; we're just sharing our implementation, and we're committed to all new work being started in the open.

We also wanted a community focus, not to create our own bubble but to work with upstream projects and integrate them into what we needed; we've done this with Gitleaks and NATS and a few other libraries. Through this work we've actually built quite a few relationships with other people creating scanning tools, people who work at the big scanning vendors, and the GitHub scanning team. We also wanted collaboration and contributions, and there are a number of non-Red Hatters already involved, especially in the creation and testing of rules.

We love contributions from everyone, even if it's just talking about what your challenges are or what you're doing in this area. And finally, we wanted to provide a set of tools that make it easier for organizations to start building a scanning system in a way that suits them, without a big price tag.

But scanning for secrets is not enough. We're focused on detection and mitigation, but we need to look at what we and other companies are doing and why this isn't enough. Finding leaked secrets and dealing with them is important, don't get me wrong, but it's reactive, and we'll never be able to catch them all.

Some of the other things you need to look at are reviewing the way you do credential management and pushing towards secret management platforms, even for developer systems. Keeping this central and using lookups helps keep secrets out of environment files, source code, and project files, all the areas they end up leaking from into other places. It also provides better access logs, so you know who is accessing your credentials and how. Where possible, it's really good to move towards short-lived credentials over longer-lived static keys; things like AWS Security Token Service are good for this. It helps because it really limits the exposure of credentials, as they expire quickly.
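As a minimal sketch of that lookup pattern, assuming a HashiCorp Vault KV v2 mount at `secret/` and a hypothetical path `myapp/db` (illustrative only, not how any particular organization is configured), an application can fetch the credential at runtime instead of carrying it in an environment file:

```go
package main

import (
	"fmt"
	"log"

	vault "github.com/hashicorp/vault/api"
)

func main() {
	// VAULT_ADDR and VAULT_TOKEN are read from the environment by default;
	// in practice you would use a short-lived auth method rather than a static token.
	client, err := vault.NewClient(vault.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Read from the KV v2 engine; the path and field names are hypothetical.
	secret, err := client.Logical().Read("secret/data/myapp/db")
	if err != nil || secret == nil {
		log.Fatal("could not read secret: ", err)
	}
	data, ok := secret.Data["data"].(map[string]interface{})
	if !ok {
		log.Fatal("unexpected secret format")
	}

	// The credential only ever lives in memory, and Vault logs who read it.
	fmt.Println("db user:", data["username"])
}
```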

But you also need to make sure you're setting the right scope on your credentials and secrets. Even with this, we still see attacks happening; with GitHub Actions, for example, you've got tokens that are created just for the duration of the action, and you still occasionally see them getting leaked and misused. We need to make sure we keep watching, that we have good detailed logs, and that we're looking for anomalies, all the normal sort of things. And we need to change our organizational practices and keep creating a culture that's aware of this as a risk and is actively seeking to avoid it, providing people with guidance to securely configure all the different platforms, including their own systems and the way they work.

It really does take a layered approach and a lot of diligence.

What can the industry do better as a whole? Leaks are always going to happen. We've talked about the ways we can find them and what we can do to minimize the likelihood of them being leaked; however, we also need to look at the ways the industry can do better: the organizations and platforms providing these secrets, and the standards that define them. These are some ideas that have been thrown around, ideas that could make dealing with leaked secrets, particularly those provided by platforms, easier.

Some of them are already implemented by some companies, but it would be good to make them more ubiquitous. Firstly, let's make keys easier to identify; some places already do that by adding a prefix or a postfix. Publish good information about the format so that detections can be written more easily. Standardize the way and the type of embedded data contained in those secrets, and include a checksum for quickly validating the key format. Provide a way to quickly quarantine or disable a key using the key itself: an API where it can quarantine itself, unless someone opts out of that at the time of creation.
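To make the checksum idea concrete, here is a hedged sketch of a hypothetical key format (a recognizable prefix, a random body, and a CRC32 checksum suffix); this is not any vendor's real scheme, just an illustration of how a scanner or the issuing platform could cheaply validate a candidate string offline:

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"hash/crc32"
	"strings"
)

const prefix = "exk_" // hypothetical, easily searchable key prefix

// newKey mints a key shaped like exk_<32 hex chars>_<8 hex checksum chars>.
func newKey() (string, error) {
	body := make([]byte, 16)
	if _, err := rand.Read(body); err != nil {
		return "", err
	}
	b := hex.EncodeToString(body)
	sum := crc32.ChecksumIEEE([]byte(b))
	return fmt.Sprintf("%s%s_%08x", prefix, b, sum), nil
}

// looksValid lets a detector reject look-alike strings before doing anything
// expensive, because the checksum suffix must match the body.
func looksValid(key string) bool {
	if !strings.HasPrefix(key, prefix) {
		return false
	}
	parts := strings.Split(strings.TrimPrefix(key, prefix), "_")
	if len(parts) != 2 {
		return false
	}
	want := fmt.Sprintf("%08x", crc32.ChecksumIEEE([]byte(parts[0])))
	return parts[1] == want
}

func main() {
	k, err := newKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(k, looksValid(k), looksValid("exk_notarealkey_deadbeef"))
}
```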

GitHub has had such a quarantine API for personal access tokens since April, and it's fantastic. Provide concise documentation about the creation and revocation of secrets, particularly really simple and clear instructions on revoking them; think howtorotate.com. Provide secure defaults, and really warn users when they're creating a key with significant power. Sometimes you do need that key with significant power, but enforce shorter lifespans on these as well; don't allow those keys to last for days, weeks, or years. Make key rotation easier, do your own scanning as well, and provide documentation on how you respond to these sorts of incidents.

This isn't just a user's problem; leaked secrets are costing companies money. If a platform experiences a large number of leaks, like users leaking their keys, people don't just look at it and go, "oh yeah, the users are making mistakes." They also don't look too fondly on the company, and you don't want to be in the news as that company whose keys have been leaked, even if it's not your fault. It's not fair, right? The organizations didn't leak the keys, but there are things they could be doing better. An organization that really helps their customers protect themselves is also securing that relationship.

When we were thinking about what the industry could do better, we realized we needed to create a group across the industry, and this is where the Secret Scanning SIG comes in. At FIRST 2025, one of my colleagues presented a similar sort of talk, and that's where the idea for this was born, so not that long ago. It's not a FIRST-specific thing; it's just that it was created while he was there. We already have members across different organizations, including vendors, platforms, and other open source scanning projects. And the mission is pretty simple: to enable open source and vendor tools to move beyond basic scanning to solve novel problems in secret leak detection, remediation, and prevention.

We see the SIG as an ecosystem of tools and services built around open collaboration that proactively protects data, bringing individuals and organizations together to innovate. We're here to improve the status quo by working with organizations; we don't want to cut off those who create and sell great tools, and the work of the SIG should improve their products as well as those of the community. The SIG principles really help guide us to that end. You've got collaboration: we prioritize working together to solve common problems, and we aim to integrate with and enhance existing tools and services, not to replace them. Do no harm: we operate with respect and empathy; the goal is to partner with organizations to solve problems, not to assign blame or publicly shame them for leaks, and all activities should be conducted with the aim of protecting the organizations and their users.

And action-oriented: we focus on producing practical, actionable outcomes, whether that's code, documentation, or standards. The group is not about product endorsements, it's not about creating a commercial offering, and it's not actively scanning at large scale or reporting leaks. We're looking at ways we can improve detections, share knowledge, and make secret detection more accessible for everyone.

Here are some of the current research projects the SIG is working on. There's already a tool-agnostic rule set: a generic format they've come up with, which is then converted into the rule formats required by Gitleaks, Kingfisher, Nosey Parker, TruffleHog, and GitHub. There's work happening on detection benchmarking, to standardize benchmarking and help identify the fastest and most accurate tooling, with the purpose of improving tooling and detections across the board. And there's best practice and documentation work, aimed at multiple levels: providers, developers, CI/CD, all the different stages where secrets get leaked.
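As a sketch of the tool-agnostic idea (the generic rule structure below is hypothetical; the SIG's actual format may differ), a single source-of-truth rule can be rendered into a tool-specific config such as Gitleaks TOML, with similar converters for the other engines:

```go
package main

import (
	"fmt"
	"strings"
)

// GenericRule is a hypothetical tool-agnostic rule; the SIG's real schema may differ.
type GenericRule struct {
	ID          string
	Description string
	Regex       string
	Keywords    []string
}

// toGitleaksTOML renders the rule in Gitleaks' config format. Converters for
// other tools (Kingfisher, Nosey Parker, TruffleHog, ...) would follow the same shape.
func toGitleaksTOML(r GenericRule) string {
	quoted := make([]string, len(r.Keywords))
	for i, k := range r.Keywords {
		quoted[i] = fmt.Sprintf("%q", k)
	}
	return fmt.Sprintf(`[[rules]]
id = %q
description = %q
regex = '''%s'''
keywords = [%s]
`, r.ID, r.Description, r.Regex, strings.Join(quoted, ", "))
}

func main() {
	rule := GenericRule{
		ID:          "gcp-api-key",
		Description: "Google Cloud Platform API key",
		Regex:       `AIza[0-9A-Za-z\-_]{35}`,
		Keywords:    []string{"aiza"},
	}
	fmt.Print(toGitleaksTOML(rule))
}
```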

So what is your organization doing? I don't know what your organization is doing, and maybe you don't either. Here are some questions you might think about. Do you know where your employees are sharing secrets? Do you know what accounts employees are using, or how they're storing their secrets locally? Do you scan for secrets in git repos, ticketing systems, containers? If you're a platform, are you providing easily recognizable secrets? Are you providing a quick way to quarantine secrets? Are you providing excellent documentation? Hopefully, as you go away today, you will consider where your organization is and where there might be room for improvement. Maybe this is all new, or maybe you have mature systems in place; either way, what more could you be doing?

I know at Red Hat we still have a way to go: we need better training, better documentation, and more coverage across our internal systems. As well as that, as a provider of API keys, we really do need to be better and provide that quick quarantining; we don't do that ourselves yet. My slides got a bit mixed up, but hopefully throughout the presentation I've shown that this is not an issue that just affects a few companies or a few users. You or someone you care about will likely leak a secret one day, and we need to work together, individuals, corporations, vendors, providers, and communities, to build tooling, standards, and awareness. Without a collaborative effort, leaks will continue to plague our industry.

If you have any further interest in what I've chatted about today, LeakTK, the Secret Scanning SIG, or you just want to share what you're doing, I'd love it if you come up and have a chat. Thank you. [applause]

>> Great talk. Do we have questions? Up here at the front, thank you. >> Thank you. You mentioned providing secret management platforms and tooling; can you suggest any good secret management platforms and tooling specifically? >> HashiCorp Vault is what we use. >> Yeah, reckon that's the go? >> That's a bit outside my domain, but that's what we use, and I've found it to be pretty useful, pretty good. >> Thank you. >> Any other questions for Josh? Back here.

>> Hi. You mentioned that you're scanning hundreds of thousands of repositories, and those are huge numbers. I was wondering if you have any advice on how to track the resolution of those secrets, or maybe you could share what you use for that. >> All right. With our system, we have a separate system that tracks the resolution. For GitHub or Jira, say, we have that link, and based on the system it will actually go back and check whether the secret is still there, and it'll keep pestering whoever's responsible, or escalate to a manager or to the information security team.

As far as tracking those within Pound Alert itself, it only ever sends one alert for one leaked secret, and then the rest of it is managed over in that other system. We just hash a bunch of the different bits of information about it, like its URL and where it sits in the file, so that the system knows whether it's seen it before. >> Thank you. >> Any other questions? Nope. Okay, another big round of applause for Josh.