
Good evening, everybody. Welcome back to BSides Las Vegas Proving Grounds. This is our last, sorry, last talk of the day [laughter], and it's titled "Broke but Breached: Secret Scanning at Scale on a Student Budget," given by Ravi TA. A few announcements quickly before we begin. We'd like to thank our sponsors, especially our diamond sponsors, Adobe and Aikido Security, and our gold sponsors, Profit and Formo. It's their support, along with our other sponsors, donors, and volunteers, that makes this event possible. These talks are being streamed live, and as a courtesy to our speakers and audience, we ask that you check to make sure your cell phones are
set to silent. So, if you haven't already, please take out your phones and make sure they're silent. If you have a question, raise your hand and I'll bring you the audience mic that I'm holding right now, so that the people on YouTube can also hear you. As a reminder, the BSides Las Vegas photo policy prohibits taking pictures without the explicit permission of everyone in frame, so I would ask you to refrain from taking any pictures, including pictures of the slides. These talks are all being recorded and will be available on YouTube in the future. With that, let's get started. Please welcome our speaker. >> Hello. [applause]
>> Hello everyone. Good evening. Thank you for staying late for my talk. My talk is titled "Broke but Breached: Secret Scanning at Scale on a Student Budget." I'm Ravita. I recently graduated from the University of Maryland, College Park, with a master's in cybersecurity. I explore Golang and Kubernetes, I'm a part-time CTF player, and I hold these certifications. I don't have the usual disclaimer that these views are my own and not my employer's, because I'm looking for my next job. So if you are hiring, please reach out. There is an interesting relationship between secrets and breaches. If you quickly
Google which companies got breached because of secrets exposed in public, you get all these companies. The cause can be hardcoded API keys, which get pushed literally everywhere. One well-known breach is SolarWinds, where FTP credentials were pushed to GitHub. These leaks are preventable, provided you have the right checks in your pipelines. My mission was to find these secrets well before attackers do. My research was limited to VS Code extensions. So why VS Code extensions? As you can
hear, AI is taking over everything, and you see people vibe coding things. If you look at the trend from 2023 to 2025, a lot of VS Code extensions are being pushed to the VS Code Marketplace. They are often bundled with environment variables and similar files, and in the recent trends, which we'll talk about later, there is a pattern across the extensions being pushed to the marketplace.

In terms of the scanning architecture, I used this
architecture: given that I'm a student, I pooled all of my friends' Azure accounts and spun up a quick Kubernetes cluster across their accounts, since Azure gives a $100 free credit for each account. I was able to pool them into one Kubernetes cluster and run the scan across it. But this setup was not that scalable, because some of the VMs would sometimes crash, given that secret scanning is a CPU-intensive task. So I was looking for other solutions, and I came
across DigitalOcean. DigitalOcean gives students $200 in free credit, and I was fortunate enough to use that. This was my modified architecture for scanning at scale. I was using DigitalOcean's managed Kubernetes cluster, with pods running as a DaemonSet, so each pod runs on one of the VM instances. The pods pull URLs, which are the VS Code Marketplace URLs, from the Redis cluster. They pull whenever they start, and
they scan those extensions and then push the data back to the Redis cluster. Since I needed a backup solution for larger-scale data, I was using PostgreSQL, and all of this is built in Golang. This is an overall view of my cluster using the KubeView tool, and these are the pods running as I described on the previous slide. On the whole, I was using this technology stack. For Kubernetes, in my initial v1 architecture I was using K3s. Given that I had multiple accounts in my v1 architecture, I was using Tailscale to
get all of these Kubernetes clusters onto a single network. I was using Terraform Cloud for easier deployment, Docker for quick testing on my end, and TruffleHog for secret detection, given that it has an extensive set of detectors built into the binary, approximately 800 regexes the last time I checked. Redis was for fast key-value storage, and Golang orchestrated the whole scanning pipeline. Python 3 was for the initial PoC and scraping, and AWS SQS was for VPN queuing, which we are going to see at the end. In terms
of cloud, I was using Azure and DigitalOcean.

This is what the scanning pipeline looked like. This is the VS Code Marketplace; I had to crop it to fit on the slide. My first phase was collecting VS Code extensions, the second phase was scanning them, and the third phase was normalizing the findings and putting them in a presentable form. Looking at the first phase, scraping was a pretty bad idea, and I was looking for a smarter way to get all the VS Code extensions. There is this thing called sitemap.xml, wherein all
the sites publish their pages so that Google or any other search engine can go ahead and index those links. This is where I got all of my extension URLs, and this is how I normalized them. Take the first one: Cryion is the publisher name, and the VS Code theme Dragon Ball Z is the name of the extension. You can also quickly fetch the latest version from the VS Code Marketplace using an API, and this is how the whole URL for downloading the specific
VSIX packages can be generated. As you can see, Cryion is the publisher name, this is the version, and you can also see the extension name over there. For scanning the extensions I used TruffleHog, again because it has an exhaustive set of regexes already built in. In my v2 architecture, I was scanning at approximately 100 extensions per minute, given the rate limits and other constraints, and the entire back end is written in Golang for concurrency and scalability. This is a quick overview of my memory usage, CPU usage, and so on, to
give an idea of how intensive secret scanning is: it is a CPU-intensive job.

In terms of findings, these are the unverified findings. As you can see, there are a lot of API keys present in VS Code extensions. In an ideal world this should be zero, but as you can see, there are a lot of API keys. For a finer view, say I want to remove the noise and just look at live credentials: these are the live credentials, which I tested just before the talk. As you can see,
private keys are mostly SSH private keys for GitHub, with which you can push code to some GitHub repos. In terms of OpenAI, as I said, vibe coding is pretty much everywhere nowadays, so you can see OpenAI keys being pushed at a larger scale. In terms of MongoDB, the trend I was seeing was folks who are just getting started with VS Code extensions and trying out some basic things. Again, as for AI,
I see some VS Code extensions that are more about enhancing the user experience with AI, and the Groq and Anthropic keys are along those lines. In terms of GCP, I looked at some of the project keys, and most of them came from users who were just trying out GCP. But I did get some interesting findings with GCP, where I was able to escalate all the way to admin and get administrator privileges on a few of the GCP accounts. And in
terms of Azure SAS tokens, they were more along the lines of supply chain. If you were able to poison that supply chain, that would be another talk altogether. In terms of Discord webhooks, those credentials mostly came from vibe coders who just want to look at some stats or something along those lines. In terms of GitHub and GitLab, I was able to escalate these to OAuth and build more of an attack chain along those lines. And in terms of AWS and all these other credentials, the impact
can be much higher. I couldn't verify each and every credential, but it was mostly companies leaking credentials and users who are just vibe coding and pushing things. One last interesting thing that differed between my previous scans and the current scan is the Supabase token. I'd say that's again a vibe-coding side effect. These are some of the verified findings, which I checked just before the talk, and to present them in a better way, these are the numbers. As you can see, there is one
which is an npm token, and I'm going to look at that one on the next slide. I was able to escalate that npm token, which could be used to push JavaScript packages and carry out a supply chain attack. Given that I found it before attackers could, that's a pretty cool finding from my research. On top of that, the extension in which I found the npm token was last updated, I think, four years ago, but that token is still valid. So I was pretty
much surprised that there is a token still sitting there in a VS Code extension that was pushed four years ago.

In terms of trends, as you can see in the graph, in 2022 and 2023 there was a lot of activity around crypto, given it was the COVID era. At that point in time, most of the tokens or API keys being pushed in VS Code extensions were Infura and other blockchain API keys. But in the recent shift, from 2023
to 2025, we can see newer extensions leaking more AI-related secrets, like OpenAI, Anthropic, and Hugging Face. In terms of Hugging Face, there is an interesting case study that I couldn't put on the slide but wanted to share. There was this company, let's call it XYZ for privacy reasons, that had a Hugging Face token exposed in one of its VS Code extensions. I was able to download their unreleased AI models, and then I did another, sort of recursive, step where
I also scanned their unreleased AI models, and those did have more secrets that were live. This is just one case, but it could be part of a larger chain of events if we explored each and every token.

One specific challenge I wanted to mention is rate limiting. Microsoft was pretty lenient about the number of downloads you could do, but given that I have approximately
116k extensions and wanted to do the downloading and the scanning at the same time, most of my IPs got blocklisted. So I came up with this approach, wherein I kept the IPs in a queue; that's where my AWS FIFO queue came in. The IPs are in the queue, as you see at the top, and each has an associated TTL, shown at the bottom. Whenever an IP gets blocklisted, I quickly put it at
the back with that TTL attached, so that by the time it comes back up the queue, that IP is no longer rate limited. For the IPs, I was using a paid VPN service, and that was pretty efficient at getting around Microsoft's rate limiting.

One of the takeaways is that, as a student, I was able to do all this for under $15, so attackers could do it at a much larger scale. Secrets are literally everywhere. If you look at the
places that developers commonly use, you will find many more interesting things. Another takeaway for companies is: enable logging. Logging gives you visibility into how your secrets are used, and you can also set up automated alerts so that whenever a secret is misused, say from a new IP location, you get an alert. And some of these secrets, as I said in the case studies, can be used for much larger attack
campaigns. For example, as you might have seen on the previous slides, there were SendGrid tokens, which an attacker could use to send spam emails or impersonate a company, and that could become a larger attack.

The future work I plan to do is expanding my secret scanning to JetBrains and Open VSX. JetBrains is another popular IDE family, and it has approximately 2,000 extensions, the last time I checked. Open VSX is a newer marketplace, which most of
the AI code editors use. For example, Windsurf, Codeium, and Cursor use Open VSX, given that Microsoft's VS Code marketplace has certain limitations. I would also extend this project to have anonymized metrics that show, for example, that a given developer is pushing too many secrets, and also work with companies to resolve that at scale. One motivation for me to expand this to other marketplaces is that it's pretty cheap. And
one call to action: if you're a maintainer of an extension, add secret scanning to your pre-commit hooks or to GitHub, because it's free, and you can also rotate the secrets at that point. Another pattern I see is that when people commit secrets to GitHub, they usually force-push to the branch. Another researcher has looked into this and was able to find secrets in the dangling commits. So I would say just have a pre-commit hook; that would save you all these hurdles.

As for acknowledgements, I worked with no no hack labs, and
then we were able to scan these extensions and verify the findings. Shout-out to my mentor Ming for helping with the presentation, and thanks to the TruffleHog team for open-sourcing the tool. With it I was able to do the analysis part, where I call specific APIs to look up who owns, say, an OpenAI key, and get more details such as the associated emails. Thanks also to Azure and DigitalOcean for giving free credits to students. Thank you. If you have any questions, [applause]
feel free to raise your hands if you guys have any questions.
Well, if not, please show your appreciation for your speaker, and we're done for today. Thank you for coming, guys.
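
The download-URL construction described in the talk (publisher name, extension name, and version taken from a sitemap entry) can be sketched in Python, which the speaker used for the initial scraping. This is a minimal sketch under assumptions, not the speaker's code: the `itemName=publisher.extension` query format and the `vspackage` URL template are assumed from how the public marketplace commonly exposes extensions, the function names are hypothetical, and the example publisher and extension are made up.

```python
from urllib.parse import urlparse, parse_qs

# Assumed download template; the talk does not spell out the exact endpoint.
VSIX_TEMPLATE = (
    "https://marketplace.visualstudio.com/_apis/public/gallery/"
    "publishers/{publisher}/vsextensions/{name}/{version}/vspackage"
)


def split_item_url(item_url):
    """Return (publisher, extension_name) from a marketplace item URL.

    Assumes the sitemap entry looks like
    https://marketplace.visualstudio.com/items?itemName=publisher.extension
    """
    query = parse_qs(urlparse(item_url).query)
    values = query.get("itemName")
    if not values:
        raise ValueError(f"no itemName in URL: {item_url!r}")
    # Split on the first dot: the left part is the publisher, the rest is the name.
    publisher, _, name = values[0].partition(".")
    if not publisher or not name:
        raise ValueError(f"unexpected itemName: {values[0]!r}")
    return publisher, name


def vsix_url(item_url, version):
    """Build the VSIX download URL for one extension at a given version."""
    publisher, name = split_item_url(item_url)
    return VSIX_TEMPLATE.format(publisher=publisher, name=name, version=version)
```

For instance, calling `vsix_url("https://marketplace.visualstudio.com/items?itemName=examplepub.example-theme", "1.2.3")` yields a URL ending in `/publishers/examplepub/vsextensions/example-theme/1.2.3/vspackage`. The version string would come from a separate marketplace API call, as the talk mentions.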
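
The third phase of the pipeline, normalizing findings into per-detector counts of unverified versus live credentials, might look roughly like the sketch below. It assumes the scanner emits one JSON object per line with `DetectorName` and `Verified` fields, which is how TruffleHog's JSON output is commonly structured; the function name is hypothetical and any sample records fed to it are illustrative, not results from the talk.

```python
import json
from collections import Counter


def tally_findings(json_lines):
    """Count findings per detector: (all findings, verified-only findings)."""
    all_counts = Counter()
    verified_counts = Counter()
    for line in json_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip log lines or truncated output
        if not isinstance(record, dict):
            continue
        detector = record.get("DetectorName")
        if not detector:
            continue
        all_counts[detector] += 1
        # Only count credentials the scanner confirmed as live.
        if record.get("Verified") is True:
            verified_counts[detector] += 1
    return all_counts, verified_counts
```

Keeping the two counters separate mirrors the distinction the talk draws between the noisy unverified findings and the live credentials.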
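
The IP rotation scheme from the rate-limiting part of the talk (a FIFO queue of egress IPs, where a blocklisted IP goes to the back with a TTL attached) can be illustrated with an in-memory queue. This is only a sketch: a `deque` stands in for the AWS SQS FIFO queue the speaker used, and the class name, method names, and cooldown value are assumptions.

```python
import time
from collections import deque


class EgressIPQueue:
    """FIFO rotation of egress IPs with a per-IP cooldown (TTL)."""

    def __init__(self, ips, cooldown_seconds, clock=time.monotonic):
        # Each entry is (ip, not_before): usable once clock() >= not_before.
        self._queue = deque((ip, float("-inf")) for ip in ips)
        self._cooldown = cooldown_seconds
        self._clock = clock

    def acquire(self):
        """Pop the first IP that is not cooling down; None if all are."""
        now = self._clock()
        for _ in range(len(self._queue)):
            ip, not_before = self._queue.popleft()
            if not_before <= now:
                return ip
            # Still cooling down: rotate it to the back and keep looking.
            self._queue.append((ip, not_before))
        return None

    def release(self, ip, blocked=False):
        """Put an IP at the back of the queue, with a cooldown if it was blocked."""
        if blocked:
            not_before = self._clock() + self._cooldown
        else:
            not_before = float("-inf")
        self._queue.append((ip, not_before))
```

A worker would call `acquire()` before a batch of downloads and `release(ip, blocked=True)` when the marketplace starts rejecting requests from that IP. The clock is injectable so the cooldown logic can be exercised without waiting in real time.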