← All talks

State of API Security 2024: Insights from Analyzing 1 Million Domains

BSides Seattle28:23169 viewsPublished 2024-10Watch on YouTube ↗
Speakers
Tags
About this talk
Join Tristan Kalos & Antoine Carossio from Escape, for insights on critical risks from exposed API tokens. Their groundbreaking research, analyzing 1 million domains, uncovered 18,000+ API tokens and RSA keys accessible without authentication. 41% were highly critical. They will share his unique web scanning methodology, dive into sensitive API data revealing potential severe financial losses (up to 17 million $), and draw parallels to standard API security threats. Going beyond the findings, they'll present actionable remediation strategies and provide a practical API security checklist. Leave equipped with a clear path to secure your APIs.
Show transcript [en]

hello everyone um it's great to be here today uh our next talk is about the IPI secret scroll so basically it's the result of a research that we conducted in a research teaml Escape about how can we get for API secret tokens if we scan 1 million websites on the internet um so in this talk uh we will cover uh why first we started this research of looking for IPI secrets in the wild how we detected hardcoded secrets in front end code so there is an analysis of the technical uh process that we created for detecting hardcoded Secrets what critical findings we found and including millions of dollars in tokens and we come back to that later and finally we

will cover the recommendations that we can use and leverage to reduce the risk of leaking secret tokens in front end code before we start I'm trist I'm the co-founder and CEO of Escape uh and I'm gradate for from UC Berkeley an next researcher in AA applied to cyber security um I got hacked myself so I started a c security company and today I'm here with my co-founder Antoine with the CTO of Escape um he's a passionate open source contributor a creator of graphical armor um and he's a huge Apple fan he worked in at Apple in cyber security uh if you didn't get it from my accent already uh we flew from Paris to here so we are both French and we work

half in the US half in France currently so just to give you a bit of a word about Escape um Escape is the API security platform so basically we have an agentless approach to discover and help you secure all your apis efficiently we connect to the tools that you already use so by agentless scanning clot providers API gateways developer tool and we use this to perform automated API Discovery protect your apis at run time and run automated Security in the cicd through API Dev setups and then we export all the results in the tools that you already use okay so we have a research team at Escape that focuses there is no screen anymore okay we have a research email

Escape that focuses on doing um researching API security of All Sorts so we created last year a report the state of graphical security where we analyzed more than 1,500 public graphical apis and found more than 40,000 security results and then we published a second research report the state of public apis which analyzed more than 6,000 public apis for security vulnerability and we uh explain what we found and what that that says about the state of API security and the topic of today is another report that we recently published which is the state of IPI security the API secret SC so you can download this report using this QR code H it contains all the results that

we're going to present in this talk but more in details and with graphs and explanations and Analysis of uh the findings okay so first why uh is an API secret leak a problem why should we not leak Secrets obviously um so every company now has resources uh on the web and the devop team struggle to deploy assets effectively and securely so you have like automated deployments of websites so how can we do that securely so there are many examples um of brides that come from exposed API and exposed secrets on the internet that can be easily exploited without advanced knowledge to fetch sensitive data or to gain access to systems if we analyze what we have to

secure our applications and API we have like many different techniques and tools so we can start with threet modeling in the design phase we can do software component analysis in the development phase static analysis and code analysis Dynamic application security testing and then runtime protection so we have many different different tools and many different processes to create more secure applications but what about secrets so for secrets we have a few tools like git Guardian git leaks or sraps that can uh help find secrets in git repositories we have uh API gateways and dynamic testing tools like Escape that can find secrets in apis when they're deployed but what about front end that could be embedded in public L accessible

front end code like JavaScript what happens if my developer embed front end secrets in the front end code so this is the question that we ask ourselves and this is why we decided to do this research and I will let my co-founder C Anto explain to you how technically we conducted This research thank you very much foran so here uh the C and Stage so this part is going to be a bit Technical and we felt like this was very important that everyone gets the full pictures U to that explains why it's really hard to find tokens uh currently today in front ends so uh before U doing too complicated to complicated things I wanted to know who in the audience is

familiar with frontend development by raising hands okay so it's roughly 15 20% of the audience so let's uh get one step back so that you can understand a little bit how front end development works and this will explain you why Finding secrets in front end is still a challenge today so basically um front end developments and front ends consist in static files so basically HTML CSS and JavaScript so normally you are all aware of those static files those files they are downloaded by your browser when you're browsing the web all right and uh they are used by your browser to render web pages the problem is that in modern application development um those uh static files are

not pretty convenient to do modern applications uh so um and and very very Dynamic application that are compatible with your mobile and so on that's why uh frontend developers they chose to use and they created Frameworks for frontend developments so here you have the three main frame which which are react VJs and angularjs and those Frameworks basically consist in creating very complex front end applications for modern web uh by building small modules with complex dependencies okay the thing is that your browser is still not able to read this front end uh this this framework code and that's why you use what is called bunders when webpack is one of the most famous bundlers used today on the market

and those Benders they are used to transform to convert this framework this framework code sorry my m is not working anymore thank you Jon so um the those bands they are used to transform your framework code into static files that are understandable by your brother okay so this um is basically the output of um the bundlers especially for the javasript files that we saw earlier and this is really important we choose to focus on JavaScript files because in JavaScript files you have the business logic of the app which is coded the algorithmical part and then um this is in JavaScript files that were the most likely that it is the most likely to find actually secret tokens looking from

apis okay so how do we do that um the first thing that is important to understand is that there are two kinds of tokens uh we can split tokens in two categories AP tokens in two categories the documented tokens so those are tokens that are used in many web applications on the web and they are public publicly documented for example if you use open AI API strap API and so on you use documented tokens in that case finding them on front ends it's actually pretty easy you just use pattern matching and the goldx does the job just a few words on those documented tokens there are a lot of tokens that are actually just public Keys such as

Google Maps keys for example we didn't care uh we filtered out those keys into this research because they are not relevant for the research and they are they are not leading to sensitive data and so on for example Google Map public key is just public but it's still a token uh same for Amazon F3 more that those token they are not public but they're very scoped they have very low privilege so we also filter that kind of um the report so every result we're going to present you afterwards does not include those results that's that was an important uh note all right so but what about um the undocumented Tokens The Tokens that are built by your developers

in your companies uh to create applications and to connect to their internal apis and that are not um actually public uh to the internet that are giving access to your application that by the application that develop internally the classic way of doing that uh is to use entropy so for example before Tren talked about uh static analysis tool the statical analysis tool they are able to find uh strings in well structured code um that are complex with a lot of random characters and so on and do strings hence they are very likely to be uh API tokens the thing is that as we just saw earlier uh in the JavaScript bundles there is already a lot of entropy by

Design so this is what we just saw earlier so using the entropy strategy uh only cannot work it leads to tons of false positive because every single of the character here could be considered actually as a token so this is a bad strategy especially uh we are not authorized to test the tokens and we take a lot of times to test all the potentially extracted strings with entropy to see if there are real tokens and so on uh but obviously in the study we only want to focus on the real tokens on the real positives the true positives and this is where comes the algorithmics uh to the force we managed to leverage as abstract syntax tree to

extract the tokens with a high confidence signal so a few words about ASC I won't Deep dive into the detail here because it's pretty complex but the thing is that what you need to understand here is that we restructure the code okay to understand the context where the variables are used declared and for example something very basic is that we are able to identify the context where there are IPI calls in the JavaScript bundle files and in those specific context if they are high entropy variables they are very likely to be API tokens and actually this strategy U was a huge success so the cool thing uh is that this strategy is also very um cost

effective uh in terms of computing power it requires almost nothing so I want to dive into this architecture but was what really Mires that we uted 1 million domain names into this algorithm into this architecture we filtered out uh the sensitive domain names such as the government domain name and so I saw the FBI was giving a talk in the other conference we don't want to be in trouble with them uh we manag so this leads to uh stripping almost uh 15 50 ,000 domains that's why there is not 1 million domains so I forgot to mention we we took the the top 1 million domains we managed to to scan four domains per seconds and all the subdomains

associated so this pretty fast and cost effective for a total time of computation of 70 hours approximately and in total we managed to parse front ends of almost 200 Millions uh front ends and and and JavaScript pages so this is quite huge for only 70 hours on computation and what is really fun that all of this only cost us 100 bucks and Asin is going to present you uh now this investment but the most profitable investment we've ever made uh thank you so so let's uh so we I'm try to explain how uh we managed to scan all those websites and how we managed to strip false positives secrets to focus on what really matters um now let's take

a look at the results and especially how we traded $100 of computation power for $20 million um in total we found almost 9,000 uh Secrets overall in all those websites so that's a lot of leaking secrets on the internet um if we take a look there are a lot of private can you go please yeah um if we take a look there are a lot of private API Keys uh GitHub access tokens and gitlab access tokens so the developers are leaking their access tokens on the internet in their frontend code um you might notice there is a bit of open IPI Keys here that are leaking also in secret in frontend code and on the top you can see

there are like twitch tokens Discord but also stripe tokens that accounted for almost 1% of the found Secrets which uh is frightening to be honest okay um if we take a look at like the domains that uh were leaking the tokens we can see that obviously almost half of the results came from do com uh domains and uh in interestingly the the the the country that is leaking the most tokens is Brazil uh so yeah they got the the the first uh first place um and also in the U um it's the Netherlands that are leing the most tokens in in their website so interestingly we can make a comparison of the the maturity of

security processes in different countries um when it comes to the industry uh really all Industries are represented uh of course media are very represented gaming Aerospace online dating Tech uh so many many Industries are represented uh in the leaking tokens and what's interesting is that the biggest number of Secrets exposed uh in the domain is 28 so there is a website that is almost leing 30 Secrets but also the average Expos Secrets per vulnerable domain is 1.7 which means if you leak one secret you are way more likely to leak others because it probably means that you don't have like the best practices implemented to avoid this from happening so there is a correlation when

you fail once generally you have other tokens leaking okay um if we take a look at how they are leaking uh we can see that frontend development back practices are still prevalent so this is a nice example of a developer calling the open AI API from the front end with including their private open I key in their front end code uh for everyone to see so obviously you should do that on the server and then have your API communicating back the results on the front end code and this is an example of something that can leak your token very easily but this is not the most complicated and the like this is something that is already known so if

you implement the best practices you should probably not have that however there is something that we noticed that is very common even in organization that have implemented secret scanning in the repositories because today as Antoine explained there is like a web Packer like a a bundler like web pack that use your your developers code and bundles it in HTML JavaScript and CSS this happens in the build time generally in the cicd and this bundler has access has access to the environment variables of your cicd and what happens and we notice that sometimes this bundler is not well configured there is a misconfiguration and it actually packages the files directly inside of the front end code so we have found many

front end bundles with just the do n file that is like just bundled with the JavaScript and sometimes the m files contains only like private information about the cicd private cicd tokens or Services tokens but sometimes it also if you're in a monor repo context contains the environment variables of the backend code so this is really a huge problem and we noticed it a lot of times in this study so overall over all the secrets that we found uh about a third could lead to an entire business shutdown so very critical secrets that could be exploited to create huge consequences on the companies that were affected we we found for instance uh hardcoded RSA private

Keys which you don't want to do that and in especially millions of dollars in exposed stripe tokens so for instance we found one single token that was hardcoded in a font and JavaScript code and that could lead to $17 million of money in the stripe account and it was a private stripe token that was directly Exposed on the front end code so now how do we avoid that happening to us so what mitigation strategies can we put in place to avoid this from happening to our companies and I will give back my miek to anine I think it's working perfect thank you very much Christan so now just a very basic recommendations about how to avoid uh those kind of faks

and to avoid to lose millions of dollars in your company for just stupid tokens left in the front ends okay so as Tristan said the first thing of course it's very basic but the thing that uh developers are pretty lazy okay especially frontend developers they are making API calls directly without talking to the backend developers and especially here we take the example of open AI so this is a huge Trend at the moment the front end developers they want to integrate chatbots and so directly in their front ends um and they are just leing the organization of token uh so where at the good practice obviously is to um uh use the back end code because the

back end code contrary to the front end code cannot be accessed by the the customers by the browser only the browser the browser only downloads the front and code so this is obviously the the right way to perform the action it it requires a little bit more synchronization and Laten see in the development but like this is the the very basic of security okay the other thing is to leverage actually type prefixes so we talked earlier about uh documented uh Secrets what is really cool is that internally instead of generating random strings you also uh type your um in internal tokens your uh private tokens your homemade tokens that you type them as if they were public okay so you

leverage type prefixes you have uh libraries to do that in every languages so this type IDE is for a go I guess um and this allows you to um to make um the static analyzers work properly inside your repositories what mean is that if you consider um that your tokens should be properly documented and typed uh even the tokens that are developed internally you can just implement the rig xes inside your static analyzers for example Sam graph and hence they won't be able to Le on the internet because they will be easy to identify in the code sup we saw earlier even in the bundle javasript so for example strip is doing that really well and all of their tokens uh

they are prefix it uh according to the usage of the token okay obviously instead of storing the tokens uh inside the environment files so environment files are fine for plenty of uh usage but not for storing Secrets all right so here you can leverage uh Vols as a service such as AWS secret manager Ash ashik Vol or in gitlab and GB you can also store your tokens uh and obviously uh those tokens since they are not in environment files they won't be bundled uh by web pack into your front end uh applications again uh something that is really simple uh but you should use scoped tokens which means tokens that are only giving access to what uh your

user needs okay not to Broad access tokens and you should also use um tokens that are expiring because uh leaking token that is expiring is still better that a token that is has an infinite expiration time for this you have tools such as Vol key rotation Vol Dynamic secrets that can help you uh implementing this process at scale in your company okay so thank you very much for listening us um if you you want to see the full report and download the full report as PDF feel free also we really invite you to ask us question during the conference or even after by adding us on LinkedIn uh we will be happy to answer you especially if you want the slides uh

we can show you the the slides with on need thank you very [Applause] much yes so you talked about uh that there was some critical findings like which could lead to a business shutdown you showed us an RSA key uh is it possible to use any of those two you know as an initial access point or I mean initial access point to the organization yes so uh this is exactly what we we meant so basically the the the most critical findings were three things uh mainly um so RSA keys that could give access to like the company servers um code repositories tokens like GitHub gitlab or cic tokens where you could actually like get into the code

repository change the code change the deployment you know like attack the supply chain and the last one was like everything that is related to stripe or two money uh basically where efficiently you could steal like $17 million from someone uh using their their leaking tokens thank yeah so you had a slide with your infrastructure for the scanning is there a deep dive you guys have somewh on more that infrastructure it's in in the report it's a bit more detail but basically just very simple kuber scaling uh we just have bus that are running in parallel there is nothing crazy in that architecture but for sure you can see more report more details in the report or we can discuss that

directly on Eng feel free to to contact me um one last thing uh that is worth mentioning um we have implemented this detection strategy in in Escape itself and Escape has a free tier so you can just go on the escape. tech website log in and plug in your domain and we'll tell you if this applies to you is there a way to monitor or do this same exact thing on onion sites uh in internal websites oh no do onion website like oh um I I I don't know like on the tour websites we we could try um yeah we would try we didn't do that we just used uh public domains but it would work the same with

store it's just that you would have to install store thore sorry in the in the middle of the the traffic but there is no difference just it R through the the domains and you do it but we didn't do that in in the study maybe for the next stud maybe maybe no we have um a lot of research incoming with more cool reports that will be published soon okay any other questions than

for I'm sorry IOU please sorry sotion you oh yeah yeah because they are they are documented they reges so since they are known the the pattern is known just F mod that's all yeah is thearch only and not on appliation like appliation AP this is another another study but yeah here it's only for the front ends but you can find uh maybe we we we have the state of apis which includes part with the secrets yeah um is another reporting coming in like one month and a half about that H we are currently working on it thank you okay I thought you had the question thank you very much yeah thank you very much