Decoding GraphQL: How to Map Hidden Vulnerabilities

Name: Decoding GraphQL: How to Map Hidden Vulnerabilities
Uploaded: 2025-06-04
Duration: 31 min 37 s
Description: Decoding GraphQL: How to Map Hidden Vulnerabilities Antoine Carossio, Tristan Kalos GraphQL APIs offer flexibility and efficiency but often introduce security risks that remain hidden in the shadows. In this session, we’ll share findings from scanning GraphQL APIs, revealing vulnerabilities like s

BSidesSF · 2025 · 31:37 · 490 views · Published 2025-06

Speakers

Antoine Carossio, Tristan Kalos

Tags

Category: Technical

Topic: Web AppSec

Style: Talk

About this talk

Decoding GraphQL: How to Map Hidden Vulnerabilities. Antoine Carossio, Tristan Kalos. GraphQL APIs offer flexibility and efficiency but often introduce security risks that remain hidden in the shadows. In this session, we’ll share findings from scanning GraphQL APIs, revealing vulnerabilities like schema leaks, brute-force risks, and GraphQL-specific "bomb" attacks. https://bsidessf2025.sched.com/event/49d7000615ff8ff4f596ec2827277532

Transcript [en]

Good afternoon. Today we have Antoine and Tristan talking about decoding... what did I miss? Yes, Decoding GraphQL, thank you: How to Map Hidden Vulnerabilities. As always, all Q&A will be over Slido, at bsidessf.org/q&a, so please have your questions and answers over there. We are not going to have a verbal Q&A live. So with that, Tristan. Thank you. Hello everyone, glad to be here today at BSides 2025 to present the results of our research. The topic of today's talk will be decoding GraphQL: how to map hidden vulnerabilities at internet scale. I'm very glad to be here today and to have this opportunity to present our research to the BSides attendees. On

today's agenda, we'll present the motivation: why did we conduct research on the specific topic of GraphQL security, and why is it a blind spot today in the AppSec programs of many companies? Then we will cover the methodology: how to scan GraphQL applications in the wild internet. How do we discover them? How do we scan them? Then we'll analyze the results of our study, specifically the critical vulnerabilities that we found in GraphQL applications, and how to solve them: how to reinforce your GraphQL applications to make them safer and more secure. There will be a little bonus if we have the time, if we have the time

only, about some open source GraphQL security tools for both blue and red teams. So everyone will have a little bit of tooling at the end. Okay. I'm Tristan. I'm the co-founder and CEO of Escape. I was formerly a software engineer and a researcher in AI applied to source code analysis. I witnessed my company getting hacked firsthand through an API. They stole the whole database. So I decided to create a cybersecurity company to help other people avoid that. And today I'm joined by Antoine, my co-founder and CTO at Escape. Hi everyone. Antoine is also a former security engineer and pentester, and also a huge Apple fan and an open source

contributor. All right, just to quickly present Escape before I move on to the research: Escape is rebuilding the AppSec stack from scratch for the AI age. We provide application attack surface and DAST at the business logic level, built entirely in house for modern stacks and modern developers. We discover shadow endpoints in particular and validate authorization and authentication, BOLAs and IDORs, via a custom AI model, and we help developers with framework-specific code fixes for every finding. Okay. We have a long history of producing research. We published the state of public APIs in 2023, the state of API security, the API secret, in 2024, and the results of today's presentation are compiled in a research

which is called the State of GraphQL Security. This research is available online for free, so you can check it out. The full report contains a bit more data than the talk, but the talk focuses more on the technical part: how did we get there? Okay. GraphQL is everywhere and it's very vulnerable. Here are a few examples of wake-up stories, with a lot of vulnerabilities found in GraphQL endpoints, including vulnerabilities affecting payment systems, affecting sensitive user data, and so on and so forth. So there are a lot of vulnerabilities that have been popping up in production GraphQL APIs with high consequences. But I will let Antoine cover this part

more in depth. Okay, thank you Tristan. Indeed, here we are going to dive a little deeper into the technical aspects, and first focus on GraphQL a little bit more and explain why GraphQL is different in terms of security. So first, who in the audience here is working in a company that is using GraphQL? Okay. So it's more and more each year, and this is corroborated by the recent data we have here. Gartner, for example, estimates that almost 40% of enterprise companies will use GraphQL within two years, which is way more than it was a few years ago in 2021: from 10% to 40%. We

have more and more cloud providers such as AWS building dedicated GraphQL services, such as AppSync for example, and more and more companies in every single vertical of the Fortune 500 progressively moving to GraphQL or exposing their main API as a GraphQL service, such as Spotify is doing this year in 2025, progressively dropping its REST API. And the thing is that GraphQL is completely different from REST, so it actually requires a different security strategy. The first thing we have to understand here is that GraphQL allows a federation of your APIs. So behind your main single

entry point, your main GraphQL entry point, you can have different kinds of APIs behind it, whether they are sub-GraphQL apps, other REST APIs, or even other kinds of APIs. Those APIs can either be exposed on the internet, or they can be internal APIs that are not exposed to the internet but are reachable through the main GraphQL endpoint. So now we are going to focus a little bit more on GraphQL as a language. GraphQL is a full-featured language, okay, with a lot of features, with depth and width in its queries, and with every single one of those features, such as directives, parameters, aliases and everything, come potential

vulnerabilities and exploitations. This is why GraphQL is so special to secure today: with great power comes great responsibility. The second thing is that GraphQL obviously organizes your data as a graph. Okay. So when you have data located somewhere in your graph, you have to make sure that every single path leading to this data is actually secured the same way, which makes access control a real nightmare in GraphQL. Yeah, sorry. Yeah, it's very low, but it's okay. So here we have an example of how to exploit a typical GraphQL vulnerability. We can see that we manage, in one single HTTP request, to send multiple GraphQL queries. So by

doing this we actually bypass the rate limit that is implemented at the server side, by just sending one HTTP request but multiple GraphQL queries. And then here we brute-force the login route. So this is what it gives in the GraphQL playground. We can see we bypassed the rate limits in one single request and brute-forced the password, and we see that the last try here actually allowed us to find the real token for the user that is authenticating here. Okay. So let's recap here. We have a new type of API with a large attack

surface. This single point contains all of your data and makes all your data accessible to the public internet, and it has a very complicated access control model. So what could possibly go wrong? This is what we are going to show here, and we are going to explain how we managed to get this data and conducted this study. All right. So what did we do in this study? We took the top 1 million domains list, containing 1 million unique top-level domains. All right. Some of them are domains from the Fortune 500 or other very large companies worldwide. We used discovery and

fingerprinting techniques to find GraphQL APIs exposed on those company domains. This is how we found almost 200k GraphQL APIs. And one third, this is very interesting, one third of those APIs actually had introspection open. We'll come back to that a little bit later. Then we conducted security testing on those APIs and found vulnerabilities, and this is what Tristan is going to show you at the end of the talk. In this first part, let's focus on discovery and fingerprinting of the APIs. I'm going to explain how, technically, we managed to find so many GraphQL APIs at scale, in the wild, exposed on the internet. Okay. The goal of

discovering APIs is to find assets on the attack surface. This is very common in any offensive security strategy. But we also needed to fingerprint the GraphQL APIs a little bit more, first just to make sure they are indeed GraphQL APIs and not false positives or honeypots. It also allowed us to collect some data about the different GraphQL frameworks that are used, the language, whether introspection is open, and so on. To that end, we built a tool that is open source today and that you can find on GitHub, called Goctopus. It is an all-in-one GraphQL discovery and fingerprinting tool

written in Golang. How did we use it? Here there is an example on one single domain. I took my favorite company here as an example, or almost my favorite company. The first thing we do, from the domain, is perform subdomain enumeration to cover as much as possible of the attack surface behind this main domain. Then we were able to find a lot of potential URLs that might be APIs, front ends, static assets or many other things. But here what we really try to do is fingerprint, by sending specific queries and analyzing the responses. We are really trying to find GraphQL APIs, and we really want to

tell GraphQL apart from any other kind of data or services. So as output we not only have GraphQL APIs but also GraphQL introspections, which is the schema, like the OpenAPI schema but for GraphQL, that defines the data model behind the API, which is extremely relevant for the security testing that we perform just after. Okay. So we ran this at scale. We built a specific infrastructure for that. I won't really go into the details here, but as mentioned before, we took the top 1 million domains on the internet. We excluded the sensitive governmental domain names to avoid having any issues, and this allowed us to find almost 200k

GraphQL APIs exposed on the internet, with more than 60k of them that actually have their schema completely open to the internet. All right. So now in this second part I'm going to explain how we managed to test those APIs at scale, find vulnerabilities at scale, and why it was particularly challenging in the case of GraphQL. Okay. So, to go back to my first schema, this is the part we are going to focus on. Here we built a completely new algorithm, a reinforcement learning algorithm, that takes as input not only the URL of the service but also the introspection, as I

mentioned before, to run automated application security testing on it and to find not only business logic API vulnerabilities but also sensitive data and information leaking through the API. So why was it challenging? First, again, GraphQL is a graph, you already saw that. So we must make sure that we actually explore all paths in the graph, especially if we want to properly test the access control of the API and make sure that all paths leading to the same resource are secured the same way. The second problem is that classic testing cannot really assess the business logic of APIs. Indeed, if you just randomly fuzz or send random data, even generated by LLMs, to the APIs,

it has a great chance of not passing the data validation layer. The request will never actually end up testing the business logic of the API and will just get stuck before that, at the data validation layer. This is really why we built this reinforcement learning algorithm, which consists of making smart requests in a very specific order, and depending on this request order we were able to reinject data into the API. This is exactly the example that I'm going to show you here. So let's take the example of this GraphQL mutation that allows users to book a room in a hotel. You can see that this mutation takes as input

three parameters: a hotel ID, which in this case is an integer, why not; a room type, which is just a string; and the email of the guest. A classic DAST that just sends requests based on the typing could send that kind of request: an integer, 123; a room type that is a completely random string, because it's just a string, so it doesn't mean anything to the classic DAST; and the guest email, which is again a string, so it will just send a string and see what's going on. So here the problems are obvious: the values only respect the data types, integer and string, and nothing

more, so there is no connection to the real API or business logic. We can see this is particularly problematic for the guest email, where it makes no sense at all. What happens if we add an LLM to this strategy? Okay, so this looks a bit better. We have a hotel ID which is still a bit random, but why not? It's still an integer that could make sense. We have a room type here that could be generated by an LLM too: it's a deluxe room. Okay, why not? And a guest email that is a sample email that really looks like an email. So this looks more interesting,

with more credible data. But the problem is that here the data and the request are based on general knowledge. It does not verify with the actual API whether this data actually exists and makes sense. This is why we really created this feedback-driven semantic API exploration algorithm, which not only generates values that make sense but also validates beforehand, in previous requests, that they actually exist in the API. So this is completely different in that case. The hotel ID has been retrieved from an earlier request, which might be, for example, a list-hotels API call that takes no parameter. The room type, for example deluxe one, is

a valid room that is available in this hotel today, as confirmed through previous API requests. And then the guest email here, which is actually mine, might even be a real user that has previously registered with the hotel, or just a user that is used to run an authenticated scan in this case. Okay, so this is a quick summary of the three main steps of the algorithm, what makes it very different from what existed before, and what makes this research so valuable. So if you want to know more about the algorithm itself, it's an algorithm that is based on graph theory.
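
The idea of reusing values from earlier responses can be sketched roughly as follows. This is only an illustration, not Escape's algorithm: the `ValuePool` class, the `build_book_room_variables` helper and the field names `hotelId`, `roomType` and `guestEmail` are all hypothetical, and a plain lookup stands in for the reinforcement learning described in the talk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class ValuePool:
    """Values observed in earlier API responses, grouped by field name."""

    seen: Dict[str, List[Any]] = field(default_factory=dict)

    def record(self, payload: Any) -> None:
        # Walk a JSON-like response and remember every scalar under its key.
        if isinstance(payload, dict):
            for key, value in payload.items():
                if isinstance(value, (dict, list)):
                    self.record(value)
                elif value is not None:
                    self.seen.setdefault(key, []).append(value)
        elif isinstance(payload, list):
            for item in payload:
                self.record(item)

    def pick(self, name: str, fallback: Any) -> Any:
        # Prefer a value the API already returned; otherwise use a
        # type-only placeholder, as a purely type-based scanner would.
        values: Optional[List[Any]] = self.seen.get(name)
        return values[0] if values else fallback


def build_book_room_variables(pool: ValuePool) -> Dict[str, Any]:
    # Hypothetical variables for the room-booking mutation from the talk.
    return {
        "hotelId": pool.pick("hotelId", 123),
        "roomType": pool.pick("roomType", "string"),
        "guestEmail": pool.pick("email", "string"),
    }
```

Feeding the pool with the response of a parameterless listing query and of an authenticated profile query before building the mutation is what lets the request get past the validation layer and reach the business logic.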

I'll really let you read this article, which is way more complete than what I said today, and I will be happy to answer all of your questions after the talk, or if you want to add me on LinkedIn I will be happy to answer the questions. Okay. So thank you very much. Now I will let Tristan introduce the findings of the study and the main vulnerabilities we managed to find. Thank you very much. Thank you, Antoine. So far we covered three things, right: why is GraphQL vulnerable, why is it important, and why is it a blind spot in many

AppSec programs; then we covered how we discovered almost 200,000 GraphQL APIs exposed on the internet; and then we explained how we created a smart algorithm to find advanced vulnerabilities in GraphQL APIs in a fully automated, scalable way. So now it's time to analyze those results and see what we found in those 200,000 GraphQL APIs exposed on the internet. We found almost 90 issues per GraphQL service. Interestingly, this is three times the amount that we found in 2023. So it's both that we improved our algorithm and that the GraphQL ecosystem is probably becoming more vulnerable: there are more vulnerabilities appearing in GraphQL engines because of security research on the topic of

GraphQL. Let's take a look at what we found. If we take a look at the OWASP API Top 10 for GraphQL applications, we can see that the first and major issue, present in almost all GraphQL endpoints, is unrestricted resource consumption. The reason is simple. GraphQL is a very wide language with a lot of possibilities to combine and create complex queries, and very often the GraphQL APIs are not properly protected against misuse or overly complex queries, so many GraphQL endpoints are vulnerable to denial of service attacks, for instance. The second part is security misconfigurations. There are a lot of GraphQL APIs that have basic, obvious security misconfigurations, which shows that

the best practices are not well known and well mastered overall by both the developers and sometimes the security teams. And finally, the third one is broken function level authorization. As we explained at the beginning, access control in GraphQL is complicated because of the graph nature of the API, and this is proven by the data, right: we see that BFLAs and BOLAs are typical vulnerabilities existing in GraphQL APIs in our sample. Another interesting one is server-side request forgery; we'll cover why there are so many SSRFs appearing specifically on GraphQL endpoints. Okay, the first thing that is interesting to note is that it's GraphQL, but there are still common API

vulnerabilities that are present: stack traces and error messages disclosing sensitive information. There are a lot of those. We found that 12% of GraphQL APIs exposed on the internet were disclosing internal information in errors and stack traces. This can be a problem especially because it can become very verbose and give attackers information about the software components that are used by the GraphQL API or by the server, which can then lead to easy exploitation of known vulnerabilities in those components. For instance, as an example here, we can see in a stack trace that it's using GraphQL gateway 0.6.1, and we can see that in this gateway there is a

vulnerability that is known. Okay. So this is particularly complicated to fight because GraphQL, as we said, is used as a gateway, as an aggregator for many different APIs, and so sometimes even deactivating stack traces at the federated level doesn't prevent stack traces from underlying APIs from ending up back at the client. That's very complicated, because very often the teams building the underlying APIs think that those are internal, that they're not exposed directly on the internet, so they don't have the same level of security as the ones that are exposed. This combination creates many situations where sensitive data is leaking through GraphQL APIs, and we've seen that a lot. Okay. Second problem: access control

issues. GraphQL, contrary to pure HTTP APIs, doesn't have GET, POST, PUT, PATCH or DELETE requests. It has two kinds of requests: queries for fetching data and mutations for editing data. The problem is that the whole HTTP stack has been built around GET, POST and PATCH requests, and so very often we could see mutations that weren't protected by authorization or authentication at all. In this example, for instance, we found a mutation called update payment that was accessible to any user on the internet: no authentication, no token, no requirements, which can be a huge security concern. In almost 10% of endpoints we found this type

of issue, with mutations that were accessible to unauthorized or unauthenticated users. And finally, the good old injections are still there. We found 21 non-protected JWT tokens, which means that anyone on the internet can craft a JWT token, forge one that has the same level of authorization. And we found two XML external entity injections, which probably happen when you connect GraphQL to legacy systems. All right. So far we covered the classic API vulnerabilities, but that's not what we found the most. There are a lot of GraphQL-specific risks that are emerging and appearing very frequently in our data set. So what are

they? We find a lot of schema leaks, denial of service opportunities, brute forcing, and a specific vulnerability that the Escape security research team developed internally, which is the GraphQL bomb. We'll cover what it means later. Okay. So the first vulnerability that is GraphQL-specific and that we find very often is the internal API schema leak. GraphQL APIs by default will disclose all their internal schemas to anyone on the internet willing to analyze what can be done with the API. For security reasons, many teams deactivate this ability to uncover the schema on production APIs, because you don't necessarily want to disclose that to your customers or to users that are external to your

company. However, there is a feature in many GraphQL servers that corrects user typos and suggests the right fields if the user made a typo when querying the server. So for instance, if someone tries to access create session with one s, the server will send back: hey, did you mean create session with two s's, or create user, or create file, or whatever. The interesting part is that we can use that with a smart fuzzing system to actually uncover what types exist in the GraphQL API, because the server is sending them back to us. So for instance, this is a real-life example from the Escape security research team, where the introspection was deactivated

and yet we were able to rebuild this full schema just by smartly fuzzing the endpoint and uncovering what the right types were. In this specific case, the team thought that, you know, security by obscurity was something that worked. So there was an update admin mutation. What a nice name. And this mutation, since it was not supposed to be accessed by anyone, was not protected by any form of authentication or authorization, which means that we were actually able to change the password of the admin user of this SaaS. So this is a real SaaS platform. It's a real story. So this is a critical vulnerability that can happen in GraphQL

and that is very common in GraphQL endpoints. Okay. Second interesting GraphQL-specific vulnerability: fragments. Fragments are a piece of logic that can be reused between multiple GraphQL queries. It's very similar to a function in a programming language. But what happens if we create a fragment that calls itself, so a recursive function? Then the GraphQL server will enter an infinite recursive loop, and it will end up crashing because we will exceed the maximum call stack size. What's particularly interesting is that it's very hard to catch this type of attack, and we were able to create denial of service PoCs on many production GraphQL servers with this simple

request here, which shows how easy it is to create denial of service on GraphQL APIs. Okay. For instance, we found a CVE that we disclosed on the Juniper service, and we found multiple denial of service vulnerabilities among companies that have since been resolved. All right. Another example of an interesting usage of GraphQL-specific features: GraphQL supports batching, or sending multiple queries in one single HTTP request. So this is nice. It can save HTTP calls, but it can also be exploited to bypass HTTP-level rate limits. So for instance, if you have a login mutation to get a token, this is the example that was

shown at the beginning of this talk. You can actually send one single HTTP request that attempts 1,000 different combinations of logins and passwords. And if your rate limiting is done at the HTTP level, you can just bypass it and attempt thousands and thousands of combinations, ultimately impersonating users. Okay. Finally, the last vulnerability that we found is GraphQL bombs. It exploits the same batching and aliasing capabilities that we used, but for files. If we alias a file of one megabyte and we create 1,000 aliases of this one-megabyte file, the server will expand all of it and create a one-gigabyte file on the server side. So this can create a

lot of denial of service, you know, saturating the storage of servers with one simple one-megabyte request. There is a blog post that details more in depth how we exploited that and how we did the PoC of this exploit. Okay, finally, GraphQL is very prone to sensitive data exposure. We found more than 4,000 secrets in stack traces in GraphQL endpoints, including 49 passwords, two credit card numbers and about 1,396 access tokens. So there is a concerning sprawl of secrets that are exposed through GraphQL APIs that are unprotected. And this is really something that as an industry we should take a look

at. Okay. So after we have seen all that, if there is one key takeaway from this talk, it would be the GraphQL secure-by-default checklist. The quick wins are mostly very basic stuff that exists for all APIs. The medium list is really related to GraphQL specifically: implementing query depth and cost limits, rate limiting, aliasing, secrets management. And then, to really go in depth into security, what is really important is field-level role-based access control, federation boundary hardening, implementing continuous DAST testing with Escape or another tool, and finally having top-of-class observability to really ensure that the GraphQL APIs are not exploited in a malicious way by

actors. Okay, that is the end of the talk. I'm Tristan and I was joined by my co-founder Antoine. Thank you for your attention, and we're happy to answer any of your questions in the 30 seconds that we have left. [Applause] Thank you Tristan and Antoine, and thank you everyone. Our next talk will start in 15 minutes. If you'd like to talk to Tristan, please have those conversations in City View. Again, thank you. Thank you everyone. Before we jump, I just have one quick open source tool to show everyone. We created the open source tool GraphQL Armor. It's a simple security middleware for GraphQL that implements 10 best practices out of

the box in one line of code. It's used by a lot of companies in the Fortune 500. It's battle tested, so check it out.
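
As a rough illustration of the kind of checks the talk recommends against batching abuse and recursive fragments, here is a minimal Python sketch. It is not GraphQL Armor's code or API: the function names and the `MAX_BATCH_SIZE` value are hypothetical, and regular expressions stand in for a real GraphQL parser, so braces inside strings or comments are not handled. The GraphQL specification itself requires validation to reject fragment spreads that form cycles, so a compliant server should already refuse such documents.

```python
import re
from typing import Any, Dict, Set

MAX_BATCH_SIZE = 5  # hypothetical limit; tune it per API

# "fragment Name on Type {" and "...Name" (but not the inline "... on Type").
FRAGMENT_HEADER = re.compile(r"fragment\s+(\w+)\s+on\s+\w+[^{]*\{")
FRAGMENT_SPREAD = re.compile(r"\.\.\.\s*(?!on\b)(\w+)")


def batch_too_large(payload: Any, limit: int = MAX_BATCH_SIZE) -> bool:
    # A batched request arrives as a JSON array of operations.
    return isinstance(payload, list) and len(payload) > limit


def fragment_spreads(query: str) -> Dict[str, Set[str]]:
    # Map each fragment name to the fragment names spread inside its body.
    graph: Dict[str, Set[str]] = {}
    for match in FRAGMENT_HEADER.finditer(query):
        depth, pos = 1, match.end()
        while pos < len(query) and depth > 0:  # find the matching brace
            if query[pos] == "{":
                depth += 1
            elif query[pos] == "}":
                depth -= 1
            pos += 1
        body = query[match.end():pos]
        graph[match.group(1)] = set(FRAGMENT_SPREAD.findall(body))
    return graph


def has_fragment_cycle(query: str) -> bool:
    # Depth-first search over the spread graph, looking for a back edge.
    graph = fragment_spreads(query)
    visiting: Set[str] = set()
    done: Set[str] = set()

    def visit(name: str) -> bool:
        if name in done:
            return False
        if name in visiting:
            return True
        visiting.add(name)
        cyclic = any(visit(child) for child in graph.get(name, ()))
        visiting.discard(name)
        done.add(name)
        return cyclic

    return any(visit(name) for name in graph)
```

Running both checks before execution, and counting each batched operation against the rate limit separately, addresses the brute-force and recursion issues described in the talk without depending on HTTP-level limits.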
