
Good morning and welcome to BSides Las Vegas's Ground Truth track. This talk is "Autonomous Discovery of Logic-Based API Vulnerabilities," given by Taha and Dev. A few announcements before we begin. First, we'd like to thank our sponsors, especially our diamond sponsors Adobe and Aikido and our gold sponsors Prophet and runZero. It's their support, along with our other sponsors, donors, and volunteers, that makes this event possible. These talks are being streamed live, and as a courtesy to our speakers and audience, we ask that you check that your cell phones are set to silent. If you have a question, there will be an audience microphone that I will bring to you so that the stream can hear you.
Additionally, there will be a data science meetup on Tuesday at 7 PM at the Tuscany pool by the entrance. With that, let's get started. Please welcome Taha and Dev. [applause] >> Hey, thanks everyone for coming. I'm delighted to be here. We'll be talking about discovering logic-based API vulnerabilities autonomously. If you've been following the news, Twitter, LinkedIn, you're probably familiar with several AI agents that supposedly surpass human hackers in capability, and that's awesome; they're doing a great job. But it really depends on which metrics you look at. If you look at the number of XSS vulnerabilities they find, they're doing a much better job
than all human hackers on HackerOne. Perfect. But if you take another metric, like the severity of these vulnerabilities, you'd probably look at a list like the OWASP Top 10, where the critical classes are things like broken access control. In fact, broken access control sits at the top of that list. And if we filter by broken access control vulnerabilities on HackerOne, we see that everyone on the leaderboard, including beyond fifth place, is human. So this gives us some hope for humanity,
and for us as human hackers in general, I guess. I'm Taha. I've found vulnerabilities in applications belonging to Apple, the DoD, Nvidia, Sony, and other organizations. I play CTFs with the Plaid Parliament of Pwning at Carnegie Mellon. Dev and I do research at a company called Alconos, an AI lab we founded about two months ago, where we train specific AI models from the ground up to solve specific cybersecurity problems. This work is one of the first research projects we did there. Dev is a reinforcement learning researcher. I'll start with a brief look at application security
automation in general. We have white-box testing and black-box testing. White box is when you have access to the source code. It's great for cases like a CI/CD automation where, when you push to your dev or staging branches, your code is checked statically to pick up the low-hanging fruit, the stuff that can be found by SAST (static application security testing) tools. They're great for various reasons, but they can't observe runtime behavior. So we also need black-box testing, and for black-box testing
we have tools called DAST, dynamic application security testing, where the tool only has as much access to the application it's testing as an end user does. It's a complete black box where you only see the inputs and outputs of every endpoint. This is great at simulating attacker behavior: if your threat model says the biggest threat is someone outside the organization, it helps you simulate that. But both approaches miss logic-based vulnerabilities. This is because these vulnerabilities
are not input-based. What do I mean by input-based? If you look at XSS or SQL injection, you have a single input that the application takes in, and whether the application is vulnerable really depends on what happens to that input: does it get sanitized, what kind of processing does it go through before it gets returned to the user or turns into an action on the application. In logic-based vulnerabilities you don't have that kind of single input; it's usually more about state, user roles, and permissions. These
vulnerabilities also span multiple endpoints and user states. It's not like you can take a traditional fuzzer approach on a single endpoint: send a specific input, look at the response, apply a regular expression or whatever your oracle or coverage calculation method is, mutate your input based on that, and repeat. That doesn't work for logic-based vulnerabilities, at least not the way it does for the other classes. They also require context. The definition of a logic-based vulnerability changes based on
the application you're looking at. Twitter has a completely different business logic than Google Drive: there are files, there are sharing features, which morphs the identity and access management into something different. So it requires context; that's the third reason. An example of this would be privilege escalation, where there really isn't a single method. You can make a cheat sheet for SQL injection or XSS, but you can't really make a cheat sheet for privilege escalation, because it depends on the custom implementation of that application. That's the traditional fuzzer pipeline I mentioned; the sketch below shows roughly what that loop looks like.
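As a rough illustration of that traditional loop, here is a minimal sketch of a mutation-based fuzzer. The `send_request` helper, the seed, and the regex oracle are assumptions for illustration, not any particular tool's implementation.

```python
import random
import re

def mutate(payload: str) -> str:
    """Randomly replace a character, insert a special character, or duplicate a suffix."""
    i = random.randrange(len(payload))
    roll = random.random()
    if roll < 0.33:
        return payload[:i] + chr(random.randrange(32, 127)) + payload[i + 1:]
    if roll < 0.66:
        return payload[:i] + random.choice("'\"<>;") + payload[i:]
    return payload + payload[i:]

def classic_fuzz_loop(send_request, seed="test", iterations=1000):
    """Single-endpoint fuzzing: mutate one input, judge the response with a regex oracle."""
    corpus = [seed]
    oracle = re.compile(r"(SQL syntax|Traceback|<script>)", re.IGNORECASE)
    findings = []
    for _ in range(iterations):
        payload = mutate(random.choice(corpus))
        body = send_request(payload)   # hypothetical helper: one input in, one response body out
        if oracle.search(body):        # response-level oracle fires on a suspicious reflection/error
            findings.append(payload)
        # a coverage-guided fuzzer would also keep payloads that reach new code paths in the corpus
    return findings
```

The whole loop assumes a single request and a response-level oracle, which is exactly what breaks down for multi-step, state-dependent logic flaws.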
The natural next idea, obviously, is: can we use LLMs to solve this problem? They're so popular, and people have been using LLMs to solve various cybersecurity problems, so it's a very natural idea. But it turns out they lack the context window to understand business logic. You're doing black-box security testing, where you have to send so many requests to all the endpoints and keep all of that in context so you can make speculations about how the back end might be working, which is basically what we call reverse engineering. They fail over long time horizons. They're slow; they can't
take that many actions, which leads to the third point: they have to rely on tool use. A model-based algorithm, think AlphaZero or AlphaGo, can take thousands of actions per second, but with an LLM you'd have to have it write a Python script, because inference takes a long time. You can't take very granular actions; you can't approach the problem from first principles, I guess. >> So Taha and I looked at these limitations of both traditional fuzzers and of using LLMs for this
task, and we decided to take a first-principles approach: how could we solve these problems that traditional tools can't solve and that LLMs can't directly solve either? What is the correct ML tool to use? We came up with reinforcement learning. Why? Well, RL is great at long time horizons. It's meant to deal with sequential data. A big part of the entire point of RL is being able to take rewards that come after a long sequence of actions and figure out how to assign credit to the actions beforehand, to figure out how those actions led to that
reward. It's context aware. A huge part of RL is having a state representation, and this state is stored and captures essentially all of the relevant information about the application at any given time, so the model has access to it. Furthermore, RL can learn efficient representations of the structure of the web application, especially if you look at, for example, model-based RL methods, which teach a neural network how to represent a particular environment in order to predict what will happen in the future. That leads to very compact representations that capture exactly the information that's relevant about the environment. And
finally, you can take faster individual actions, because you're not querying a 500- or 600-billion-parameter model every time you send a request. OK, so let's start with a brief introduction to RL. The big principle of RL is that rewards are really all you need. You don't need labeled data; you don't need expert demonstrations or expert guidance. You just have some reward for when the agent does something right, and the RL algorithm, the beauty of this theoretical framework, automatically assigns credit to the previous actions. This allows you to do real-time learning, because the agent is
going to interact with its environment. Think of it as a fuzzing agent interacting with the web application: it sends requests, it creates files and objects, and this affects its environment. As it takes a useful action and gets positive reward, or takes an action that it later figures out was part of the reason it was able to get reward, it understands that the action was useful and quickly changes its behavior. You can deal with very long sequences; you can often have sequences of thousands of actions in a single
episode, and to some extent this can go on indefinitely. So you really get all of these advantages from using reinforcement learning. And with RL, this is what I think is the most valuable aspect, the biggest thing it brings to the table: it has the potential for superhuman performance. With regular supervised learning, with the usual ways of training ML models, you always need a human to label some action as the correct thing to do, and the model just learns to replicate that. With reinforcement learning, since you're only giving some verifier function, some oracle that tells the
model when it does something good, it is able to become much better than even the best human beings, as we'll see soon. Also, if you structure the environment correctly, if you have enough diversity in the applications you're testing on, you have virtually unlimited data, because you don't need a human annotating all the actions: what was the correct request to send here, what was the correct request to send there. So these methods are also incredibly scalable. Now I want to talk about an example where we can see the truly superhuman performance that
reinforcement learning can have. In 2016 there was no computer method in the world that could reach master level in the game of Go. Go simply had too many possibilities, too many states; you'd have to search too far into the future. It wasn't solvable like chess with pure search. So DeepMind decided to apply reinforcement learning to the game. They trained a model-based RL algorithm, meaning the algorithm learned to represent the Go board extremely efficiently and figured out how to plan into the future on top of that representation, and it produced truly incredible, superhuman performance. Move 37 is the most famous
case: the commentators watching the game between the world champion and AlphaGo didn't really understand the point of the move. They thought that perhaps AlphaGo had a bug, that maybe it had blundered. But in the post-game analysis it turned out this was the incredible move that won the game. OK, so how can we leverage reinforcement learning to solve problems in API security? Well, all of reinforcement learning is presented through the lens of Markov decision processes. A Markov decision process is an environment with a state, actions that an agent can take, a transition function that takes in an action and a state and produces a new state, and rewards that enable the agent to learn when you train it within this framework.
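To make that concrete, here is a minimal, hypothetical environment in the usual reset/step style; the toy states, actions, and reward are illustrative only, not the actual fuzzing environment.

```python
import random

class ToyMDP:
    """A tiny Markov decision process: state, actions, transition function, sparse reward."""
    ACTIONS = ["left", "right"]

    def reset(self):
        self.state = 0                                  # start state
        return self.state

    def step(self, action):
        # transition function: next state depends only on current state and action (Markov property)
        self.state += 1 if action == "right" else -1
        reward = 1.0 if self.state == 3 else 0.0        # reward only at the goal state
        done = self.state in (3, -3)
        return self.state, reward, done

env = ToyMDP()
state, done = env.reset(), False
while not done:
    state, reward, done = env.step(random.choice(ToyMDP.ACTIONS))   # a random policy, for illustration
```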
You'd ask: OK, if at any point the agent can only see the reward it got at that particular moment, how could it possibly assign credit over long time horizons? How would it understand which actions to take if it only gets rewards after having taken multiple actions? This is really the genius of RL: it figures out how to propagate these rewards backwards into all of the previous actions. So if you take an action and after this action you get a reward of one, then
OK, that was probably a great action. If you take an action and there's no immediate reward, but afterwards you do get a reward of one, the earlier action was probably also a great action. So with RL, you train a model that predicts how valuable a particular action is. This is called a Q model. Say it predicts the next action will have a value of 0.5; if you take the action and you get a reward of one, that means the action you took was better than expected, so it should be rewarded, it should be encouraged. And by using deep learning, you can
train neural networks so that, every time they are surprised by a higher or lower reward, they learn to better predict the rewards that will be obtained; this is how RL methods learn to look into the future. Reinforcement learning has already been applied very recently to web security. One example is the APIRL paper; it focused mostly on finding bugs rather than security vulnerabilities, looking for 5xx errors, but it had very interesting results. So we thought this could also be used for finding logic-based vulnerabilities.
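As a sketch of that "surprise"-driven value learning, here is the classic tabular Q-learning update. This is a textbook illustration, not our agent's training code; the environment interface matches the toy MDP above.

```python
from collections import defaultdict
import random

def q_learning(env, episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning: nudge the value estimate by the TD error (the 'surprise')."""
    Q = defaultdict(float)                              # Q[(state, action)] -> estimated value
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # epsilon-greedy: mostly take the best-known action, sometimes explore
            if random.random() < epsilon:
                action = random.choice(env.ACTIONS)
            else:
                action = max(env.ACTIONS, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            best_next = max(Q[(next_state, a)] for a in env.ACTIONS)
            # TD error: how much better (or worse) things went than the current estimate
            td_error = reward + gamma * best_next * (not done) - Q[(state, action)]
            Q[(state, action)] += alpha * td_error
            state = next_state
    return Q

# Usage with the toy environment sketched above:
# Q = q_learning(ToyMDP())
```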
So the big question is: how do we turn this problem into a Markov decision process, so that we can solve it using reinforcement learning? First of all, we need some way of representing a web application such that a neural network can understand what is going on. For this we were inspired by the site map you get in Burp and by the knowledge graphs that are often used to represent web applications, and we came up with a tree structure that can capture the structure of the application. At the top you have the root node; this is simply there to unify the entire model. Underneath it you have a bunch of
endpoint nodes, and within these endpoints you have request/response pairs. The idea is that you initialize this tree by making some requests to the application, so you get a bunch of these request/response pairs underneath the endpoints, and within those you look at both the important parts of the request and the important parts of the response. Within the response, the really important part is the response body, and since we're looking at single-page applications, the response body is usually already JSON. We simply convert this JSON into a subtree that is
added to our tree. Within the request, you need to know which user sent it; this is the authentication node. You need to know the path parameters and the query parameters, and there's often a JSON body, which is again represented as a subtree. This structure allows a graph neural network to truly understand, at a high level, what the web application looks like: it can figure out what sort of requests were made, and by navigating around this graph we're able, as you'll soon see, to send interesting requests and mutate them in order to find vulnerabilities.
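Here is a minimal sketch of what such a tree might look like in code. The node kinds follow the description above, but the class names and the JSON-to-subtree conversion are illustrative assumptions, not the actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # "root", "endpoint", "pair", "auth", "path_param", "query_param", "json", "value"
    label: str = ""
    value: str = ""
    children: list = field(default_factory=list)

def json_to_subtree(label, data):
    """Recursively turn a JSON body (dict / list / scalar) into tree nodes."""
    node = Node(kind="json", label=label)
    if isinstance(data, dict):
        node.children = [json_to_subtree(k, v) for k, v in data.items()]
    elif isinstance(data, list):
        node.children = [json_to_subtree(str(i), v) for i, v in enumerate(data)]
    else:
        node.kind, node.value = "value", str(data)
    return node

def add_pair(root, endpoint, user, path_params, query_params, req_body, resp_body):
    """Attach one request/response pair under its endpoint node."""
    ep = next((c for c in root.children if c.label == endpoint), None)
    if ep is None:
        ep = Node(kind="endpoint", label=endpoint)
        root.children.append(ep)
    pair = Node(kind="pair", children=[
        Node(kind="auth", value=user),
        Node(kind="path_param", children=[Node("value", k, str(v)) for k, v in path_params.items()]),
        Node(kind="query_param", children=[Node("value", k, str(v)) for k, v in query_params.items()]),
        json_to_subtree("request_body", req_body),
        json_to_subtree("response_body", resp_body),
    ])
    ep.children.append(pair)
    return pair

root = Node(kind="root")
add_pair(root, "/api/notes", "user1", {}, {}, {"title": "hi"}, {"id": 10, "title": "hi"})
```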
The actual architecture of our agent is as follows. The graph goes into a graph neural network to generate embeddings that represent the entire structure of the graph. Apart from this, we also need the request the agent is currently working on, the one it will send to the application in order to get feedback. So we have an original request that the agent is modifying: it has parameters, it is authenticated as a particular user, and, because it has been sent before, it has a response code. These are encoded
as one-hot vectors, and the request also gets an ID embedding representing it. And because, as you'll soon see with the action space, the agent actually moves around the graph like a piece on a chessboard, we also need the agent's position inside the graph as an input. This is simply a vector of coordinates that tells you how to get from the root to whichever node the agent is currently on.
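A rough sketch of how such an observation might be assembled, with made-up dimensions and a stand-in for the graph encoder; the real model's feature layout isn't something we're reproducing here.

```python
import numpy as np

def one_hot(index, size):
    v = np.zeros(size, dtype=np.float32)
    v[index] = 1.0
    return v

def encode_observation(graph_embedding, user_index, status_index, position, max_depth=12):
    """Concatenate: graph embedding + one-hot user + one-hot response-code class + padded position vector."""
    # position is the list of child indices from the root down to the agent's current node
    pos = np.zeros(max_depth, dtype=np.float32)
    pos[:len(position)] = position
    return np.concatenate([
        graph_embedding,                 # from the graph neural network (stand-in here)
        one_hot(user_index, 4),          # which user the working request is authenticated as
        one_hot(status_index, 6),        # bucketed response status class (2xx, 3xx, 4xx, ...)
        pos,                             # where the agent currently sits in the tree
    ])

# Example with a fake 32-dim graph embedding and a position three levels deep:
obs = encode_observation(np.zeros(32, dtype=np.float32), user_index=1, status_index=0, position=[0, 2, 1])
```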
So here's an interesting part: our action space really revolves around graph navigation. We realized that the parameters we need to mutate in the request we're working on, in order to exploit these logic-based vulnerabilities, often already exist in other requests in the application. Take an IDOR, for example: you take an ID from a different request, the ID of a different object, put it into your request, and find that you can access the other object without the necessary authorization. For that, we can find the IDs in the graph. Furthermore, there's broken authentication, where a user is able to send a request as
another user even though they're authenticated as their own account. For example, there's a request that creates a note in a notes app, and you want to create, or maybe even delete, notes on another user's account; perhaps, even with your own authentication token, by sending exactly the request they would have sent, you can still exploit this. That's a vulnerability that happens. So by being able to select these authentication tokens from the graph, you can also find these vulnerabilities. And the reason there are previous
and next parameter actions is that the request you're working on is going to have a bunch of parameters, and the value you selected from the graph might only apply to one of them, so the model needs to be able to select which parameter takes on that value, as in the sketch below.
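Here is a hypothetical sketch of an action space along these lines: navigation over the tree, a select action that copies the value at the current node into the working request, and previous/next actions that choose which parameter receives it. The action names and the apply logic are illustrative assumptions, not the exact implementation.

```python
from enum import Enum, auto

class Action(Enum):
    UP = auto()            # move to the parent node in the tree
    DOWN = auto()          # move to the first child
    NEXT_SIBLING = auto()  # move to the next sibling
    SELECT = auto()        # copy the value at the current node into the working request
    PREV_PARAM = auto()    # change which slot of the working request receives selected values
    NEXT_PARAM = auto()
    SEND = auto()          # send the (mutated) working request to the application

def apply_select(working_request, current_node, param_cursor):
    """Put the value found in the graph into the currently selected slot of the working request."""
    slots = ["auth_token"] + sorted(working_request["params"].keys())
    target = slots[param_cursor % len(slots)]
    if target == "auth_token":
        working_request["auth_token"] = current_node.value       # e.g. another user's JWT found in the graph
    else:
        working_request["params"][target] = current_node.value   # e.g. another object's ID
    return working_request
```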
OK, another important part is that we need to train this and then benchmark it somehow, and we realized these are two very similar problems. Especially for training, we need thousands of applications. So we came up with the idea of combining different partial applications to generate applications that may or may not be vulnerable. To do this we came up with a YAML format. We generate application parts, say one file that handles DB connections, another that handles the API, and so on, and each of them has a version where it's vulnerable and a version where it's not. You just edit the YAML to set true or false, and it creates a new lab where a given component is either vulnerable or not.
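A minimal sketch of the generation idea: a config (we use YAML; a plain dict stands in for it here) flips each component between a safe and a vulnerable variant, and the generator assembles a lab from the chosen variants. The file layout and component names are hypothetical.

```python
import shutil
from pathlib import Path

# Stand-in for the parsed YAML config: one boolean per application component.
lab_config = {
    "db": {"vulnerable": False},
    "api_notes": {"vulnerable": True},    # e.g. missing ownership check on note deletion
    "auth": {"vulnerable": False},
}

def generate_lab(config, parts_dir="parts", out_dir="generated_lab"):
    """Copy either the vulnerable or the safe variant of each component into a fresh lab."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for component, options in config.items():
        variant = "vulnerable" if options["vulnerable"] else "safe"
        src = Path(parts_dir) / component / f"{variant}.py"
        shutil.copy(src, out / f"{component}.py")
    return out

# generate_lab(lab_config)   # would assemble one lab; flipping the booleans yields a different lab
```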
And so we made this notes app. If you're familiar with OWASP Juice Shop, it's pretty similar to that, but the difference is that with Juice Shop you only have one vulnerable application; it's always the same application. If you want to train RL on this, or even if you're using LLMs with techniques like reinforcement fine-tuning, you want a diverse set of evaluation benchmarks, or training sets, that you can use. So that's what this is. We also have a lab manager where
the RL algorithm can just send an API request to /switch/random, and the lab manager changes the current lab to a different lab with a different vulnerability. Then you observe what the RL agent does, what kinds of actions it takes, and how that differs from the previous labs.
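As a sketch of how a training loop might drive that, assuming the lab manager exposes the /switch/random endpoint mentioned above; the base URL, the HTTP method, and the use of the requests library are assumptions.

```python
import requests

LAB_MANAGER = "http://localhost:8000"    # assumed address of the lab manager

def switch_to_random_lab():
    """Ask the lab manager to swap in a randomly chosen lab (possibly with a different vulnerability)."""
    response = requests.post(f"{LAB_MANAGER}/switch/random", timeout=10)
    response.raise_for_status()
    return response.json()               # assumed to describe the lab that is now active

# In a training loop, one might switch labs between episodes so the agent
# never overfits to a single application:
# for episode in range(num_episodes):
#     lab_info = switch_to_random_lab()
#     run_episode(agent, lab_info)       # hypothetical
```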
Here's a very simple vulnerability it found. There are two different users with private notes: user one's note has ID 10 and user two's note has ID 7. User one tries to delete note 10 and can delete it, and then also tries to delete note 7 and can delete that as well. This is the simplest vulnerability we put in the application, just to showcase the idea, and it's one our agent was able to produce a trace for.
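The check itself boils down to something like the following, a hedged sketch using hypothetical endpoint paths and placeholder tokens rather than the lab's real API:

```python
import requests

BASE = "http://localhost:3000"            # hypothetical lab URL
USER1_JWT = "<user1-jwt>"                 # placeholder token

def can_delete(note_id, token):
    """Return True if the DELETE succeeded for the given user's token."""
    r = requests.delete(f"{BASE}/api/notes/{note_id}",
                        headers={"Authorization": f"Bearer {token}"}, timeout=10)
    return r.status_code in (200, 204)

# can_delete(10, USER1_JWT)   # user one deleting their own note: expected to succeed
# can_delete(7, USER1_JWT)    # user one deleting user two's note: success here is broken access control
```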
>> Yes. OK. So even though the actual exploit of this vulnerability is really quite simple, we also wanted to show that with reinforcement learning the path the agent takes to find these vulnerabilities might not be a direct path to the exploit. For example, in this case the agent started with a request within this API notes endpoint, went back and forth a little between the request and the node representing the request/response pair, and eventually went up and reached the endpoint node. From there it went down to a different request, moved around there a little, and finally came to a request/response pair which had a different user authenticated. From there it first checked the
response to see if there was a useful value there. It didn't find any, went back, and finally checked the request: it looked at the authorization token of the request and selected that one. Then the select action happened, so it took the JWT of this other user from this other request and applied it to the original request, and in this way it was able to exploit the vulnerability.
We can show the trace of this here; you can see it takes a back action and then some selection actions. This is how our traces look, the sort of behavior we're able to identify. >> Thanks for listening. We'll take any questions now. >> Yes. [applause]
Thanks. Did you succeed in teaching the agent, or the model, or whatever you'd call it, to find a vulnerability that you didn't necessarily put into your bucket of vulnerabilities? >> We wanted to make sure that our environment is reliable and useful for benchmarking, so we were very careful to make sure it only had the vulnerabilities we were expecting, because we wanted to be able to measure how useful this approach is at finding these sorts of vulnerabilities. >> Yeah, and I think quality benchmarks are benchmarks where we know exactly what
the vulnerabilities are, because otherwise we wouldn't be able to reason about the behavior of the agent if it found vulnerabilities we weren't yet aware of in the benchmark. But if you're asking whether we tried this on real-world applications, we did, on a bunch of applications at small scale, and it was able to point us to the right place: hey, here's an interesting different behavior, there's a vulnerability here. >> Yeah, on stuff that you didn't necessarily teach it. >> Sorry? >> Like, when you let it run in the wild, for example, you
didn't give it the information about a JWT... >> Oh, a null header? >> Yes, whether it could find a null header. It would be exponentially stronger if it could reason about an auth bypass that you didn't necessarily give it the pathway to know about. >> Yes. So if it were an LLM, LLMs sometimes are able to surprise you in this way by coming up with some exploit that wasn't really in the task you gave them, though what we see in practice is that they usually don't. The trade-off we're making here, compared to using LLMs, is that
maybe, though we're not really seeing it, you could say LLMs would be creative in particular cases; but they're just much, much slower, because you have to query an entire one-trillion-parameter model, and you have to query it maybe a thousand times to get the reasoning trace, and that leads to one action, or maybe it leads to writing a script which would basically be a fuzzer that's already implemented in open source. Our approach is fully capable of finding all of the broken access control vulnerabilities that you would find in the wild; it has that ability, and yes, in terms of the
strategies to find these vulnerabilities, it came up with those by itself. We didn't feed it any strategies, we didn't feed it any traces; we just ran it with a reward function and it was able to figure it out. And the usefulness of this is that it's much more like a fuzzer, in that it can reach a much greater breadth and much greater depth than a similar LLM with a similar action space would be able to reach. >> Cool. Thanks. >> Thank you.
>> How does this have an advantage over traditional dumb fuzzing in Burp Suite, or even just a human manually fuzzing? >> Yeah, it's mostly about the context. As I said, for logic-based vulnerabilities the definition of a vulnerability changes depending on the application. If you're using traditional fuzzing, you would need something custom for every application you're testing. You would need to understand all of the identity and access management and permission structures of the application. Specifically, for each endpoint on that application, you would have to understand what each
level of user, each level of privilege, is able to access: what kinds of objects it can access, what personally identifiable information it can access, and so on.
Thank you so much for your talk. My question is, I was a little bit confused about the usefulness or implementation of the graph neural network piece. When you were talking about the training and evaluation environment, what part does using a graph neural network play, exactly? >> Yes. So the agent needs some way of understanding what the structure of the web application is, right? So you need some representation of it. Now, if you were to simply use raw text, for example, then you'd be trying to create some sort of language model, so you'd need a very large model, and you'd have a lot
of not-useful information built in there; for example, you'd see the entire raw text of the UUIDs and JWTs and all that stuff, which would not be useful to the model. To really understand what the structure of the application is, where it can go and look for vulnerabilities, which endpoints sit where, it can learn, for example, that a list, a dashboard with note IDs, lives in this place in the application. So it can go and navigate there, send a request, perhaps after having made other changes, then get a response that it can
navigate to afterwards, take values from it, and input them into a different endpoint. But for that it needs to understand what the web application looks like. The best way we found to capture this sort of structure is a tree, and that lets us use a graph neural network to process the tree and give the model, at any point, a latent-space embedding of exactly what the structure of the application is.
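For intuition, here is a toy sketch of message passing over such a tree in plain Python/NumPy: each node's embedding mixes its own features with an aggregate of its children's, and the root embedding then summarizes the whole application. Real GNN layers would be learned; the features and weights here are placeholders.

```python
import numpy as np

DIM = 16
rng = np.random.default_rng(0)
W_self, W_child = rng.normal(size=(DIM, DIM)), rng.normal(size=(DIM, DIM))

def node_features(node):
    """Placeholder: hash the node's kind/label into a fixed-size feature vector."""
    v = np.zeros(DIM)
    v[hash((node.kind, node.label)) % DIM] = 1.0
    return v

def embed(node):
    """One round of bottom-up message passing over the tree."""
    h = node_features(node)
    if node.children:
        child_msg = np.mean([embed(c) for c in node.children], axis=0)
    else:
        child_msg = np.zeros(DIM)
    return np.tanh(W_self @ h + W_child @ child_msg)

# graph_embedding = embed(root)   # 'root' being the tree built earlier; this feeds the agent's observation
```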
>> If nobody else has a question, I'm going to take the opportunity to ask a quick follow-up. Does that mean that every single time you have a new application, you need to recreate this structure? >> Yes, but this structure can actually be created automatically; we even have code to take a Burp sitemap, after some manual crawling, and turn it into this structure. It's a very natural structure that follows from how users, or even crawlers, interact with an application. We did say that we don't use LLMs, but if you want to turn this into a product, there are other parts to it. One of those parts is crawling, and for that we use a
browser-use LLM agent to crawl the application. What it basically does is a DFS-like strategy: it clicks every button, fills out every form, and uses every feature in the application, and while it does that we capture the network traffic, and a function converts that traffic into the graph. >> Cool. Thanks guys, great talk overall. My question is, have you tested this against other vulnerability types as well, or just broken access control for now? >> For now, just logic-based vulnerabilities: broken access control, IDOR; the definitions of these are kind of loose, so it's
generally logic-based vulnerabilities. Aside from that, this method wouldn't apply to, say, XSS; that's a completely different thing. We do have other research, and we're personally interested in those areas, so if you come up after the talk we'd love to chat. >> These guys, for example, have shown that reinforcement learning can be used to find other sorts of injections. >> Yes, exactly.
Hey guys, awesome talk. What are the next steps? You've built this model, it shows potential, but what's the future? >> So we will open source these benchmarks; the benchmark is basically also the training environment. We'll open source this lab generator very soon, probably this week-ish. If you come up after the talk we can exchange emails, and we'd love to share those with you. So in terms of evaluation and training, those are the next steps. In terms of the research, our thesis is that the
future of security doesn't look like an LLM agent, a one-size-fits-all LLM wrapper that can find all vulnerabilities. We think it looks like this: if you have ten different categories of vulnerabilities, you'll have ten different methods. Humans also probably use different parts of their brains to find different types of these vulnerabilities, different neural activations, right? So that's our thesis, and we keep working on training different kinds of methods for different kinds of problems. I find this one specifically very interesting because, as far as we know, for logic-based vulnerabilities there
hasn't been any autonomous solution that was able to find them. We also want to scale this up. >> Yeah, that's awesome. Thank you.
Cool. Thanks everyone. Thanks so much for coming.