
Till Recollapse: Fuzzing the Web for Mysterious Vulnerabilities by Andre Baptista (@0xacb)

BSides Ahmedabad · 42:44 · Published 2024-05
About this talk
Discover how REcollapse can revolutionize bug bounty hunting and HackerOne LHEs by enabling zero-interaction account takeovers and uncovering new bypasses for web application firewalls.
Transcript

All right, thank you so much to the organizers for inviting me. Today I'll be revisiting a talk named Till REcollapse. It's especially useful for those who are starting to look into more mature targets in bug bounties. In this talk I'll be explaining how to find weirdly simple but impactful bugs on such targets. This is a longer version of the talk, with a few more examples. Thanks for the intro, I think I can skip this part: I'm Andre Baptista, also known as 0xacb on Twitter, HackerOne and so on. So today we'll be discussing some quirks about regex and general input

validation and how this can be bypassed, then we'll move on to the REcollapse technique, and finally we'll visit some real-world examples. It all started a few years back, when I was doing bug bounties at live hacking events with HackerOne, and some friends were like: okay, there's some sketchy functionality here on this host; if we can bypass a certain whitelist, we can eventually leak a token. And that is exactly what happened: we figured out a way to bypass a certain validation, and the way I was able to do it led to all this research. So it all starts with the user

input, right? Modern web applications rely on validation of user input (good for us when they don't), and that validation of course uses regular expressions, which is something that most developers don't fully understand even as they write them. This is just an example: aa.com doesn't match this regex, which is meant for an email address, but the second one will return a match object, in this case in Python. There is also, of course, sanitization of user input and of the way it's rendered in the output of the web application. For example, the htmlspecialchars function will sanitize to HTML entities these dangerous

inputs. And then there's also normalization, which is not so well known, let's say, and we have different behaviors in different programming languages in terms of the output it produces by default. If we have Unicode characters like these a, e, i, o, u, it will actually convert that weird ordinal indicator into "deg", D-E-G, down there in Python, and up there, in this case in PHP, you see the behavior is even weirder. All this can lead to very interesting scenarios where your input somehow gets normalized, and then, if they are blocking for

example single quotes or double quotes, in the first case you can see that, if this input gets normalized somewhere, you'll eventually be able to pop some impactful issues even on very mature targets, because this is not so well known. This is simply a normalization table that is available on the GitHub for this tool, REcollapse, that I launched about a year ago. Basically, I was able to build this table by matching characters that normalize to the base characters that you can see up there, which are the regular ASCII characters. So there are some problems with validation of user input. I was

already telling you that regular expressions are not understood by the people who actually copy-paste them, for example from Stack Overflow. People used to go there; now we, especially developers, are using GitHub Copilot more and more, and so on. So here's a regular expression to match an email address. Sure, it's very secure, right? Not really, because in this case it will accept email addresses like the second one that you see here. Most of these developers, especially the ones who work at very large corporations, don't actually go to regex101, for example, to test the regular expressions, or

sometimes they don't even write proper unit tests for them, and sometimes they actually copy-paste these regular expressions from other projects. Let's say that you have a subdomain for this company and they are using a certain regular expression there to validate an email address; on another application, for example a mobile application, the regular expression will most of the time not be the same, because it's a different developer, and it's very hard to keep the same regular expressions across the code base. In the case of modern tools such as GitHub Copilot, powered by OpenAI, this was a function

generated dynamically by GitHub Copilot. In the description, in the prompt, we say: okay, check if the URL is a subdomain of the domain provided. So it generated this regular expression, and it seems that LLMs, like humans, are not very good at regular expressions, because, as you can see, all these examples will go through this code. It's very interesting that the last two go through because of the dot: the actual domain is in this case formatted directly into the string, example.com in this case, so the dot that you see in example.com will not have a backslash. Okay, so in this

case it will be possible for the last two examples to go through this regular expression generated automatically by GitHub Copilot. It's also very interesting that in my research I came to understand that not all the base libraries of programming languages have the same behavior in terms of regular expressions. The dollar sign asserts position at the end of the string, for example, or before the line terminator right at the end of the string. This is very confusing if you read it, but basically, in my opinion, it should mean that, okay, we'll have a regular expression with a dollar sign at the end

and this will mean that the string ends there, but we can have a new line at the end and it will still match, right? The thing is, when you look into JavaScript, for example, with a regular expression that has a dollar sign asserting the end of the string, JavaScript will only match the first example, with three a's in this case, and if you put a new line, or any string after the new line, it will not go through the validation. In Python it's almost the same, but the third example will actually go through.
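The Python side of this comparison can be reproduced with a minimal sketch. The pattern and the three test strings below are assumptions made for illustration (the slide's exact examples are not in the transcript); the point is only that, with no flags set, `$` in Python's `re` module also matches just before a single trailing newline, while `fullmatch` does not let that newline through.

```python
import re

# Anchored pattern with no flags, i.e. the default behaviour discussed in the talk.
# The pattern and inputs are illustrative assumptions, not the slide's exact examples.
PATTERN = re.compile(r"^a{3}$")


def accepts_with_match(value: str) -> bool:
    """Typical validator: `$` also matches just before one trailing newline."""
    return PATTERN.match(value) is not None


def accepts_with_fullmatch(value: str) -> bool:
    """Stricter validator: the pattern must consume the entire string."""
    return PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    for candidate in ["aaa", "aaa\n", "aaa\nbbb"]:
        print(repr(candidate), accepts_with_match(candidate), accepts_with_fullmatch(candidate))
```

On the defensive side, using `fullmatch` (or `\Z` instead of `$`) is one way to reject the trailing newline in Python; other languages need their own equivalent.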

So Python is basically following what the convention about this dollar sign means, at least in my opinion, or in that of the developer who coded the base library for Python. In terms of Ruby it's even more interesting, because the fourth example will go through, with another string at the end. Basically this means that we can just send a request and put a new line or any other character, whitespace characters for example. I'm not talking about the newline or multiline flags that we can also put in regular expressions: by default they will in fact have different behaviors. So this is something that is not very well known

by developers, and that's good for bug bounty hunters and hackers in general, because we can still find some very weird bypasses with this technique. This is just a table summarizing this regular expression example regarding the dollar sign that asserts the end of the string. So there's some sort of confusion in implementing regular expressions in the core libraries. So regular expressions are hard. Let's talk about the technique that I've been applying to these mature targets. It's very, very simple: it's about somehow redefining the impossible. Let's say that you have a very tough user input validation and you cannot put any payload there: you have a URL and it is super whitelisted,

you have an SSRF scenario and you cannot put, you know, an internal host. How can you actually redefine this almost impossible validation, and at the same time how can we also leverage user input transformations, in terms of what I was saying about normalization? Basically the answer is about fuzzing the parameters, pretty much how we do with binaries, for example with AFL or other tools, but in a smart way and in a way that is fast enough for the web. So if we start with the initial scenario, we just need to send unexpected inputs, lots of them. We are poking into a black box, and basically

we are going to get weird behavior back from this black box. This is a very simple concept, but it will start shining some light on the behavior present in the back end of the application, on the source code that we do not know. So the idea in this initial scenario is: okay, we are going to just send a lot of possible inputs, and in this technique that means all the possible bytes in multiple positions, as I'm going to show you next, to see if any request actually goes through or at least results in different behavior. So the technique is very, very simple. It's focused mostly on regular

expression groups, to defeat this kind of validation. Basically we have the starting and termination positions, to check whether the regular expression has the start anchor and the dollar sign at the end. We have the separator positions, which are usually around punctuation or special characters. And then we have the normalization positions, which are basically all the characters that can usually be normalized. The idea is about fuzzing these positions with all the possible bytes and analyzing the responses. So these are just some examples of these pivot positions. A regular expression can be asserting the beginning of the string, or it can be

asserting the end of the string, or it can even be asserting nothing at all. You can have a URL, and if they are not asserting the beginning of the string, you can have anything that you want before the actual URL and it will still match, and the same thing at the end. Then we have the separator positions around special characters. This is because usually, for example, letters are part of an a-to-z regular expression group, and then you have the actual special characters being present in the regular expression. And then you have the normalization positions, where vowels work better in a lot of libraries, but obviously any character can have a

Unicode alternative representation. So these are just some examples, but basically the idea is to fuzz all the positions with all possible bytes, in a way that doesn't explode in terms of complexity: we are not going to fuzz all the possible combinations of this, we are going to fuzz one by one, check the behavior, and iterate multiple times over this. These are just some examples with usernames, and even with HTML, to find new bypasses for an XSS sanitizer, for example, and so on. So this is a tool that is available on my GitHub. It's a tool capable of generating such inputs, and it supports multiple fuzzing sizes

and encodings. It's basically a tool not meant for the actual fuzzing but for generating this type of input. So let's just jump to a quick demo of the tool. Let's say that we have this example; this is just a demo application. Let's say that you are a bug bounty hunter and you approach this application, and this application will allow you to log in to another application, as an identity provider for example. The behavior is that it will use your session and go to x.example.com with a token. So let's try to understand how this works. We click yes, we intercept the request, and we can see that we have

a URL parameter with x.example.com. So we can try to understand: okay, can we put our host in here? Let's try www example.com, or anything else, for example "whatever": it will return forbidden. We can try the same manually, of course: for example.com it will work, and we can try y.example.com and it will work, and so on. We can try a subdomain of a subdomain and it will work. But we don't know the regular expression that is there on the back end. Our goal will be to go to another host. Of course we could try @evil.com, which is very common: it will return forbidden. So there's some sort of validation of

the user input, so that the token doesn't get exfiltrated to an external attacker-controlled host. To use the tool, we just need to pass the original input, usually a valid input, for example x.example.com, or it could be just example.com. We hit enter with the default settings and it will generate all these possible bytes in multiple positions, with all the positions that I was telling you about. (Somebody's phone is ringing on the stage.) So we are going to copy this output, and we are going to use, for example, Burp in this case. We can send the request to Intruder and select the part of the request that we want to fuzz.
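To make the idea of "all possible bytes in multiple positions" concrete, here is a simplified sketch of such a generator. It is not the REcollapse implementation: the function names are hypothetical, it inserts only one URL-encoded byte at a time, and it covers only the start, end, and separator positions (normalization positions are left out).

```python
import string


def pivot_positions(seed: str) -> list:
    """Return insertion indexes: start, end, and both sides of each punctuation character."""
    positions = {0, len(seed)}
    for index, char in enumerate(seed):
        if char in string.punctuation:
            positions.update({index, index + 1})
    return sorted(positions)


def generate_inputs(seed: str) -> list:
    """Insert every byte 0x00-0xff, URL-encoded, at each pivot position, one at a time."""
    payloads = []
    for position in pivot_positions(seed):
        for byte in range(256):
            payloads.append(f"{seed[:position]}%{byte:02x}{seed[position:]}")
    return payloads


if __name__ == "__main__":
    generated = generate_inputs("x.example.com")
    print(len(generated), "payloads, for example:", generated[:3])
```

Because only one position is mutated per payload, the list grows linearly with the number of pivot positions rather than exponentially, which keeps it small enough to paste into a tool such as Burp Intruder.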

We just need to paste the output and start an attack. This will send the requests. We can order them by status code, so we go down and we can see that it ends around here; it gives 403 after this. We can take a look at some of the responses that appear. This is a 302 Found; this one is not interesting at all, it's at the beginning of the request. But after a while we can see that, for example, this one works, which is x.example.com%2f, which is a slash: it will redirect to example.com, which is not useful. And then we see an actual at

symbol, with a %40 in this example. So the at symbol is kind of working, but it was not working for @evil.com, for example. So the idea here is: okay, we are going to pass this input, for example, and fuzz it with an at symbol, and we can put evil.com, to see if some sort of variation works. (I don't know who forgot the phone here, but yeah, let's go.) We are going to copy the output, which fills in all these positions, and then we go to Intruder again, paste the new output, and basically we can try to

understand what interesting responses we got here with this at symbol. We got only a few. We got a %2f, which is not interesting because it will have a slash: if we go to this URL, it will be the whitelisted domain controlled by the target. This one is not interesting at all, it will actually fail. And then we have this one, which is really interesting, because we can basically redirect to a host named evil in this case, but we cannot have the dots. So we found a bug in the regular expression on the back end in this demo. So right now what we can do,

especially for those who are used to this, is: okay, we can have evil here or any other string. If we can have any other string, we can just put 123123, for example, and this is going to be a number that is a decimal representation of an IP address. So if we just copy this URL and go to the browser (sorry, not this one), basically what's going to happen is that we are going to be redirected to this IP address, which can be controlled by an attacker. So we found this edge case that will allow us to leak the token

to an external IP address, and at first glance it was basically impossible to bypass this regular expression. So this was just a classic example of how you can use this tool. Now let's talk about mysterious bugs. We want to look at literally anything on the core of a mature target that gets validated, sanitized, normalized, used in queries, and so on. The more times that this input you can send is being validated, the better. I can recall one example where, basically only on the front end of the sign-up page,

the email address was being checked against a regular expression, but the regular expression on the back end was very different. So we could intercept the request and inject HTML into the actual email address, and from there we could basically pop cross-site scripting. In this case it was on Dropbox core, and also in the desktop client, because the same parameter would be rendered in the name of the user when you sent an invite, and in the desktop application it could be an XSS-to-RCE as well. So this is more like a methodology, with some simple steps. It's all about setting a goal,

for example ATO; I'm going to show some examples of account takeover. The idea is to pick a target field that is heavily sanitized. You identify all the flows that consume the target field that you picked, and then for every flow you try to find these differences in the regular expressions that are present on the back end. Then we analyze all the responses: you can sort by response code, for example, or by response length, and go from there and see if there's anything weird happening with the fuzzing that you are doing. Then we can pick a weird byte that actually went

through, such as a new line, which is very common, and then you can go through all the flows in that state. Let's say that your user has this very weird email address, with an accent on a vowel for example, which you are not supposed to have. If you try, for example, to perform OAuth with that application, maybe the email address will get normalized, and in that state you will have two different accounts, one with the accent and another without the accent on the vowel. And then sometimes, through different flows, for example recovery and so on, you can end up on the actual victim's email address. This happened to

me on GitHub and many, many other targets that are super mature and have big security teams and so on. Hopefully at this stage you have just found a mysterious bug, a very weird state where you found an impossible path to perform, for example, an account takeover. If it doesn't work, just go back and try again on a different field. So this is a methodology that I've been following for the last couple of years, and it's been very successful, not on very small targets but actually on very mature targets that have all these flows, from SSO to OAuth and so on. So let's move on to some real-world examples.
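The "two accounts that collapse into one" state from this methodology can be sketched in a few lines. This is a toy model under stated assumptions: the `registered` set, the `normalize_email` helper, and the choice of NFKC normalization are all hypothetical stand-ins for whatever a real sign-up flow and a real downstream flow (OAuth, recovery) would do.

```python
import unicodedata

# Hypothetical account store holding the victim's already-registered address.
registered = {"victim@domain.com"}


def normalize_email(email: str) -> str:
    """Assumed behaviour of a downstream flow: NFKC normalization plus lowercasing."""
    return unicodedata.normalize("NFKC", email).lower()


# U+00AA (feminine ordinal indicator) is NFKC-normalized to a plain "a".
attacker_email = "victim@dom\u00aain.com"

# The sign-up flow compares raw strings, so the address looks unique and is accepted.
passes_raw_uniqueness_check = attacker_email not in registered

# A later flow normalizes first, so the attacker's address now resolves to the victim's.
collides_after_normalization = normalize_email(attacker_email) in registered

if __name__ == "__main__":
    print(passes_raw_uniqueness_check, collides_after_normalization)
```

The design flaw this illustrates is that validation and lookup disagree about what "the same email" means; applying one canonical form in a single central place before both checks removes the gap.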

This is the sort of example that I gave in the demo with the token: basically the user will get redirected to an external host, and we want to leak the token from the victim. There's some sort of validation, but in this case it was even harder than that: we can't just put @evil.com, or even just a word or a decimal IP address. So now what? Basically it was just giving a 302 to redacted.com for all the possible scenarios; it would just ignore our host if we fuzzed with one byte. But with the REcollapse tool you can specify two bytes for

fuzzing at the same time, and we got one hit: the semicolon and the at symbol together were actually able to redirect to evil.com again. So sometimes you can fuzz with more bytes in all these different positions and you'll get very weird behaviors. Another example: I was participating in a live hacking event with zlz in this case, and he actually published a blog post about this issue; we got 40K for it. Basically what we noticed was a null byte on a sign-up request on a target, and the weird behavior that we got with this fuzzing technique was that there was something different in the response on every single request if we would send

a null byte in the email or the first name of the account we were trying to sign up. Basically, some low-level library on the target was handling strings badly, so the null byte was actually being removed and the string concatenated with the contiguous memory on the server. If we signed up as victim, null byte, @domain.com, this would return victiml@domain.com, for example, and so on, and if we tried multiple times we'd get victimb, or victim and some garbage bytes, and we were like: what is going on? So this was sort of a Heartbleed bug all over again, because we could dump

private keys from the region that we were in. It was very regional, because we were hitting certain servers in the US back in the day. Basically we could dump passwords and all the requests that were being performed by users, live. And this was found just by fuzzing and looking carefully at the weird behaviors in the responses: very simple, yet very impactful. Another one: in the report I was calling this REcache deception, because of using this technique. We have this JSON endpoint that returns information about the user. I was with Joel (teknogeek) and we were

poking at this target for a live hacking event, and we were like: okay, let's focus on web cache deception. We were trying the classic web cache deception stuff from back in the day, user.css, .pdf, .js and so on, but nothing was working. I was like: okay, let's try the REcollapse trick again. And obviously we were able to pop it. Basically, we realized that all these caching rules are usually regular-expression based, or they have some sort of heuristic to check whether the response should be cached. So basically we need to have the correct content type in the response for

it to get cached in this case. So let's fuzz it. What happened is that first we had a wordlist with all the known extensions, and then we fuzzed before the dot of the extension with all the possible bytes, from 00 to ff. Two cases returned a 200 with the correct content type, for example PDF, with an Age of 35 and X-Cache: hit in this case. So a %23, a hash symbol, or a %3F, a question mark, would get cached, and it would still return this information about the user, with an API token. So what we could do was

just classic web cache deception from there: we send a link to a logged-in victim, who will request this URL, and then as an attacker we just need to access the cached content from somewhere else and steal the API token to perform, in this case, the account takeover, and we could exfiltrate some serious PII of the victim. Another one: username confusion. This is a very, very simple one, but very impactful. Let's say, for example, that in any social network we have our usernames, and our usernames need to be unique. So let's say that the victim is "this is the victim", and

the attacker account is "this is the victim" with a new line. There was a bad regular expression validating the user input, and an attacker could actually have a username with a new line at the end. It looks almost the same, but it's actually a different account, and the unique constraint on the database wouldn't do anything; it would allow it, because the value wasn't being stripped or normalized in this case. I cannot disclose the target, but let's say that we have an application that allows payments, for example, and a third user wants to send money to the victim, and the way that they

do this can be by username. So basically a third user who knows the victim will try: okay, I want to send money to my friend, but which one? Because they look the same. I actually changed the pictures of the bunnies here so we could tell the difference in the report. But the victim will be losing money in this case, because the third user could pick the wrong person, with a 50% chance. So a very, very simple one, yet impactful. But another flow was also affected; sometimes you just need to pay attention to the side effects of such a weird state on the application. As an attacker, if you

requested money from any user who knows the victim, it appears that the victim is requesting money, and if the third person who is supposed to accept this request clicks on the profile, it goes to the victim's profile, but the money will go to the attacker. So this was a very interesting, very simple, and impactful bug that I found on the core of an application. Another example: zero-interaction account takeover via OAuth, in this case on Shopify. Shopify offers a sign-up/login with Shopify, an OAuth sort of mechanism, and basically how this works is that it's going to send the email address to the target application, and it

will sometimes check whether the email is verified or not. They have this third-party application, I'm not sure if it was an acquisition or not, but in this case on the target application the email address didn't need to be verified to create an account. We noticed this behavior: you can sign up on Shopify, create an account and create your shop, but you have limited access and you need to verify your email address. If you try to log in with Shopify in this state to the target application, the email didn't need to be verified. But you couldn't perform a takeover, because the email address needs to be unique, and the victim already owns

the actual email on Shopify, so as an attacker you will not be able to create an account with the same email address. So basically we tried to change the email address on accounts.shopify.com, which is the application where you can change your email address. There was a proper regular expression in place: no new lines, no weird characters, no ordinal indicators being normalized, and so on. But then again, as I was telling you, different flows have different regular expressions most of the time; it's usually not checked in a central place, in the actual database model, as it should be. So by fuzzing the sign-up request on accounts.shopify.com, on the

actual sign-up flow, we could use this weird a symbol, for example, and it would go through, and victim@domain.com in this case already existed as an account on Shopify core. And then basically we were able to perform the ATO like this. So this is the victim account; he has an account on tor. app. We open a new session, we go to tor. app as an attacker, and it was very, very simple: we sign up with Shopify and all we do is put these weird characters in there. In this case it was the same email as the victim's, but with a character that will be normalized in the flow to

an actual a. So we just hit create account, and it will redirect us to the third-party application, and we end up on the victim's account with no interaction. So this was just another very simple example on a very mature target a couple of years ago. Regarding these flows, I wanted to tell you that with OAuth and SSO I keep finding a lot of these bugs on these targets, because the email addresses or usernames just flow into the application, and normalization is often used in these flows. So the actual email can contain weird symbols, even in the domain part: you can have a weird

domain. To test for this you can have two domains, one with a weird a, which is going to be a punycode domain, and your actual domain, and then you can test the scenario where an attacker creates a fake, weird punycode domain that gets normalized to the actual domain. So it opens the door to this normalization flow, especially with SAML, OAuth and so on. And it's very common to have a straight-up email verification bypass. Why? Because developers implement OAuth before they move to SAML for these big enterprises, and usually OAuth, like login with Google or Apple: Google and Apple validate the email address, right? So you

create an account and you will be able to log in with Google or Apple on any application, because your email is verified; they have a way to verify your email address. It would be super big if you had an email verification bypass on Apple, for example: you could just take over lots of accounts in lots of applications, right? But then the developers move on: okay, we need to implement SSO, because we have this big customer and we need to have this. And there the developers will be used to trusting the OAuth provider, and they'll start trusting Okta, for example. The thing is, there's no guarantee that an account on Okta

belongs to an actual email address, because there is no email verification for that. I mean, we can configure these SSO providers for that, but the thing is that sometimes even the organization or the IdP is not checked properly. Let's say that an organization configured SSO on a given application, and then there's another user, a malicious user, who configures SSO on their end on the same application. If the attacker configures SSO and tries to log in to the application with a new user, with an email that belongs to the other organization, sometimes you'll end up

in the target organization. This case is very rare, but it happens sometimes. One final example, which is also a zero-interaction account takeover, in the recovery flow. The target in this case is an email provider, so it was a big deal to perform this ATO, with big impact, because we're talking about an email provider. Our goal was to take over, with zero interaction, an inbox that was configured for us. People were able to sign up as username@target.com, or use their current email address to create an account. So this was an email provider, but you could use your

current email address. So let's explore all the flows in this case. victim@target.com existed as an account on the target, and our goal was to access the email inbox. If we put victim@target.com in the recovery flow, it shows: okay, we'll send a code to a redacted recovery email address. As an attacker, we could add the email of the victim as our recovery email. This requires email verification, but it results in a change to the previous flow of recovering victim@target.com. After we do this on our end, it will

lead to a weird state where we may have a logic bug. Basically it results in this change of flow where, when we attempt to recover the victim, it shows the email itself, which makes no sense: I'm trying to recover this actual email address of the victim and it shows, oh, you want to use this email address to recover it? It makes no sense, because before it appeared just like this. So as an attacker we introduced some sort of confusion in this state. There was some sort of regular expression again, matching the target, the actual email provider's domain, because they wanted to be able to distinguish both

types of accounts: the ones they will be creating an inbox for, and the others, which already have an inbox at an external provider. After fuzzing, we realized that some special characters were displaying, and when we put an email address like this we were getting the exact same behavior, and this was super weird. So we were thinking: hmm, maybe the regular expression doesn't have an assertion on the end of the string with the dollar sign. And I was correct, because if we added a recovery email address such as victim@target.com.domain.com, the regular expression would match target.com, and it would basically show up as a recovery option for the attacker's account, and as

option two we still have the original recovery email address. So we click on recover to this email address, and basically what's going to happen is that we receive a code, and then the application asks us: okay, do you want to log in to the victim's account or the attacker's account? So this was a logic bug. And for this domain.com, I was able to configure an email inbox on that subdomain of my own domain, and basically I was able to take over the account, bypassing 2FA and so on, and they paid me 50K for it back in the day, so it was huge. So, to sum it up: in terms of

developers, we should test and fuzz our regular expressions. Even if we rely on well-known libraries, you've seen that there are some weird behaviors going on. One of the major takeaways here is that very simple input modifications can result in great impact, by flipping or adding bytes, in all the possible combinations. For those of you who are starting to poke at very mature targets, I strongly recommend that you try this. Also, if you are attempting to bypass a sanitizer or a firewall, a WAF, I recommend that you try the REcollapse tool and see the results. I have some

people who have been reaching out to me this year saying that they have been quite successful. Also, this black-box regular expression testing is still not very explored; I keep finding all these issues when I go to live hacking events, for example with HackerOne. It's always there. It's very creative and manual work: you can use this tool, you can do the manual fuzzing, you get the idea, just go for it. And finally, the regular expression behavior can also reveal information about the programming language being used, as in the case of the new lines that I showed you. So basically, if something is being validated and you

can bypass it, think about the impact and you'll see the big picture. And basically, if somebody says that something is impossible and not bypassable: we are hackers, and we always find a path to do things. I would like to thank these people who collaborated with me; I couldn't list everyone here. And that's it. If you have any questions, let me know.
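As a developer-side illustration of the advice to test and fuzz your own regular expressions, here is a minimal sketch of the subdomain check discussed earlier in the talk. Both validators are hypothetical: the naive one is an assumed reconstruction of the kind of code described (domain formatted into the pattern unescaped, no end anchor), not the exact function shown on the slide.

```python
import re


def is_subdomain_naive(host: str, domain: str) -> bool:
    """Hypothetical flawed check: the domain is interpolated unescaped, with no end anchor."""
    return re.match(rf"^([a-z0-9-]+\.)*{domain}", host) is not None


def is_subdomain_strict(host: str, domain: str) -> bool:
    """Escape the domain and require the whole host to match the pattern."""
    return re.fullmatch(rf"([a-z0-9-]+\.)*{re.escape(domain)}", host) is not None


if __name__ == "__main__":
    hosts = [
        "x.example.com",         # legitimate subdomain
        "examplexcom",           # unescaped dot matches any character
        "example.com.evil.com",  # missing end anchor allows a trailing suffix
        "example.com\n",         # trailing newline
    ]
    for host in hosts:
        print(repr(host), is_subdomain_naive(host, "example.com"), is_subdomain_strict(host, "example.com"))
```

A handful of hostile inputs like these in a unit test is a cheap way to catch the unescaped dot and the missing anchor before an attacker's fuzzer does.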