Till REcollapse: fuzzing the web for mysterious bugs

Name: Till REcollapse: fuzzing the web for mysterious bugs
Uploaded: 2022-12-23
Duration: 41 min 19 s

BSides Lisbon · 2022 · 41:19 · 1.6K views · Published 2022-12

Speakers

André Baptista

Tags

Category: Technical

Topic: Web AppSec

Style: Talk

Mentioned in this talk

Tools used

REcollapse

About this talk

It all starts with unexpected input. Most modern complex web applications rely on regex for validation and implement input normalization. This includes, but is not limited to, crucial account identifiers such as email addresses and usernames. In this talk, André presents the REcollapse technique that he has been using to discover weirdly simple but impactful vulnerabilities in hardened targets while doing bug bounties. He'll show real-world examples and how to leverage this technique to perform zero-interaction account takeovers, uncover new bypasses for web application firewalls, and more.

Transcript [en]

[Applause] Hello everyone, it's really nice to be back here at BSides Lisbon. First of all, I'm André, also known as 0xacb. I'm a co-founder of a startup named Ethiack, and I'm also an invited professor in the Master's in Information Security at the University of Porto. For the last couple of years I've been hacking companies worldwide, mostly at HackerOne live hacking events, as a bug bounty hunter, and I'm a former captain of the Extreme Security Task Force CTF team from the University of Porto. All right, so the agenda for today: we'll be talking about user input as, as always, one of the root causes of weird behaviors, and also we are

going to look at some regex quirks. Then we'll move on to the actual technique that I've been using for the last couple of years to find impactful, super impactful actually, vulnerabilities in stuff that you all use, in the bug bounty context. The goal here is to present a very simple technique that can be very, very impactful. Then I'll show how I approach a target to uncover so-called mysterious bugs by using this technique, and then I'll show you some real-world examples. In terms of the examples, I didn't get authorization from all of these companies to disclose the vulnerabilities, but I'll disclose at least one.

So, how did I come up with the technique that I'm going to show you today? I was hacking a target in a bug bounty context and I had a URL like this: a redirect with a url parameter pointing to a host like legit.example.com. I wanted to redirect to another host, such as evil.com, and it didn't work. I also love Orange Tsai's research on URL parsers and so on, but those techniques weren't working, so I wanted to find a new bypass to actually perform this redirect, which would lead to more impact than just an open redirect. So let's start with the user input. Most modern web apps or APIs, and even

more software, rely on validation, mostly by regular expressions of course. This is just a simple example that will match an email address format, a very simple regex, just as an example, that will reject aa.com but will accept a@a.com. We also have sanitization, of course. This is just an example with PHP's htmlspecialchars, which will escape the HTML into HTML entities, which will prevent cross-site scripting attacks and more. And they also rely on normalization techniques. There's already some research on normalization from the past, but these are just examples, such as the iconv function in PHP with the TRANSLIT option, that will basically

convert and normalize an input like this, the string you see on the right, and it will convert it to some weird stuff. Also, in Python we have unidecode and more libraries that will process this type of input and normalize it to a string. In terms of the second one, the Python one, it's interesting that not only are the accents removed, but you also have this degree symbol being normalized to "deg". In terms of the PHP one, you can see that we have some single quotes or double quotes being inserted before the "e" and the "i", for instance, and basically you can see where this is

going. If we send this input, and this input is trusted somehow because it's been normalized and validated, and if we can insert it in, say, a script context, we can achieve cross-site scripting with the single quote or the double quote, for example, in this case. There are also some problems with validation in terms of developers copying from Stack Overflow, of course. Regex is widely used to validate parameters, and it's like, "Okay, please give me a way to make sure that an email is valid in JavaScript," and someone can just find a response like, "Here's a regular expression, it's secure and should introduce

no bugs in your code, enjoy it." So the developers enjoy doing this and they paste different regular expressions in their code, especially in the backend but also in the frontend. It's mostly not tested by developers. For instance, they could use regex101 and try to understand what the regular expression does; at least the explanation part is really useful for those who don't understand that much about regular expressions. As you can see, we have the \S, which will match any non-whitespace character in the string, so we can have some problems with this. We need to have good regular expressions to make

sure that input will not... and inputs are used in many flows, especially in complex applications and companies, so we need to make sure that the regex is properly covering the input scenarios. In this case you can see that even backticks and the like will just go through. Also, sometimes testing code exists, with the typical assertions, but it's specific to a subset of the cases. In this example we have the test string, but then we have a dot that will allow a lot of characters, even special characters, so it's not properly testing the possibilities in terms of user input.
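As a rough illustration of the point about copy-pasted patterns, here is a minimal Python sketch. It assumes a loose "something@something.something" pattern built from `\S`, similar in spirit to the kind of answer described above; the exact regex on the slide is not in the transcript, and the names `LOOSE_EMAIL` and `is_valid_email` are made up for this example.

```python
import re

# Assumption: a loose pattern of the kind often copied from Q&A sites.
# \S matches any NON-whitespace character, so almost every byte is allowed.
LOOSE_EMAIL = re.compile(r"^\S+@\S+\.\S+$")


def is_valid_email(value: str) -> bool:
    """Return True when the loose pattern accepts the value."""
    return LOOSE_EMAIL.match(value) is not None


candidates = [
    "a@a.com",         # a normal address
    "aa.com",          # no @ at all
    "a`b@a.com",       # backtick in the local part
    "<script>@a.com",  # markup in the local part
    "a@a.com'--",      # trailing quote and dashes
]

results = {c: is_valid_email(c) for c in candidates}
for candidate, accepted in results.items():
    print(f"{candidate!r:20} accepted={accepted}")
```

Only `aa.com` is rejected here. A unit test that checks just the first two inputs would pass and still say nothing about the other three, which is the gap the talk is pointing at.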

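The normalization step mentioned earlier can be sketched with Python's standard `unicodedata` module. This is not the iconv or unidecode behavior shown on the slides, only an assumed stand-in that decomposes characters, drops combining marks and lowercases; `normalize_identifier` and the sample addresses are hypothetical.

```python
import unicodedata


def normalize_identifier(value: str) -> str:
    """NFKD-decompose, drop combining marks, then lowercase (illustrative only)."""
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


# Hypothetical inputs that look different on the wire but collapse together.
variants = [
    "Admin@example.com",           # uppercase A
    "\u00e5" "dmin@example.com",   # a with ring above
    "\u00aa" "dmin@example.com",   # feminine ordinal indicator
    "\uff41" "dmin@example.com",   # fullwidth a
]

for v in variants:
    print(f"{v!r:32} -> {normalize_identifier(v)}")
```

All four variants collapse to `admin@example.com`. If one component compares the raw value while another compares the normalized one, two different-looking identifiers can end up referring to the same account.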
And then we have an example that I wanted to show you. According to the standard, the dollar sign will assert the position at the end of the string. For example, in this case you can see that we have an assertion at the start of a line, and then we have an assertion with the dollar sign at the end of the string, or before the line terminator right at the end of the string. This actually means that we have a string and we are only accepting that string if it is between those assertions, or before the line terminator right at the end of the string. So if we

have a string that ends with a newline character, it will still be accepted. This is what the standard says about it, but in terms of implementation we have, for instance, JavaScript, which will match the string "aaa" for this simple regex from a to z, and it will reject the second one, which is absolutely right: "aaa123" is not part of the regex. Then we have "aaa" with a \n, a newline character, which according to this should be accepted, and it will reject it, and the newline character with numbers after it as well. Then we have Python, and Python will have the same behavior

except for "aaa" followed by a newline character, which will still match. Then we have Ruby, which will accept three of the cases. So to sum it up, for this regex we have different behaviors for the dollar sign. This is just an example: the default implementations of regex validation can lead to different behaviors in many cases, for different regular expressions. One of the problems is that people validate the input and they are not extracting the match data, which is what actually matters regardless of the input. They just match the regex on the input and, if it's valid, they use the original input

and not what they are extracting. This is very, very common in backends. After this intro, we'll move on to the actual technique, which I call REcollapse: how can we bypass most user input validations that rely on regular expressions, but also how can we leverage user input transformations, like the normalizations that I showed you? The main idea behind this technique is to fuzz the parameters, but in a smart way. So let's start with an initial scenario. The first one is accepted, of course: a redirect to a legitimate subdomain of example.com will be accepted, but if we send evil.com it will be rejected.
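The dollar-sign behavior described for Python, and the difference between validating and extracting, can be checked with a small sketch. The pattern `^[a-z]+$` follows the "from a to z" regex mentioned above; the helper name `extract` and the stricter `\A...\Z` variant are additions made for illustration.

```python
import re

DOLLAR = re.compile(r"^[a-z]+$")    # in Python, '$' also matches just before a trailing "\n"
STRICT = re.compile(r"\A[a-z]+\Z")  # '\Z' matches only at the very end of the string


def extract(pattern, value):
    """Return the matched text (what the regex actually approved), or None."""
    m = pattern.match(value)
    return m.group(0) if m else None


samples = ["aaa", "aaa123", "aaa\n", "aaa\n123"]
for s in samples:
    print(repr(s), extract(DOLLAR, s) is not None, extract(STRICT, s) is not None)

original = "aaa\n"
approved = extract(DOLLAR, original)  # the trailing newline is not part of the match
print(repr(original), "->", repr(approved))
```

With the `$` pattern, `"aaa\n"` is accepted although the matched text is only `"aaa"`. Code that validates and then keeps using `original` carries the newline forward, while code that uses `approved` does not.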

So this is mostly an abstraction of the technique: we send unexpected inputs, we just fuzz a lot of input, even in the context of a web application, we get back a weird behavior, and we keep doing this, so the black box will actually start revealing information. The REcollapse technique is about identifying regex pivot positions. We don't have actual access to the regular expression in the backend, so we'll start by identifying the starting and termination positions; I'll show you what this actually means in the next couple of slides. Then also separator and normalization positions. We'll fuzz these positions with all the possible bytes that we can have, in the range from 00 to

FF, for example, and then we'll analyze the responses to draw some conclusions. This is an example of picking the starting position and termination position, to verify whether the regex is missing assertions for starting and ending properly. We also have the newline variations that I showed you. Then we have the separator positions. The separator positions are basically about special characters: we know that in terms of regex we have ranges of characters, rules, special characters in the regex itself, and how many times they can repeat. So the main idea of the second pivot position is inserting fuzzing positions around

the special characters, and taking a look at what happens if we fuzz with the new bytes on these positions, one by one. And then we have the normalization positions, typically vowels, but it's not limited to vowels. For instance, if we have an uppercase "A", or an "a" with an accent, an ordinal indicator, and so on, and we try to sign up on a given application with an email with these characters, it will be converted to lowercase, and most of the time it will normalize all of these on the username part and also on the domain part, for security reasons. So to sum it up, we have all

these positions now, and we'll fuzz all the positions from a null byte to FF, one by one, and we'll look at the responses. More examples: this was an example for legit.example.com, showing where we should insert the pivot positions. Then we have an example for an email, which will have the same starting and termination positions as well. Then we have a username with an underscore, and you can see in green the separator positions around the underscore. And this is a more complex example, for HTML. This is actually useful in terms of cross-site scripting: basically, for stuff

that is validating or trying to purify the input in terms of HTML, it's interesting to understand whether we can get a bypass in terms of attributes and so on. For that, I'll be launching very, very soon, on my GitHub, a tool that is capable of generating inputs according to all of these rules for normalization, separator, starting and termination positions. It will support multiple fuzzing sizes and encodings, so you can paste them pretty easily in Burp Suite or other tools. For instance, in Burp Suite you can just copy-paste that into Intruder and then you'll be

able to... Just to give you an example, here on the right you can see how it works, just for the separator and the starting and termination positions: it will generate inputs like this, and then you'll just need to look at the actual responses. So I'll give you an example of the thinking process. I have an application running on my laptop. This is an example where you can go to a subdomain of example.com; the idea here is that it will redirect. If we go to Burp, we can see that you'll have a request to

localtest.me in this case; I was able to resolve this DNS name, but whatever. So it has this functionality: if we hit the redirect endpoint with x.example.com, we get a 302 Found to x.example.com. If we try to go to a different domain, like x.evil.com, it will be forbidden. So the idea here is to fuzz the string. If I open Burp Suite, I will send this request to the Repeater and try to send it. All right, it works. We could do this manually, but let's try to use Intruder. So we'll add a

position here. First of all, if we try regular stuff like .evil.com, it will be forbidden. Then if we try @evil.com, it will also result in a 403. So the idea here is to fuzz positions in the input. Just to make it simpler, I'll add a position for fuzzing at the end of the string and I'll put evil.com after it. We want to fuzz all the possible byte combinations here to see if we can actually get the 302. So I'll load a list that I have here, URL-encoded, and I'll start Intruder to take a look at the responses. All right, so if we order by status, we

can see that %2F, which is a slash, gets a 302 Found, but it's not useful at all because we cannot go to evil.com. So we can try more stuff. Let's keep it simple and use the termination position; you'll be able to generate all of these cases with the REcollapse tool. So let's start an attack like this to see what characters go through at this stage. We have a 302 for the slash as well, but we also have it for the at symbol: we have a 302 Found to x.example.com followed by an at symbol, but we cannot use evil.com. So at this point we can try to start

understanding the regex in the backend. If we try to send something like @evil, it will go through, so it was the dot that was preventing the regex from validating the URL. At this point, for exploitation, we just need to use something like this, which is a decimal IP address. If we hit send, we'll get the 302, and if we follow that, we'll go to a different IP address. So we didn't know what the regex was, and we were able to understand how fuzzing can be applied to get an actual payload specific to that situation, instead of spraying payloads that are

available on GitHub, for example. Okay, so let's move on to the actual methodology, where we can apply this technique. What to look for? It's literally anything that gets validated in the context of a web application, but also other types of software as well. We are looking for stuff that gets sanitized, validated, normalized, used in queries, and so on. The data that usually goes through more operations could be the name of the user, which shows up in many places, the email address, the username, and so on. These will open the door to mysterious bugs, where we don't get

what the impact is in the first place, and we try to realize it. So, the methodology to uncover these types of weird bugs, which basically just require a one-byte variation, let's say, in your input. The strategy is to set your goal, for example account takeover, and then pick your target field, for example the email. You start understanding the application and you'll see: okay, the email is responsible for most authentication mechanisms, and not the username, for example. Then we'll identify all the flows that consume it, and I'm not just talking about flows within one single application; I'm talking about having an overview of all the hosts that actually consume

this target field. For instance, you can have an authentication subdomain, login.example.com, then the actual application, or logging in with the application in other services, and so on. So we'll identify all the entry points for this target field, and for every endpoint, host, and application you apply this technique: you smartly pick these positions and try to understand what actually goes through and is accepted by these types of filters. Then we just need to analyze all the response codes. Did you get any successful response? If so, like the 302 from my simple demo, you'll dig into it and try to realize: okay, so if the regex

rejects the dot and we cannot have another host, we can actually put an IP address, or a decimal IP address, and so on. It's always specific to the situation you are analyzing. Another question is: is the regex always the same in all the endpoints and applications? Usually not. As I told you, developers copy, and different developers copy different regexes to multiple backends. So what's going to happen is that sometimes, on a specific host or in a mobile application, an email with weird characters is accepted, and on the core web app it will not be accepted. This can lead to a lot of problems.
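To make "smartly pick these positions" concrete, the following Python sketch generates one-byte variants at the starting position, the termination position and around separators, as described earlier in the talk. It is an illustration under stated assumptions and not the REcollapse tool itself: the function names are made up, the input is assumed to be ASCII, and normalization positions are left out.

```python
from urllib.parse import quote_from_bytes


def pivot_positions(value: str) -> list[int]:
    """Indexes where a fuzz byte is inserted: start, end, and around separators."""
    positions = {0, len(value)}
    for i, ch in enumerate(value):
        if not ch.isalnum():
            positions.update((i, i + 1))  # just before and just after the separator
    return sorted(positions)


def generate_payloads(value: str):
    """Yield URL-encoded variants with one extra byte (0x00-0xFF) per pivot position."""
    raw = value.encode("ascii")  # assumption: ASCII input, so str and byte indexes agree
    for pos in pivot_positions(value):
        for byte in range(256):
            mutated = raw[:pos] + bytes([byte]) + raw[pos:]
            yield quote_from_bytes(mutated, safe="")


payloads = list(generate_payloads("legit.example.com"))
print(pivot_positions("legit.example.com"))
print(len(payloads), payloads[0], payloads[-1])
```

The resulting list can be pasted into an intruder-style tool, one payload per request, and the responses compared against the baseline rejection.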

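The observation that different endpoints accept different bytes can be turned into a small comparison step. The sketch below assumes the fuzzing results have already been reduced to a set of accepted payload bytes per endpoint; the endpoint names and values are invented for illustration.

```python
def inconsistent_bytes(accepted: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each endpoint, list the bytes it accepts that some other endpoint rejects."""
    if not accepted:
        return {}
    common = set.intersection(*accepted.values())
    return {
        endpoint: sorted(found - common)
        for endpoint, found in accepted.items()
        if found - common
    }


# Hypothetical results: which fuzzed bytes each entry point let through.
accepted_by_endpoint = {
    "web sign-up": {"%2B", "%2D"},
    "mobile API sign-up": {"%2B", "%2D", "%00", "%C3%A5"},
}

report = inconsistent_bytes(accepted_by_endpoint)
print(report)
```

Bytes that appear in the report are candidates for the next step of the methodology: carry them through every flow that consumes the target field and watch for confusion between accounts.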
The strategy at this point, when we have the response codes, is to pick a weird byte that went through. You can have multiple, you can have one, you can have two, and so on. So you'll pick a weird byte, a special character that went through, and you'll go through the flows that you identified in step three, all the flows that consume the target field. In the case of email, we have the recovery process for an account, login, sign-up or single sign-on, email change, confirmation; it will again depend on the target field. At this point, hopefully, you've just found a mysterious bug, and to realize it you need to look for

errors and weird responses, weird behaviors, stack traces, and so on, and then you'll try to realize the impact or an attack scenario. With the real examples that I'm going to show you, you'll understand this better. If there is no bug or no impact, like, we are able to bypass this filter on the sign-up page, let's say, but we don't know what the impact is, there's no impact in any of these flows, like a confusion between accounts and so on, you'll go back to step 5b and pick another weird byte that went through, or you can always go back to redefining your goal or picking another

target field, of course. So this is the methodology I've been following for the last couple of years, and it has been very, very successful on hardened targets. So let's move on to the real-world examples. This is an example of an open redirect that could lead to token exfiltration. We have an auth endpoint on login.redacted.com, then a url parameter with a subdomain of the target, which will return a 302, and then you have this Location with a token, something secret that will actually authenticate the user afterwards. If the user is already logged in when the attacker sends this URL, it will redirect to a URL with a token

parameter. As an attacker, our goal is to exfiltrate this token, obviously, and then, after the victim clicks on the link, we'll be able to reuse this token, because it has not been consumed, to perform account takeover. Usually there's some sort of validation through regular expressions that only allows redacted.com and subdomains of it, or even whitelists of subdomains, for example. So if we try to send a dot, or an at symbol, and so on, we'll get a 403. So now what? From the demo, you'd fuzz all of these and eventually come up with a response. This wasn't exactly the case: fuzzing the URL with a

pivot position before evil.com, from null to FF, one byte, returns no useful 302, only for the hash symbol, a slash, or a question mark, which in this case will not actually go to evil.com at all. So my idea in this scenario was: okay, let's fuzz all the possible two-byte combinations in this position. And all of a sudden it returned a nice 302 with %3B%40, which is what you see down there, and we can send a link to a victim and exfiltrate a legitimate token to perform an ATO, with a semicolon and an at symbol before evil.com. And this will

redirect to evil.com. An attacker will be able to exfiltrate the token from the core login process, which serves many applications on this target, through a one-click ATO via redirect, on, well, a very hardened target. In this case it was an internal library that they were using to validate subdomains and whitelists for the top-level domain, so it was pretty impactful. Then, some of you might have seen this; there's actually a blog post about this collaboration. We were fuzzing a target with the same technique I'm presenting. zlz and I were in a Starbucks, preparing for DEF CON, and we were hacking a target. zlz noticed

that a null byte in a sign-up request would reveal a very weird behavior. You can always go through the blog post for more details. The idea is that we were fuzzing the username, in this case an email address, in an attempt to sign up. The backend will supposedly check whether an email like this already exists; it will consider it unique and proceed with the registration process. And in the derived value, after being passed between two services, the null byte was being removed; in this case, we think, by a very low-level library that is used by the

target. So when we tried to sign up with victim, null byte, domain.com, with only that byte in these positions, it would return "victimL", with an uppercase L, at domain.com. So we were like, what is going on? We sprayed a lot of null bytes, and this was kind of like Heartbleed all over again, because we wrote a script to continuously dump this from the target's login system, and we would get all the passwords from logins and sign-ups from a specific region of this target, even private keys that would allow us to do some nasty stuff, and many more secrets and personal data, PII, and so on.
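Going back to the redirect example, where a single byte at the pivot position gave nothing useful and two bytes did, the larger fuzzing size can be sketched as below. The `prefix` and `suffix` values are placeholders rather than the real target's URL, and `two_byte_payloads` is a name invented for this example.

```python
from itertools import product
from urllib.parse import quote_from_bytes


def two_byte_payloads(prefix: str, suffix: str):
    """Yield prefix + every URL-encoded two-byte combination + suffix."""
    for pair in product(range(256), repeat=2):
        yield prefix + quote_from_bytes(bytes(pair), safe="") + suffix


# Placeholder values: a legitimate host followed by the pivot position and the attacker host.
candidates = list(two_byte_payloads("legit.example.com", "evil.com"))
print(len(candidates))
print("legit.example.com%3B%40evil.com" in candidates)
```

The space grows from 256 to 65,536 requests per position, so it is worth reserving this size for positions where one-byte fuzzing already showed some variation in the responses.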

So this was a pretty impactful bug. Again, using this technique on specific positions will reveal weird behaviors; we just need to go and try to understand what's actually happening inside the black box. Then we have another example, what I call RE-cache deception. The idea of web cache deception, for those who are not familiar with it, the more traditional web cache deception, is that we go to an endpoint and add a .css or other static extension, and if it returns a 200 and it's cached, we can access it from another session or another location

or whatever, without being in the context of the user's session. So we have this target, redacted.com, and an API that will return details about the user and also the API token of the user; if we get this token, again, we get access to the account. So we try to append .css, .pdf, .js, and it returns a 404, so no caching at all. The caching rules are usually also regex-based, and the static extension alone is not enough these days to perform web cache deception; many times we need to enforce the correct content type in the response. So if we send .pdf, if

there's no application/pdf in the response, it will not get cached. It always depends on the rules of the specific application or of something that is in front of it. So, okay, let's fuzz it according to the same technique; it's always the same. We try to smart-fuzz it as well, so we put a pivot position after "user". We also tried a bunch of other stuff and other positions, but we tried this one as well, again from 00 to FF, along with well-known extensions, mostly static extensions: min.js, min.css, JPEG, PNG, and so on. And it actually returned a 200 with a URL-encoded version of a hash

symbol and a question mark, %23 and %3F, and this response has an Age caching header and an X-Cache: HIT. So if we send this to a victim who is logged in to the application, then we just need to access the cached content from our end and we'll be able to retrieve the actual response that you see here, reuse the API token as the victim, and perform account takeover via web cache deception. This next one I got authorization to disclose; thanks, Shopify folks, for having an awesome bug bounty program. Shopify offers a sign-up and login with Shopify OAuth mechanism. The OAuth scope is well documented and includes

email addresses, to log in to multiple applications. As a developer you can even build an application like this. There was this application that Shopify acquired, I think, taller.app, which was in scope for an engagement, and the email address in this app, and in many others, doesn't need to be verified to create an account. You can create an account on Shopify and then you'll be pending email verification, and you need to verify it, but you can log in to other applications if they are not actually enforcing this. But if the email already existed in taller.app, or in the target application, you can't log in or sign up with the

victim's email on Shopify, because you don't know the password, there can be 2FA, and so on, and you also cannot sign up because the email already exists. So our goal was to understand whether there was some weird behavior again. We went to the subdomain responsible for the sign-up and login process of Shopify, which is accounts.shopify.com, and there was a proper regex in place when you are already logged in and you fuzz the email change request, so no weird characters are allowed. But then we identified all the flows and all the endpoints, so we tried all the other ones, and in a sign-up

request on accounts.shopify.com, "victim" with this weird "i" character with a circle will actually go through, and will show up like that. So if we hit "Log in with Shopify" with an account in this state, even being unverified, it will bypass the whole process in the target application and we can successfully take over the target account. I have a video PoC here. We are on accounts.shopify.com; we were not able to change it here. This is the actual victim's account. So we go to another session, we log in, and then, in this case, we'll be signing up with Shopify, and after this... [Music]

it takes a while. All right, so in this case it wasn't an "i", actually; it was an "a" in this video example. We put in a password and the name. We are not the victim; this email without the weird character already existed. And if we go to taller.app, we'll be in the victim's account right away, without even verifying our email address. The takeaway here is about OAuth flows; this wasn't the only case where we were able to exploit a bug like this in terms of normalization. In OAuth flows, single sign-on, SAML assertions, and so on, in terms of email addresses and

usernames, when they are retrieved by the target application, normalization is often used in these flows, even by the libraries; a lot of these SAML libraries actually normalize the email address that is being retrieved. And if that happens, the problem is that the source shouldn't allow the weird characters, and should properly validate and normalize the email in the first place. In terms of the destination of the authentication, the target application, it actually makes sense that they normalize the stuff that is coming through, but if this is not taken into account in the login provider, it will

result in this type of problem. Then we have a final example. This was a very impactful one; I didn't get authorization to disclose it, unfortunately. The target is an email provider, and our goal is to take over a victim@target.com inbox without any interaction. People can sign up as username@target.com or use their current email address. So let's explore all the flows that the email provider has. We try to recover the email account of the victim, and this will present a redacted email address, and we can send the code to this email address; if we have the code, we can use a recovery email address and we can have access to

the victim's account. Adding the victim's email as the attacker's recovery email will require email verification, but if we go to another session and try to recover the actual victim's account, it results in a change in the flow. Here only one email showed up, and if we, as an attacker, add the victim's email as a recovery email, we get some confusion. So now we say, okay, we want to recover victim@target.com, and it offers to use that same email to perform the recovery, which doesn't make sense, since as the victim I don't have access to this account, and also the same redacted original

recovery email address of the victim. So some sort of regex was matching @target.com in order to distinguish both account types: the ones that you create at target.com, which is the email provider's own email, or an external email address. After fuzzing the email parameter with the same technique, some special characters were displaying the same recovery email addresses, and so, having a recovery email address, we felt that the regex wasn't properly built: it was matching the @target.com regex because it wasn't asserting the end of the string. So if we added the victim's email address as victim@target.com.ourdomain.com, it would show up as a recovery email of

the attacker's account, but as option two we still have a confusion that will show the original victim's email. So after receiving the code via email at the victim address on our subdomain that looks like the target, target.com.ourdomain.com, for which we set up the DNS and MX records, we are able to select an account, we pick the victim's account, and boom, we take over the victim's account, bypass all 2FA and so on, and get access to the email inbox without any interaction. This was by far one of the most impactful so far, since we are talking about an email provider. All right, to sum it up, some takeaways.
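The root cause described for the email-provider example, a pattern that never asserts the end of the string, can be reproduced in a few lines of Python. The provider's real regex was not disclosed, so both patterns and the helper name `is_provider_address` are assumptions used only to show the effect.

```python
import re

# Assumption: the backend only checked that "@target.com" appears somewhere.
UNANCHORED = re.compile(r"@target\.com")
# A stricter variant that requires the address to end right after the domain.
ANCHORED = re.compile(r"@target\.com\Z")


def is_provider_address(email: str, pattern: re.Pattern) -> bool:
    """Return True when the pattern classifies the email as a provider account."""
    return pattern.search(email) is not None


for email in ("victim@target.com", "victim@target.com.ourdomain.com"):
    print(
        email,
        is_provider_address(email, UNANCHORED),
        is_provider_address(email, ANCHORED),
    )
```

With the unanchored pattern, an address on an attacker-controlled domain is classified as a provider account, which is the kind of confusion the recovery flow exposed.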

So, developers: always test and fuzz your regex, and rely on well-known libraries, but as you've seen, even the core functions of programming languages have some problems in terms of regex. Simple input modifications can result in great damage, so we can just fuzz by flipping or adding bytes and applying the same sort of thing that fuzzing does, in terms of web as well. Black-box regex testing is still largely untouched; it's very creative and manual work, but go for it. There's plenty of stuff to get in terms of bounties, I'm pretty sure, and I don't have the time for it, so go for it. Regex behavior can reveal information about libraries,

the programming languages, because if you are suspicious that the backend is programmed in PHP or Node.js and so on, you can try to do the same validation and see whether newlines and all these weird characters go through, and you can work out the backend language, which can be very useful for further exploitation. So if something is being validated and you want to bypass it, even in terms of WAFs that block payloads, just fuzz on these positions and I'm pretty sure you'll be able to find the bypass pretty easily, instead of just spraying copy-paste payloads

from, you know, GitHub. The goal here is something very simple; it's the opposite of complex exploitation. It's very simple, but what's difficult is to have a big picture of the target and realize the actual impact, and hopefully you'll see the big picture. To finish, I would like to thank the many people who helped me review this talk and who were great collaborators; I couldn't list all of them here, but big thanks to all of these dudes, and also to my team from my startup, where we have been researching this technique as well, the disturbance team, and also the HackerOne team for changing my life in the last years as well. All right, thank you, and I

hope you liked it. [Applause] Thank you, André. Any questions?

Okay, thank you, André. Thank you. [Applause] [Music]
