
Hi everyone, my name is Shanani. I'm here to talk to all of you about current and novel methods of enumerating cloud assets. I'm a security consultant who recently started a new role at Rootshell Security. You might notice the slides say WithSecure on the bottom right; that's because all this research was done while I was working at WithSecure. You can see my socials listed here, and I'll put them up at the end as well. I'm particularly interested in application security and cloud security, so if anyone wants to talk to me about that afterwards, I'd love to chat. I'm going to begin by giving a summary of AWS Cognito, and I'll talk about my experience learning about the service. I'll move on to current methods of enumerating cloud assets, and I'll apply these methods to find 300 active instances of AWS Cognito. I'll then move on to certificate transparency logs and how they can be applied to find cloud assets, and I'll conclude with some takeaways from the research. This is how I found 300 active instances of AWS Cognito exposed to the internet. Cognito is an identity provider: it lets you make a login page for your application, and it's hosted on AWS. It consists of user pools and identity pools. A user pool is a directory of users, and each user pool has many clients which interact with this directory. These clients can do things
like log in as a user, change attributes of existing users, and even register new users, just to name a few things. Identity pools, meanwhile, provide users with access to the AWS account itself. So what's so interesting about these login pages built on Cognito compared to any other login page? Why do we care about finding them? Since it's built using AWS, it uses the AWS API to work, and that means we can turn this page into this: the AWS CLI command for authenticating to a Cognito client using the user password authentication flow. On the back end, that's exactly what the login page is doing. The developer may not realize that users can circumvent the login page and interact with the client directly, and this can lead to an unmanaged attack surface. This is what the normal flow looks like: you can see me on the left using the login page; the page makes an API call to Cognito and authenticates me to the application. But as an attacker, I can use something like the AWS CLI to communicate with Cognito directly. The developer may have secured the web application, but they may have misconfigured the client, which I'm now directly interfacing with, and if that's the case, it can be exploited.
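The direct-to-Cognito call can be sketched in a few lines. This is a minimal illustration, not the author's tooling: InitiateAuth with the USER_PASSWORD_AUTH flow is an unauthenticated, JSON-over-HTTPS API, so only a client ID is needed. The region, client ID, and credentials below are made up, and the request is built but not sent.

```python
import json
import urllib.request

def build_initiate_auth(region, client_id, username, password):
    """Build (without sending) the InitiateAuth call a Cognito login page makes."""
    body = json.dumps({
        "AuthFlow": "USER_PASSWORD_AUTH",
        "ClientId": client_id,
        "AuthParameters": {"USERNAME": username, "PASSWORD": password},
    }).encode()
    return urllib.request.Request(
        f"https://cognito-idp.{region}.amazonaws.com/",
        data=body,
        headers={
            "Content-Type": "application/x-amz-json-1.1",
            "X-Amz-Target": "AWSCognitoIdentityProviderService.InitiateAuth",
        },
    )

# Hypothetical values for illustration only
req = build_initiate_auth("eu-west-1", "4example0id0abcdefghijklmn",
                          "someone@example.com", "Password1!")
print(req.full_url)
```

Sending this request (for example with `urllib.request.urlopen`) is what the `aws cognito-idp initiate-auth` CLI command does under the hood.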
Before I continue, I just want to give a bit of background on how this research actually came to be. It didn't start with me wanting to find other people's cloud assets; I just wanted to learn how Cognito worked and how to break it. I found blog posts talking about how it can be attacked, I read bug bounty writeups, and I configured several instances of my own. The AWS platform made this really easy: I could set up a new instance in minutes, and crucially, it gave me an insight into exactly what developers see when they set this stuff
up. This is really important from a cloud security perspective, because I needed to see which misconfigurations could feasibly be made, and this diagram is a testament to that. Developers may not realize they're responsible for managing the whole attack surface of the API: in a normal application, they could get away with just making sure the login page was secure, but since we're operating in the cloud, a new attack surface is introduced that I can now interface with directly. From my research, I compiled a list of potential misconfigurations for Cognito. Specifically, I found that in certain circumstances an attacker could bypass password policies, and could register new users to the application even though the login page doesn't allow it. This
could allow you to access an application essentially unauthenticated. An attacker could take over anyone's account and escalate their privileges to administrative levels within the application, and could even get access to the AWS account itself using identity pools. You can find more information, if you're interested, on the WithSecure public cloud wiki, secwiki.cloud. Now, I had found a bunch of misconfigurations for Cognito and I was excited to see whether they were actually feasible, but there was one issue: I didn't actually have anything to attack. I couldn't use my own instances to test the misconfigurations, because I was trying to test how feasible they were in the wild; I needed to find Cognito clients exposed to the internet, and this led me to look into cloud asset enumeration. I'm now going to cover some current methods of enumerating cloud assets. Assets can typically be identified with a domain name or, in general, some identifier, and what's important to know is that these identifiers have a common shape. For an S3 bucket, the identifier is a
bucket name, and you can see the following identifiers: on the top we have an S3 URI, and on the bottom a domain name, and both can uniquely identify an S3 bucket. By first looking at a service and determining the shape of its identifiers, we can build tooling to brute force instances of that service. For S3, a lot of tooling exists already; one tool in particular is called S3Scanner. It takes in a list of bucket names, checks whether they exist, and does some basic enumeration to see if it can enumerate any permissions on them. Now, since these identifiers have a pattern, tooling can be used to find them automatically, en masse, in web pages.
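The idea behind brute-forcing tools like S3Scanner can be sketched as follows. This is a simplified illustration rather than S3Scanner's actual logic, and the bucket names are invented: an unauthenticated request to a bucket-shaped URL distinguishes missing buckets from existing ones by status code.

```python
def bucket_url(name):
    # Virtual-hosted-style URL that uniquely identifies an S3 bucket
    return f"https://{name}.s3.amazonaws.com/"

def classify_bucket(status):
    """Interpret the HTTP status of an unauthenticated GET on a bucket URL."""
    if status == 404:
        return "does not exist"
    if status == 403:
        return "exists, access denied"
    if status == 200:
        return "exists, listable"
    return "unknown"

# Hypothetical statuses as they might come back for candidate names
for name, status in [("acme-backups", 403), ("acme-public", 200), ("no-such", 404)]:
    print(bucket_url(name), "->", classify_bucket(status))
```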
But what we need is a lot of data, specifically a lot of HTTP responses to look for identifiers in. Again, we can use tooling for this: web crawlers like Katana can scrape web pages for URLs, and then we can use grep to filter those URLs for identifiers. Additionally, you can use httpx, another tool, which can make HTTP requests at scale to all of these URLs, and then we can look through the responses for identifiers. Websites typically contain JavaScript files containing the
logic for the application, and this may include the logic used to interact with various cloud services. You might find a bundle.js file, and if the application uses a cloud service, that file is likely to contain identifiers for whatever service it's using.
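The grep stage can be sketched as a regex scan over scraped content. This is a rough illustration: the pattern approximates the shape of a Cognito user pool client ID (about 26 lowercase alphanumerics), and the bundle.js snippet and the ID inside it are made up.

```python
import re

# Approximate pattern for a Cognito user pool client ID (~26 alphanumerics)
CLIENT_ID_RE = re.compile(r"\b[a-z0-9]{26}\b")

def extract_client_ids(text):
    """Scan scraped page/JS content for client-ID-shaped strings."""
    return set(CLIENT_ID_RE.findall(text))

# Hypothetical snippet of a scraped bundle.js
sample = 'var cfg = {userPoolClientId: "4example0id0abcdefghijklmn", region: "eu-west-1"};'
print(extract_client_ids(sample))
```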
Finally, there are some other notable techniques. Similarly to how we look through the source code of web applications, we can look through the source code of native applications as well. Wiz, in their talk "Scanning the Internet for External Cloud Exposures", looked through the source code of native Android applications and found 30,000 Cognito identity pools. There are also some third-party sites which do most of the work for you: Grayhat Warfare passively scans and lists the contents of various cloud services like S3 buckets, and anyone can passively browse that site and look at the contents of various buckets. Let's take it back to Cognito. I needed to apply these
techniques to find instances I could test these misconfigurations on. But first, let's understand what we're looking for in the context of Cognito: a client ID. Functionally, this is equivalent to an instance of Cognito that we can attack. They're about 26 characters long, consisting of alphanumeric characters, which is unfortunately too long to brute force, so we're going to have to find them somewhere. When researching Cognito, I came across a feature I found particularly interesting: the Cognito hosted UI. It provides a login page which works out of the box, but interestingly, the default domain for this hosted UI is always a subdomain of amazoncognito.com. This is the hosted UI page, and in its URL there's a client ID as a GET parameter, and that client ID is what I was looking for. I used a tool called getallurls, or gau, which crawls for URLs belonging to a specific domain or its subdomains, in this case subdomains of amazoncognito.com, through sources like Open Threat Exchange, the Wayback Machine, and Common Crawl, looking for URLs for that specific domain.
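The hosted UI URLs that gau surfaces can then be mined for client IDs with standard URL parsing. A minimal sketch; the hosted UI domain and client_id value below are invented.

```python
from urllib.parse import urlparse, parse_qs

def client_id_from_hosted_ui(url):
    """Pull the client_id GET parameter from a Cognito hosted UI URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if not host.endswith(".amazoncognito.com"):
        return None
    return parse_qs(parsed.query).get("client_id", [None])[0]

# Hypothetical URL of the shape gau returns from the Wayback Machine
url = ("https://auth.example.auth.eu-west-1.amazoncognito.com/login"
       "?client_id=4example0id0abcdefghijklmn&response_type=code")
print(client_id_from_hosted_ui(url))
```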
From this, I was able to find 1,659 unique client IDs. Now, this was great, but there was one issue: a lot of them were stale, meaning they were made a while ago and have since been deleted. I was able to leverage an AWS API call to determine, at scale, whether they were active: by trying to authenticate to the client ID, the API would either tell me the client was deactivated, or tell me the login failed, meaning it was active. I was able to do this across all the client IDs I had gathered, and what I was left with was about 230 working, active, unique client IDs.
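The active-versus-stale check can be sketched as a classifier over the error codes an InitiateAuth attempt returns. The mapping below is an assumption based on observed behaviour rather than a documented guarantee: a deleted client tends to yield ResourceNotFoundException, while a live client with wrong credentials yields errors like NotAuthorizedException.

```python
def classify_client(error_code):
    """Interpret the error an InitiateAuth attempt returns for a client ID.

    Assumed mapping, not a documented guarantee.
    """
    if error_code is None:
        return "active (login succeeded)"
    if error_code == "ResourceNotFoundException":
        return "stale (client no longer exists)"
    if error_code in ("NotAuthorizedException", "UserNotFoundException",
                      "InvalidParameterException"):
        return "active (client exists, login failed)"
    return "unknown"

print(classify_client("ResourceNotFoundException"))
print(classify_client("NotAuthorizedException"))
```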
It's also useful to apply creativity to think of other sources of asset identifiers. In my case, I was looking for backlinks. If you're unfamiliar, a backlink is a link on one page pointing to a different domain, and in the case of the Cognito hosted UI, that would typically be a backlink: I would have my website, and on it a link to something.amazoncognito.com, the hosted UI. So I was able to leverage SEO tooling to find instances of these backlinks. SEO (search engine optimization) tooling can be used to provide metrics for your domain; one tool I used allowed me to
use the amazoncognito.com domain, and it would tell me metrics for that domain. One of the metrics was backlinks to that domain, including URLs, and inside those URLs to the hosted UI were client IDs. Combining this with my original data set, I was left with 282 unique and working client IDs. As shown before, these clients are really valuable; they're like gold dust to me. Similarly to how I could programmatically determine whether they were active, I could programmatically determine whether an instance was vulnerable. I could use API calls to try to sign up new users; given a user, I could see which attributes were mutable, and that could form a privilege escalation vector within the application. I could also instead look for identity pools, which could provide access to the AWS account. The important thing to know is that all of this can be done at scale, programmatically.
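As a sketch of the kind of unauthenticated call the sign-up check relies on, this builds (without sending) a Cognito SignUp request body. Every value here is hypothetical, and actually submitting sign-ups against third-party pools is intrusive, as discussed in the Q&A.

```python
import json

def build_sign_up_body(client_id, username, password, attributes):
    """JSON body for the unauthenticated Cognito SignUp API (built, not sent)."""
    return json.dumps({
        "ClientId": client_id,
        "Username": username,
        "Password": password,
        "UserAttributes": [{"Name": k, "Value": v}
                           for k, v in sorted(attributes.items())],
    })

# Hypothetical values for illustration only
body = build_sign_up_body("4example0id0abcdefghijklmn", "newuser@example.com",
                          "Password1!", {"email": "newuser@example.com"})
print(json.loads(body)["Username"])
```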
I'm now going to move on to how I applied a novel method, certificate transparency logs, to finding cloud assets. But before I describe the technique, I want to give a bit of an overview of certificate transparency logs. At a high level, certificate transparency logs are a place anyone can go to view a list of all X.509 certificates issued by various CAs. These are the same certificates used by web servers to encrypt traffic with TLS, and the reason certificate transparency logs exist is because of incidents where some CAs were issuing certificates which should not have been issued. Essentially, in 2011, a
group was able to intercept half of the traffic going through Iran, despite it using TLS, and the reason was that one of the CAs was issuing certificates it shouldn't have been issuing. Since that CA was trusted by our browsers, we as end users trusted those certificates as well. The web public key infrastructure depends on this list of root certificate authorities: who they trust is essentially who we trust, and certificate transparency logs are a log of trust, making it clear to us who we are trusting. Over here you can see an example of an X.509 certificate. You might be wondering, okay, why do we care about
these certificates? Well, in addition to the public key, which is used for TLS, they contain some interesting information for us. You can see the issuer of the certificate, which is key for the technique later, and also some X.509 v3 extensions, which typically contain more information about the identity of the person the public key belongs to. For example, there's the Subject Alternative Name, a quite useful extension which, in this case, contains more information about the domain name. A quick overview of how the certificate transparency log process works at a low level: it begins with a domain owner requesting a certificate from a CA. The CA creates a
copy of the certificate called a precertificate. I'm just going to call this precertificate the fake certificate, because it's the same thing as the normal one but has an extra extension, called a poison extension, which means that if a server tries to use it for TLS, the user agent won't accept it. So we can't actually use it to encrypt traffic; it's a fake certificate. The CA creates this fake certificate and sends it to the certificate transparency log service, and the servers spit back something called
an SCT. This SCT, or Signed Certificate Timestamp, is proof that the certificate has been publicly disclosed. The CA takes the SCT and the certificate and hands them back to the domain owner. So now the domain owner has two things: a certificate, which carries a public key and proves that the public key maps to their identity, and an SCT, which they can give to user agents as proof that the certificate has been publicly
disclosed. This is an overview of the whole process, but at a high level, all you need to know is that certificate transparency logs are a publicly auditable log of all X.509 certificates issued by certificate authorities. So now, if a malicious group manages to trick a CA into issuing a certificate it shouldn't, at least it's public, so everyone can see it happening, and someone can go, hey, maybe you shouldn't be issuing that certificate. To reiterate, at a high level all you need to know is that there's a public log where anyone can see all the TLS certificates being
issued by CAs. AWS has its own set of CAs, and I found a website, amazontrust.com, that lists all the information related to the certificate authorities AWS uses. Interestingly, it uses four root CAs for the certificates it issues. So when you get a certificate through AWS, it goes through ACM, the AWS Certificate Manager, and this uses one of the four root CAs displayed here. When I found this out, I asked myself: what happens if you look for the certificates issued by one of these four CAs? crt.sh is a Postgres database with a website front end; it allows you to look through certificate transparency logs easily. So, as a proof of concept, I used
crt.sh to look at the certificates that one of these root CAs has issued, and from the diagram you can see there are about 23 million certificates in total issued by one of these root CAs. Each of those 23 million certificates is typically linked to a domain name, and that domain name is likely pointing at a cloud asset, because its certificate was issued through ACM. Subsequently, I built a tool which extracted as many of these domains as I could, and I got about 168,000 domains; due to limitations of crt.sh, which crashed quite a lot, I could have gotten more certificates if I had queried the CTL servers directly rather than going through crt.sh.
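crt.sh can also return results as JSON (its `output=json` mode), which is one way a domain-extraction tool could consume it. A sketch under assumptions: the response below is a made-up sample in the shape crt.sh returns, parsed locally rather than fetched, and the issuer string is illustrative.

```python
import json

# Made-up sample in the shape of crt.sh's output=json responses
sample_response = json.dumps([
    {"issuer_name": "C=US, O=Amazon, CN=Amazon RSA 2048 M02",
     "common_name": "app.example.com",
     "name_value": "app.example.com\nwww.app.example.com"},
])

def extract_domains(body):
    """Collect unique domain names from a crt.sh JSON result."""
    domains = set()
    for cert in json.loads(body):
        domains.add(cert["common_name"])
        domains.update(cert["name_value"].splitlines())
    return domains

print(sorted(extract_domains(sample_response)))
```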
crt.sh is like a middle person between the end user and the CTL service. Now what I had was a list of domains, and these domains came from certificates issued by the Amazon root CAs. They weren't necessarily cloud assets, but it's likely, and I could apply some transformations, as further research, to filter this list. I could try resolving all of the IPs and discard the ones which didn't fall within the AWS IP ranges. Additionally, I could use some of the tooling described earlier to try to find URLs for these domain names, which could give me asset identifiers; for example, if I'm looking for Cognito, I can look in the URL for the client_id GET parameter.
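The IP-range filtering step can be sketched with the standard library. AWS publishes its ranges in ip-ranges.json; the two prefixes below are hard-coded stand-ins for that file, and the resolved IPs are invented.

```python
import ipaddress

# Stand-ins for prefixes from https://ip-ranges.amazonaws.com/ip-ranges.json
AWS_PREFIXES = [ipaddress.ip_network(p) for p in ("52.94.0.0/22", "3.5.140.0/22")]

def in_aws(ip):
    """Check whether a resolved IP falls inside any published AWS prefix."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in AWS_PREFIXES)

# Hypothetical resolutions of candidate domains
resolved = {"app.example.com": "52.94.1.10", "other.example.net": "198.51.100.7"}
aws_hosted = [d for d, ip in resolved.items() if in_aws(ip)]
print(aws_hosted)
```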
Cloud asset enumeration is a new research area; it's a subsection of attack surface management, or ASM. Unmanaged assets present an overlooked, ripe attack surface for hackers to exploit, and the intersection of this with cloud security is particularly interesting: the shared responsibility model is conducive to producing an unmanaged attack surface. The shared responsibility model, by the way, is the model AWS operates on; it means AWS is responsible for some things, like managing the infrastructure, and developers are responsible for others, like ensuring everything is configured securely. But when developers apply typical development practices to the cloud, they may assume that since it's built on the cloud, it's
automatically secure. This is wrong, and because of this, cloud asset enumeration is a really interesting and powerful research area to look into. Because of the shared responsibility model, the platform assumes the developer knows they have to take care of something, but the developer may not even know that thing is an issue. Cloud asset enumeration is a research area which is growing. Ingenuity can be applied to find new methods of enumerating these assets, and the impact of finding the right one could be catastrophic, unless we find it first. Thank you everyone for listening.
Any questions? [Question: if you're running your own cloud services, rather than looking at other people's attack surface, can you get data out of Control Tower or something else in AWS to help understand your own exposure?] So, do you mean assessing your own assets? I think there are some services which give you an overview of your account; I'm not sure specifically which ones, but there are also a lot of offerings which provide that. This is more about looking, in general, at things related to your organization rather than a specific cloud account.
Yeah, so this is a really important point for this sort of research. I wanted to stay away from intrusive tests, and when it came to adding users to user pools, I didn't do that. One of the reasons is that seeing someone try to log in to your application is less scary than seeing a new user appear in your application when there aren't meant to be any new users. In general, you can classify this stuff as passive, active, or intrusive. Passive is generally okay, because it can't directly be attributed to you; active is probably where you want to start drawing the line; and intrusive you shouldn't really go for. But that's a really good point. On disclosing misconfigurations: again, I didn't actually scan for the misconfigurations, because that would be intrusive. I wanted to use this as a precursor to show people that this research area exists and can be quite impactful. It's a lot easier to attack something which hasn't been looked at very much than something which has been looked at quite a lot,
but you need to find that thing, and cloud asset enumeration is quite good for that. So I didn't actually try to run any exploits against these, because again, it's too intrusive. As for the reason it's easier, that's pretty much it, but you do get a lot more coverage if you use the certificate transparency log servers directly. Interestingly, since these servers just host shards of data, you can download all of it and query it locally, and this can almost be done passively, because a lot of people download this data anyway. So, given more time, I would definitely have preferred to do it by downloading all the data from the CTL servers.
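Pulling from the logs directly uses the RFC 6962 HTTP API, whose get-entries endpoint serves the log in pages. A sketch of building those paged requests; the log URL is illustrative, and real logs cap the page size they will return.

```python
def get_entries_urls(log_url, total, page=32):
    """Yield RFC 6962 get-entries URLs covering `total` leaves, `page` at a time."""
    for start in range(0, total, page):
        end = min(start + page, total) - 1  # end index is inclusive
        yield f"{log_url.rstrip('/')}/ct/v1/get-entries?start={start}&end={end}"

# Illustrative log endpoint; real log URLs are published by the CT ecosystem
urls = list(get_entries_urls("https://ct.example.com/log", total=70))
print(urls[0])
print(len(urls))
```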
servers yeah absolutely um so this is quite a big project to build a tool around there there are definitely organizations doing it already um ASM is quite a growing part of industry and for a good reason I I'm always of the belief that it's a lot easier to attack something which you don't put that much effort into securing and ASM is really good for that um so I I would I'm I'm planning on doing some making some tooling around it but it does take a lot of effort and definitely organizations who have put a lot more money and ow into that all right more questions right thank you very much thank [Applause] everyone