
Pwning Android Apps at Scale

BSides Ahmedabad | 26:43 | 1.3K views | Published 2023-02
About this talk
Sparsh and Shashank from CloudSEK present research on discovering vulnerabilities in Android applications at scale. They introduce BeVigil, an APK scanner and internet-wide reconnaissance tool that indexes over one million Android applications to uncover hardcoded credentials, misconfigurations, and backend service vulnerabilities. The talk covers their Log4J-based vulnerability assessment campaign, which identified 300+ confirmed remote code executions and 400+ SSRF/open redirect vulnerabilities across banking, healthcare, and other sectors, along with an open-source CLI tool for accessing the indexed data.
Original YouTube description
Watch Sparsh and Shashank from CloudSEK present their talk on Pwning Android Apps at Scale. Slides: https://www.canva.com/design/DAFN6i0Z7b4/TW_hNUIkBAdnJc3zmyIRSA/view?utm_content=DAFN6i0Z7b4&utm_campaign=designshare&utm_medium=link2&utm_source=sharebutton
Transcript [en]

Before we move on, we need to wish somebody: it's [name unclear]'s birthday, so let's all wish them a very happy birthday from Team BSides. Many, many happy returns of the day. All right, so coming up next, to talk about an open source tool to help with automation, research, and analysis of Android applications, let's give it up for Sparsh Kulshrestha and Shashank Barthwal from CloudSEK. Okay, so welcome everyone, and thank you so much for attending this session. Today we will be talking about pwning Android apps at scale. In this research we exploited multiple vulnerabilities in the backend services which power these Android applications. But before we get into the research and

the findings, I want to give you a little background about how we reached these findings. It started with a project that we've been doing; we call it BeVigil. BeVigil is an APK scanner, but it's not a regular APK scanner. The whole theme of BeVigil is not to scan an APK from a VAPT point of view, but rather to scan an APK from a data collection and indexing point of view, so that we can do more and more reconnaissance on the data. So with that I would like to start this talk. I am Sparsh, and I have Shashank with me. We are both security researchers at

CloudSEK. We are part of the R&D team, which takes care of the CloudSEK ASM product, and further acknowledgment for this project goes to my other awesome teammates at CloudSEK who are part of this project. These are the things that we're going to cover today. We'll start with the motivation behind this project, and then we'll look at some of the problems which we saw in the current mobile ecosystem. Then we'll look at our innovation, that is, our solution to these problems; we call it BeVigil. Then we'll have an overview of the data we have indexed so far in our inventory, and then Shashank will take you through the research and the findings. And we are

also launching an open source tool today by which you can have access to the data we have indexed so far, and hopefully we'll have some time for Q&A. So, internet-wide data gathering and scanning. We as a security community have been doing internet-wide data gathering and scanning for a long time now. We are not the first ones to do so, and we do it in order to identify widespread vulnerabilities and misconfigurations and patch them before the bad guys get their hands on them. We have a lot of tools and projects in this category, but I have mentioned these four tools here, which I believe have made a real impact on our community.

First off we have Shodan. Shodan is a search engine for internet-connected devices, and the most awesome thing about Shodan is the Shodan filters. Using the Shodan filters you can correlate an IP to an SSL certificate, or you can map out a specific kind of service on the internet, which is very cool. Next up we have Project Sonar by Rapid7. They also used to scan the internet for a particular service, and after that they used to post this data online. Project Sonar has been discontinued for some reason, but there are a lot of tools built on top of the data set they provided. And we have the Wayback Machine, which is also

known as the digital archive of the internet. The Wayback Machine has been collecting the data of web applications from the internet for a while now. And there is Censys, which is also very similar to Shodan. So as you can see, we have a lot of tools which are collecting web application data or network data, but when it comes to mobile applications we are not really doing much data collection. That is why we chose mobile applications for our research. Now let's look at some of the problems we saw in the current mobile ecosystem. The biggest problem that we saw is the huge number of Android applications which are out there.

According to a report, we have more than 14 million applications spread across 80-plus app stores, and growing every day, and most of these applications never pass through a security test. We also have mobile-only startups, and I'm talking about unicorn startups here. A mobile-only startup is a startup which offers most of its functionality through a mobile application; they don't have a web application at all. The next problem we saw is very well known: Android applications are notorious for hard-coded assets and secrets. Now look at the amount of assets you can extract from a single application: you have URLs, you have parameters, you have endpoints.

Now when you multiply the assets that you get from a single application by the total number of applications which are out there, the resulting data is insane. It's a lot of data, and this is where we believe there is a hidden attack surface, which we are trying to uncover with this research. The last but not least problem that we saw is the limited availability of comprehensive data sets and investigative tools. We have Shodan for web applications and networks, where you can correlate multiple things, but when it comes to Android applications there are no such tools. So for example, say you found a zero-day in a

popular library which is widely used among Android developers. You literally have no way to identify which other applications are using the same library. So these are some of the problems that we saw, and we came up with BeVigil as a solution. This, in a nutshell, is what we did. Step one of our research was collection of mobile applications. We used the Play Store for the majority of those applications, and we used some third-party app stores as well. Right now we have indexed more than a million Android applications. Step two was decompiling all these applications, and for that we used JADX. After decompiling, once we had the source code of all these

applications, we did some static scanning as well, but the focus was on identifying more and more assets and secrets from the source code. For that we came up with a list of more than 250 regex patterns and ran it on the source code of all these applications, and that way we identified a lot of assets. The last and most important step of this research was providing this data set, along with the search functionality, to the security community, because we alone cannot fix all the bugs. So this is the pie chart of secrets and API keys we have indexed so far in our inventory. Here you can see, I don't know if it is visible in the back, we

have AWS API keys, we have GitHub OAuth tokens and API keys. Some of these API keys were very critical; in fact, some of them are capable of pwning an entire organization. When we saw this many API keys coming in, we decided to report them, but reporting 1.6 million hard-coded API keys, and that too manually, was not an easy task. In fact I don't think it's even possible. So in order to report them we set up an entire reporting pipeline. We built a validator API. The validator API takes a token or an API key, determines what kind of token it is, and checks whether it is valid or not, and if it is valid, automatic

reports will be generated and sent to the developer or the respective organization. So far we have reported to more than 600 different organizations with findings related just to API keys and secrets. But this talk is not really about the API keys and secrets; this talk is more about the assets that we have indexed so far. From 1 million applications we have indexed more than 294 million assets, and this is a pie chart of some of the assets. You can see we have URLs here. It's like how we use the Wayback Machine: we provide a URL and we get all the URLs associated with that

particular application. In the same way, this data set can also be used. You can see we have file names, we have REST API endpoints, we have hosts, we have file paths. I don't need to explain the importance of finding these assets; I think the talk we had this morning covered the importance of these assets, along with examples of how they can be exploited. So these are some of the assets that we have in our inventory. Apart from these we have more assets: we have 1.3 million unique subdomains, and BeVigil has been integrated with 15-plus open source subdomain enumeration tools. So if you are using subfinder, Amass, or if you are

using Findomain, you are sorted, because BeVigil is already there as a source in these subdomain enumeration tools. You just need to add your BeVigil API key in the configuration file. So this is a graph of the cloud storage buckets that we have indexed in our inventory. Here you can see we have close to three lakh (300,000) S3 buckets. How many of you use GrayhatWarfare to enumerate buckets? It's a searchable data set for open S3 buckets. Yeah, so in a similar way you can use this as well. I was comparing the data we have here to the data of GrayhatWarfare,

and this is slightly higher than the data of GrayhatWarfare. Not all of them are misconfigured, but as I said, we also have a validator API, so I can tell you that a lot of them are misconfigured.
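As a rough illustration of working with indexed bucket URLs like these, the sketch below pulls the bucket name out of the two common S3 URL layouts (virtual-hosted style and path style) and filters by a keyword. It is not BeVigil's implementation; the function names `bucket_name` and `buckets_matching` and the sample URLs are made up for the example.

```python
import re
from urllib.parse import urlparse

# Virtual-hosted style: <bucket>.s3.amazonaws.com or <bucket>.s3.<region>.amazonaws.com
_VHOST = re.compile(r"^(?P<bucket>.+)\.s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")
# Path style: s3.amazonaws.com/<bucket>/... or s3.<region>.amazonaws.com/<bucket>/...
_PATH_HOST = re.compile(r"^s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")


def bucket_name(url):
    """Return the S3 bucket name in `url`, or None if it is not an S3 URL."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if _PATH_HOST.match(host):
        # Path style: the bucket is the first path segment.
        first = parsed.path.lstrip("/").split("/", 1)[0]
        return first or None
    match = _VHOST.match(host)
    return match.group("bucket") if match else None


def buckets_matching(urls, keyword):
    """Return sorted, de-duplicated bucket names that contain `keyword`."""
    names = {bucket_name(u) for u in urls}
    return sorted(n for n in names if n and keyword.lower() in n.lower())


if __name__ == "__main__":
    sample = [
        "https://abc-prod.s3.amazonaws.com/img/logo.png",
        "https://s3.ap-south-1.amazonaws.com/abc-resources/a.json",
        "https://example.com/not-a-bucket",
    ]
    print(buckets_matching(sample, "abc"))
```

Whether a bucket is actually misconfigured still has to be checked separately; this only normalizes and filters names.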

Okay, so this is a pie chart of the other kinds of assets we have in our inventory. These are not directly exploitable, but when you find an asset like this, there are chances that there is an API key or a token hard-coded in the source code along with these URLs. For example, we have AWS AppSync GraphQL URLs here. Most of the time, whenever I have found a GraphQL URL, I have found an API key there as well, which can be used to query the GraphQL endpoint. We also have Amazon execute-api URLs. So what I'm saying is, these are not directly exploitable, right? So

don't go and report them blindly to the customer or through the bug bounty program you are hunting on. You need to build up a story; you need to give the person context on how you reached the findings and how they are impactful. Just a few days back I was hunting on a program, and I searched its top-level domain on BeVigil search (I'll give a demo of that as well). Their official app came up, and while I was looking at the URLs in their official application, there was one AWS execute-api URL. On its own, nothing can be done with that, so I searched that URL

on GitHub and I found a repo posted by an employee. In that repo there were authorization tokens which can be used to access this API. I got access to some of the endpoints, and with that I was able to execute some actions on that API. One more thing I would like to mention here is that there is a supply chain running on top of Android applications. There are companies which have Android applications for their employees only, or companies which have a separate application for their vendors or for their distributors, and later in the chain these vendors have more applications. So I'll explain

this with a finding, a security incident we came across. Let's say there are two companies, company A and company B. What we found was that the credentials of company A were hard-coded in the source code of the Android application of company B. When we investigated this issue, we found out that both of these applications were developed by the same software development company, and later these applications were posted on the Play Store by the respective organizations' accounts. Now if you look at this finding from a black-box point of view, or from a bounty hunter's point of view, there is literally no way for them to

identify that there is a loophole in the systems of company A because some developer hard-coded their credentials in a different application. Now that you have seen what kind of data we have, the question is how you can access this data. There are two ways: one is through the code search functionality, and the other is through an API. It's an open-access API; you can just go and sign up and you have access to that API. So this is an example of the code search functionality. In this screenshot I've searched for the url parameter. I searched for the url parameter

as it is well known for vulnerabilities like SSRF, open redirect, etc. In a similar way you can search for any kind of URL, any parameter, or any company keyword as well. The finding that I was talking about, that's exactly how we came across it: we just searched for the company keyword here, and all the applications which mention that company in their source code came up, and in one of those applications we found those credentials. The other way to access this data set is through the API; we call it the OSINT API. And for that I would like to

invite Shashank, who will tell you more about the OSINT API and will also take you through the research and the findings.

Thank you. Welcome everyone. I'll be taking over the presentation from now on. As you just saw, the code search is one way to access this data. Another way through which you can access millions of organized assets is the BeVigil OSINT API. We have created an API over the data set of millions of organized assets we have, and it is publicly and freely available for everyone to use. More details about the API can be found in the links mentioned here. So now let's have a look at some of the features, or I should say some of the endpoints, that are offered by the BeVigil OSINT API.
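A request to an asset API of this kind might be assembled as in the sketch below. The talk does not spell out the request format, so the base URL, the `/<target>/<resource>/` path layout, and the `X-Access-Token` header are assumptions to be checked against the official API documentation; `build_request` is a hypothetical helper that only builds the request and sends nothing.

```python
from urllib.parse import quote

# Assumed base URL; verify against the official API documentation.
BASE_URL = "https://osint.bevigil.com/api"


def build_request(target, resource, api_key):
    """Return (url, headers) for one asset lookup without sending it.

    `target` is a domain or an Android package name, and `resource` is an
    asset type such as "subdomains". The path layout and the header name
    used here are assumptions, not confirmed by the talk.
    """
    url = f"{BASE_URL}/{quote(target, safe='')}/{resource}/"
    headers = {"X-Access-Token": api_key}
    return url, headers


if __name__ == "__main__":
    url, headers = build_request("netflix.com", "subdomains", "YOUR_API_KEY")
    print(url)
```

Keeping request construction separate from sending makes it easy to swap in whichever HTTP client a recon pipeline already uses.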

You can see that we have a bunch of endpoints here: wordlist, hosts, URL params, subdomains, etc. One interesting endpoint that I would like to highlight here is the apps endpoint. It allows security researchers to enumerate all the Android packages, or Android applications, that are associated with a domain name. We observed that a lot of researchers do not even look at the source code of an Android application unless it is mentioned in the scope of the target, due to which a lot of assets that are embedded inside Android applications are never discovered and go untested, leaving room for a lot of

vulnerabilities. That is why we decided to create this endpoint, which simply takes in a domain name and gives you all the related Android packages. Once you have enumerated the Android packages, you can make use of other API endpoints. Through the wordlist endpoint you can generate a wordlist out of an Android package, which you can use for fuzzing, and you can also extract URL parameters, subdomains, and URLs from an Android package. Another interesting endpoint that I would like to highlight here, with an example of the actual API call, is the S3 keyword search. This endpoint allows you to enumerate all the S3 bucket URLs that we

have discovered in more than one million Android applications which have a common pattern or common keyword in them. During some VAPT engagements we noticed that a lot of companies have patterns in their S3 bucket names. Let's say the company name is ABC; they might have S3 bucket URLs that start like abc-prod, abc-resources, and you get the idea, right? They have patterns in their S3 bucket names. That's why we designed this endpoint, which simply takes in a keyword or a pattern and gives you a list of all the matching S3 bucket URLs that we have in our inventory. In this example we have searched for

the healthcare keyword, and it has listed all the S3 bucket URLs which have healthcare in their URL or name. Okay, so now you have seen that we have a lot of data: we have more than 290 million assets, and we have a very handy API through which we can extract all these assets. Now the bigger question is how we can make use of this data. How can we utilize all these assets to do vulnerability research or to scan the internet? One way we could think of utilizing this data is through smart fuzzing: we can use all the collected

data points to conduct an internet-wide scan for vulnerabilities. By smart fuzzing I mean using the correct endpoint and the correct URL parameters. So we decided to use this methodology of smart fuzzing to conduct an internet-wide scan for vulnerabilities, for which we chose all the URLs that we have identified in millions of Android applications as our targets. But as you all know, Android applications can contain some URLs that are not useful for us, like Facebook URLs, WhatsApp Business links, and Instagram page links. That is not what we want to test or spam, so first we filtered

out all those targets, all those URLs. Once we have a complete, validated list of all the targets we can scan, this is the structure of fuzzing that we followed. For every validated target, let's say in this case netflix.com, we first use the OSINT API to enumerate all the Android packages that are associated with that domain name. In the example you can see here we have shown about five of them. Once we have these packages, we can again utilize the OSINT API to enumerate all the URL parameters that we have discovered inside these Android packages. Once that is done, we have the correct target, we have the correct

endpoints to hit, and we also have potentially correct parameters that we can fuzz, and now we can fuzz them with any payload we want. Back when we were doing this research and collecting apps, the Log4j vulnerability dropped and shook the internet. Since most Android applications are written in Java, Log4j is a Java logging utility, and more than 70% of mobile applications are never tested for vulnerabilities, we assumed that a major portion of the mobile application ecosystem could be vulnerable to this. That is why we decided to pick Log4j to test the internet for.
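The target-selection structure described above (filter out noise URLs, map each remaining target to its Android packages, then collect the URL parameters seen in those packages) can be sketched roughly as follows. This is only an outline under stated assumptions: `packages_for` and `params_for` are hypothetical callables standing in for the API lookups, the `NOISE_DOMAINS` list is illustrative, and no payloads or network requests are involved.

```python
from urllib.parse import urlparse

# Illustrative list of domains that are not worth testing or spamming.
NOISE_DOMAINS = {"facebook.com", "instagram.com", "whatsapp.com", "wa.me"}


def is_noise(url):
    """True if the URL's host is a noise domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in NOISE_DOMAINS)


def fuzz_cases(urls, packages_for, params_for):
    """Yield unique (url, parameter) pairs to test.

    packages_for(host) -> iterable of package names tied to that host
    params_for(package) -> iterable of URL parameter names in that package
    Both callables are placeholders for whatever data source is used.
    """
    seen = set()
    for url in urls:
        if is_noise(url):
            continue
        host = urlparse(url).hostname
        if not host:
            continue
        params = set()
        for package in packages_for(host):
            params.update(params_for(package))
        for param in sorted(params):
            if (url, param) not in seen:
                seen.add((url, param))
                yield url, param


if __name__ == "__main__":
    demo_urls = ["https://api.example.com/v1/login", "https://www.facebook.com/page"]
    cases = fuzz_cases(
        demo_urls,
        packages_for=lambda host: ["com.example.app"],
        params_for=lambda package: ["url", "id"],
    )
    for case in cases:
        print(case)
```

Injecting the two lookups as callables keeps the pairing logic testable offline and independent of any particular API client.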

So these are snippets of the HTTP requests that we sent to each target to scan them for vulnerabilities. For each target we sent both GET and POST so as not to miss any functionality, and you can see here that for every parameter enumerated through the OSINT API, we were fuzzing the parameter with the Log4j payload. You may also notice that all the payloads have a very customized value; each payload is different from the rest. That is because in every Log4j payload we were putting the hostname Java environment variable for validation. Apart from that, we were also putting in the target name, the target we are fuzzing, and also the

parameter and header we are fuzzing. The reason behind putting this information in the payload was that a lot of times we found we were sending the payload to domain A and getting a hit from domain B, and then we were not able to track where the hit was coming from. That is why we decided to put in the target information as well, so that we can easily keep track of where we actually sent the payload and where we got the hit from. And to correctly identify which parameter or which header is actually vulnerable, that is, which one is being logged, we put the parameter name and the header

name in the payload as well. In the POST requests you can see that we are sending all the parameters and the payloads in JSON format. The reason behind that is that a lot of Android applications rely on APIs to communicate with the backend server, and since JSON is a pretty common data exchange format in APIs, we decided to use that. Once we rolled all these requests out to all the targets, the outcome of this research was pretty interesting. We identified more than 300 confirmed remote code executions and more than 400 SSRFs and open redirects. We used the hostname reflection technique to

validate whether or not it's actually an RCE or just a false positive, and we checked the HTTP hits to identify the SSRFs and open redirects. The 300 RCEs we are talking about here were from every sector you can think of: banking, transportation, healthcare, education. In every sector we identified multiple remote code executions, so the impact was pretty huge here. At the time when we did this research we had only indexed 80,000 applications and nearly 1 million URLs, and now we have one million Android applications, so you can imagine what an updated number could be and how much bigger the problem is. And since this issue was big, we

decided to disclose the issues responsibly to the authorities of the respective organizations, and followed up with them until the issue was completely mitigated. So this was about the research, and we have a little gift for the security community here. BeVigil CLI is a command line utility and a Python library, built as a wrapper over the OSINT API, that gives you easy access to extract all these assets from the OSINT API. More details regarding usage and installation can be found in the link mentioned here. So let's have a quick demo of it. Now let's say you want to enumerate subdomains for any domain, let's say netflix.com. You will run the

BeVigil CLI's enum subdomains command and pass in the domain name netflix.com, and it will give you all the subdomains related to Netflix that we have encountered in one million Android applications. If you have an automated recon flow, you can make use of the BeVigil Python library to achieve the same thing. With that we move to takeaways and conclusion. In this research we tried to uncover, through BeVigil, a very uncommon yet very important attack surface, the mobile application attack surface, which a lot of researchers tend to ignore unless it is mentioned in the scope. Looking at the number of API keys and misconfigurations we have discovered

so far, it is safe to say that hard-coded credentials, vulnerabilities, and misconfigurations are everywhere and the internet is broken, and the only way we can fix all the issues is through collaboration between security researchers and engineers. In the end, we encourage all the security researchers here to use the BeVigil CLI or the OSINT API in their daily enumeration process to power up their recon. With that I would like to end this talk, and we are open to any questions that you may have. Thank you. [Applause] Hi guys, thanks for the great session. Here, in front of me. Hi. Yeah, hi. So I just had one question. It's kind of a

database from which we can extract information about the apps, right? So do we have any option where we can go and debug live applications? Suppose some Android application is not listed in your database, it's not present in the tool you have prepared. Is there any option for us to go and debug that application at that time? Yeah, so we have a website called bevigil.com. You can go there, sign up, and scan your application for free; you can upload your application there. Okay, so the whole process is automated there? Yeah, all the decompilation and extraction of assets is automated. Okay, thank you so much.

Great talk, first of all. I have one question. In the product life cycle, usually multiple apps are released as part of a product lifecycle. Does BeVigil track all those apps, or what's the update cycle of BeVigil? We have a feature called versioning of applications. It's not there on the platform right now, it will be there, but yes, we do track all the versions of an application. So if there is a hard-coded key in an older version, you will be able to find it; you will be able to see the report of each version separately. Thanks. And yeah, we came up with this idea when

we found that some of the applications were leaking their API keys in version one and removed them in version two, so that's when the idea arose. Hi guys, great presentation. So what BeVigil is doing is extracting assets from the apps, right? I just wanted to ask: are we using hard-coded manual regex patterns, or are we implementing some kind of machine learning or something like that in it? No, we are using ripgrep. It takes a regex pattern and identifies that pattern in the source code. It's just that; no ML is there. All right, thanks.
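Pattern-based extraction of this kind can be sketched in a few lines. The talk's list of 250-plus patterns is not shown, so the three patterns below are illustrative examples of widely documented formats, and Python's `re` module stands in for ripgrep; `PATTERNS` and `scan_source` are names invented for this example.

```python
import re

# Illustrative patterns only; the actual pattern list is not public in the talk.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "google_api_key": re.compile(r"\bAIza[0-9A-Za-z_\-]{35}"),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
}


def scan_source(text):
    """Return (line_number, pattern_name, match) for every hit in `text`."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((lineno, name, match.group(0)))
    return findings


if __name__ == "__main__":
    # AKIAIOSFODNN7EXAMPLE is the placeholder key ID used in AWS documentation.
    sample = (
        'String key = "AKIAIOSFODNN7EXAMPLE";\n'
        'String api = "https://api.example.com/v1";\n'
    )
    for finding in scan_source(sample):
        print(finding)
```

Because each pattern is named, a hit already tells you what kind of asset or token was found, which is the first step a validator would need before checking whether the credential is live.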