How We Got Into A Unicorn's Private Codebase

BSides Munich · 21:37 · 299 views · Published 2022-05
About this talk
Researchers discovered hardcoded GitHub tokens and OAuth secrets in millions of mobile applications, gaining access to 159 private repositories across 13 organizations including a $120M unicorn startup. The talk covers the mechanics of extracting secrets from decompiled APKs, the scope and impact of leaked tokens, and empirical findings from analyzing over one million mobile apps. It examines why developers hardcode sensitive credentials and proposes detection and prevention strategies.
Original YouTube description
Reading a $120 million unicorn startup’s private codebase on a fine Sunday afternoon while sipping coffee would be an exciting experience, wouldn’t it? It is not an uncommon phenomenon. Only last year, Twitch was added to the long line of organizations whose source code has been made public inadvertently. Twitch’s source code, which includes 6,000 internal Git repositories and 3,000,000 documents with a combined unzipped size of 200GB, was posted to the 4chan forum. Here’s how we got access to 159 private codebases of 13 organizations ranging from small-scale startups to leading unicorns. First, we examined a large number of GitHub tokens and secrets hardcoded into mobile applications by decompiling multiple apps, and found that the leading use case was hardcoding GitHub tokens in Git URLs to import packages. In that giant pool of tokens, we also discovered hardcoded client secrets and IDs for OAuth applications. With these OAuth credentials, someone can build an app of their own to impersonate a legitimate organization and use it to steal sensitive data from that organization’s customers. We could clone private repositories and gain full access to confidential source code. Moreover, with a hardcoded Personal Access Token (PAT), an attacker could masquerade as the person whose PAT has been leaked, accessing private repositories, making commits, opening issues and pull requests as that person, and much more. Scary, right? It’s not unusual for developers to hardcode sensitive information in their source code and then push it to popular code-sharing platforms like GitHub. According to GitGuardian, 6 million hardcoded secrets were discovered in 2021, twice as many as in 2020, with India being the leading source of leaks.
Considering that GitGuardian’s studies are limited to public repositories hosted on GitHub or GitLab, and do not cover secrets committed to private repositories or self-hosted Git instances, it is astounding that no directed research has been conducted to reveal the different use cases of these hardcoded tokens apart from version-control platforms. Hence, in our study, we explore the source code of millions of mobile apps and find these instances of leaks directly. In this talk, we plan on investigating the causes, impacts, and techniques that can be used to prevent such leaks. Further, we’ll be giving you a sneak peek into some of our interesting findings.

Speakers

Ashikka Gupta
Ashikka is a junior-year student at VIT Vellore. She takes a keen interest in anything cybersecurity-related. She is currently working as a Security Research + Technical Writing Intern at CloudSEK, India, where she heads the content department for BeVigil, the world’s first security search engine. In her free time, she enjoys reading, hiking, and building cool projects.

Arshit Jain
Transcript [en]

So I'll begin. Hi everyone, we are here to present our talk titled "How we got into a unicorn's private codebase through analyzing millions of mobile apps". Moving on to the next slide: I am Ashikka, a security research and technical writing intern at CloudSEK.

Arshit, I believe you are on mute.

[Music]

Hello everyone, my name is Arshit Jain and I work as a full-stack engineer at CloudSEK. I love to automate anything and everything related to security. Apart from that, I love to travel and read books.

Hi everyone, I am Manan and I work as a cybersecurity analyst intern at CloudSEK. I love to play the drums and play football.

Cool, so let's move on. The agenda of our talk is divided into three parts. In the first part, we explore what exactly happened with the unicorn we are talking about. In the second part, we talk about how we analyze and find vulnerabilities in mobile apps. In part three, we talk about a large-scale study we did of over one million apps, and its impact. So let's begin with part one. [Music]

Next slide, please. Source code leaks have been in the news for many years now, and the biggest one that comes off the top of my head is the Twitch leak of 2021, where malicious actors posted over 6,000 internal Git repositories of Twitch on the 4chan forum, containing 200 GB of data and 3 million documents. Just from the scale of these numbers you can understand how big this leak was.

Earlier this year we did a little something of our own, which we'd like to share with you. Moving on, let me present the big fish. It was a casual work afternoon for us. We were doing our normal thing: analyzing top apps, finding security vulnerabilities in them, and reporting them to the organizations. What we found was really interesting. There was an app with more than 10 million downloads on the Play Store, belonging to a unicorn worth $120 million, that had a really big security vulnerability with which we could view their entire source code and do a lot more, as we'll show in the coming slides.

So let's explore what exactly happened. While exploring the codebase of this app on the search engine that we built, we saw a GitHub personal access token hardcoded into the app's Android bundle file, there for everyone to see and use. That is not exactly desirable, especially from a unicorn. You might be asking what a GitHub personal access token is and what a person can do with it, so we'll explore that in the next slide.

A GitHub personal access token is an alternative to using passwords for authentication to GitHub when using the GitHub API or the command line. With this token you get a lot of privileges. We ran the query you can see in the screenshot: we passed the GitHub token that we found in the previous slide to see what scopes it gave us. From the output we could see that we were getting admin privileges: we could delete the repo, delete packages in the repo, change the workflows, and write packages, and we had the repo scope and the user scope. Of all of these, the most important is the repo scope, which gave us full access to the private repositories. To give you an idea of what we could do with this scope: we could commit changes to the organization's GitHub repos and push them, invite other collaborators and ask them to do the same, mess around with their deployment workflows, and delete the repos. To sum it up, we could do anything with the repos with just this scope, and that's a pretty big deal.

Now, moving on to the next slide. After finding out that we could get access to the private repositories, we wanted to see how many private repositories we could actually access. So we ran the query shown in the screenshot: we passed the token to this endpoint and found that we could access all 26 of the organization's private repositories. The impact was pretty nasty. On the next slide, this is what the private repository URLs look like. You can conclude that we could get access to the unicorn's iOS apps, their APIs, and their Android mobile apps. This is how we got access to the unicorn's codebase. After finding this issue we immediately reported it to the company; they acknowledged the issue and fixed the bug, and the token is no longer there.

Moving on, let's count the mistakes. How were we able to do this entire thing? There were two major mistakes on the developers' part. The first mistake was hardcoding the GitHub PAT in the source code in the first place. Once we found that hardcoded GitHub PAT, we could clone their repos onto a system and view the codebase. So far we are only able to view the codebase. Where things went really south was mistake number two: they gave excessive scope to that token, so anybody could use it for exploitation. Anybody could commit changes to their private codebase, invite other collaborators and teams, change the entire organization structure, and whatnot. These were two really big mistakes on their part.

Now, the talk is titled how we got access to a unicorn's private codebase by analyzing millions of apps. Surely, analyzing millions of apps is something a person cannot do manually; it's a herculean task. So we automated this process and did a lot of analysis on our part, and I'd like to hand over to Arshit, my teammate, who is going to explain this part in a little more detail.
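Editor's sketch: the scope check described in part one can be approximated as follows. The speakers' actual query is only visible in a screenshot, so this is an assumption about its shape rather than a reproduction of it. It relies on the fact that GitHub returns the scopes of a classic personal access token in the `X-OAuth-Scopes` response header. The names `RISKY_SCOPES`, `parse_scopes`, `risky_scopes`, and `fetch_scopes` are hypothetical, and the selection of "risky" scopes is illustrative.

```python
import urllib.request

# Illustrative selection of classic PAT scopes that allow write or destructive
# actions. This list is an assumption of the sketch, not taken from the talk.
RISKY_SCOPES = {
    "repo", "delete_repo", "workflow",
    "write:packages", "delete:packages", "admin:org",
}


def parse_scopes(header_value):
    """Split an X-OAuth-Scopes header value into a set of scope names."""
    if not header_value:
        return set()
    return {scope.strip() for scope in header_value.split(",") if scope.strip()}


def risky_scopes(header_value):
    """Return the granted scopes that appear in RISKY_SCOPES, sorted."""
    return sorted(parse_scopes(header_value) & RISKY_SCOPES)


def fetch_scopes(token):
    """Ask the GitHub API which scopes a token carries.

    Network call: only run this against a token you own or are
    explicitly authorized to test.
    """
    request = urllib.request.Request(
        "https://api.github.com/user",
        headers={
            "Authorization": f"token {token}",
            "User-Agent": "scope-audit-sketch",  # hypothetical client name
        },
    )
    with urllib.request.urlopen(request) as response:
        return parse_scopes(response.headers.get("X-OAuth-Scopes"))
```

With the same authorization header, a request to `https://api.github.com/user/repos?visibility=private` lists the private repositories a token can reach, which is one plausible way to obtain a count like the 26 repositories mentioned above.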

Arshit, I believe you're still on mute.

Hello, you can hear me? [Music] Up to this point, many questions may be arising: how did you start scanning the codebase of this app, and how did you reach the secret in the first place? So let me tell you a story. One day our security team, out of curiosity, was doing a study on secrets that can be found in public GitHub repos. To our surprise, we found many secrets that people had simply hardcoded in public repos, such as Stripe keys, Razorpay keys, and AWS credentials, as well as assets like Firebase URLs and AWS Cognito URLs. This was a big mistake by the developers of those public repos.

Then we thought: these are public repos, which anyone has access to, so what about the source code that no one has access to, like mobile apps? After some initial research, we did not find any good tool that could tell us about secrets being leaked across mobile apps, so for analyzing mobile apps we built our own security search engine. Here are the steps we followed to build this mobile app scanner. First, we collected the data. Second, we decompiled the apps we collected into readable source code. Third, we created a set of regex rules to identify the hardcoded tokens. Fourth, we built an interface where you can do a regex or keyword search and get back all the results across all the source code we have collected. So let's dive deep into each step.

The first step was collecting mobile apps. We thought: can we build a system where people can upload their mobile apps and get a security report listing the secrets and issues in them? So user submissions were our first source of data. The second source was Android apps collected from across the internet, which included apps downloaded from the Play Store as well as third-party app stores. While crawling the Play Store we faced many difficulties: sometimes we could not download apps from certain countries, and sometimes downloads were refused because an app was not compatible with certain devices, but we were able to overcome all of those problems. We downloaded apps from third-party app stores because we wanted to analyze the fake apps that circulate there; apps on third-party stores can contain malware, or people have tampered with the certificate and re-uploaded the app. Currently we have about a million apps indexed in our system.

After collecting the apps, we decompiled them into readable source code. To do that we used an open-source Android tool called JADX. JADX decompiles Dalvik bytecode into Java classes from APK and DEX files, and it also decodes the AndroidManifest.xml file. After decompiling, we stored the apps on a file system. Decompiled apps grow in size because of the many things they contain, so we did some optimization on that part as well.

After decompilation, the question was how to find secrets in the source code we had. For that we started building regexes. To build them, we analyzed some mobile apps to see what kinds of keys developers usually use in mobile apps. We faced many difficulties while building those regexes: sometimes the length of a key was not fixed, and other times we could not come up with a regex at all. For the ones we could not write ourselves, we went to open-source repositories like [unclear] and TruffleHog.

Once we had all the regexes in place, we took all the keys and put them into an Android app. Rather than putting them only in one class, we put the keys in the AndroidManifest.xml file and in Java classes. Then we compiled the Android app and ran it through our system, to see whether our regexes detected each key. That also helped us minimize false positives.

After collecting the apps, decompiling them, and building regexes for all the keys, we automated the process. We wanted to provide an interface where people can do a keyword or regex search and get a list of all the results found in our source code. And this is how we reached the unicorn's GitHub token, using the regex that we built for GitHub tokens.
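Editor's sketch: the regex step above can be illustrated with a small scanner that walks a directory of already decompiled output and applies a rule set. This is not the speakers' rule set, which the talk does not show. The three patterns are commonly used shapes (the `ghp_` prefix of GitHub classic tokens, credentials embedded in a GitHub HTTPS URL, and the `AKIA` prefix of AWS access key IDs), and `RULES`, `TEXT_SUFFIXES`, `scan_text`, and `scan_tree` are hypothetical names.

```python
import re
from pathlib import Path

# Illustrative rules only; a production rule set would be far larger.
RULES = {
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "github_credentials_in_url": re.compile(
        r"https://[A-Za-z0-9_\-]+(?::[A-Za-z0-9_\-]+)?@github\.com/[\w.\-]+/[\w.\-]+"
    ),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}

# File types worth reading in decompiled output (an assumption of this sketch).
TEXT_SUFFIXES = {".java", ".xml", ".json", ".properties", ".gradle", ".js"}


def scan_text(text):
    """Return (rule_name, matched_string) pairs for every rule hit in text."""
    hits = []
    for name, pattern in RULES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0)))
    return hits


def scan_tree(root):
    """Scan every text-like file under root and collect findings.

    Each finding is (path relative to root, rule name, matched string).
    """
    root = Path(root)
    findings = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in TEXT_SUFFIXES:
            text = path.read_text(encoding="utf-8", errors="ignore")
            for rule, matched in scan_text(text):
                findings.append((path.relative_to(root).as_posix(), rule, matched))
    return findings
```

The self-test the speakers describe, planting known keys in a test app and checking that the system finds them, maps onto this sketch directly: write fake keys into a temporary directory, run `scan_tree`, and confirm that every planted key is reported and that clean files produce no findings.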

This is only one use case that we found, where a GitHub token let us reach a unicorn's private codebase. Imagine what can happen with all the other keys that are hardcoded in source code. Let me call Manan on stage; he'll give you insights into the impact of these secret leaks and why developers make such mistakes.

Well, let's take a deep dive into GitHub PATs. We were able to reach 159 private repositories using 151 GitHub tokens that we found by analyzing mobile apps with installs ranging from 100 to 10 million on the Play Store. This may lead to leaks of secrets like database configurations, which in turn lead to a decline in brand confidence and imminent financial losses. By the way, all organizations involved in these leaks were informed, and they have taken corrective measures.

Well, we can lie, but numbers definitely won't. So far we have been able to find over 1.6 million hardcoded sensitive tokens. As you can see in the pie chart, some of the prominent ones are Firebase and AWS readable/writable buckets, Dropbox API keys, Facebook secrets, etc.

Now that we have made things more concrete by seeing the enormous number of leaks, let's move past the abstract part and delve into the impact of these keys and what malicious actors can do with them. The best way to make you feel the intensity is to take you through some examples.

Let's take email automation first. There are email automation services like SendGrid, Mailchimp, Mailgun, etc. Let's analyze what happens when the secret token for one of these services gets leaked. The attacker gets access to read and send emails from all the accounts associated with that particular key. Using this key, the attacker might be able to start a phishing campaign from the organization's official mailing channels, which customers generally trust. The attacker might also be able to obtain sensitive, personally identifiable information about those customers, like names, emails, and contact numbers.

Similarly, if the attacker gets access to tokens for payment processing services like Razorpay or Stripe, they might be able to do very nasty things that tarnish the organization's reputation and cause financial damage as well. They might also gain access to transaction details of the customers and the organization, like credit and debit card information.

I think all of you now have a gist of how frightful the consequences of such leaks are, so it is a good idea to talk about why developers do this. To answer this question succinctly, we have to look at some common problems faced by Android app developers. We all know how much of a pain it is to divert effort into building security and CI/CD pipelines instead of building the actual application. Beyond that, there is an issue of awareness: many developers think it is okay to leave hardcoded tokens in their source code because it is going to be compiled into an APK before publishing. As a cybersecurity community, it is our job to make them aware that this is not the case. Companies also sometimes feel that money is better spent in other areas than on proper security testing of their mobile apps. One example is an app called Clubhouse, which became very popular but did not implement end-to-end encryption on its RTC packets, so any attacker could perform a man-in-the-middle attack and snoop on private conversations.

Now that we've discussed the problem, let's go through some solutions for developers. Scoping is the most basic method of stepping up the security of your app: assign secret keys only the necessary permissions, so that even if attackers get access to a key, the damage is controlled and limited. Using environment variables to store keys is another solution, so that they are not hardcoded in the code but rather held in the operating system environment, relatively out of reach of attackers. You can also use Git hooks, with tools like Husky, to prevent yourself or anyone on your team from pushing secrets by mistake, and you can use vaults like HashiCorp Vault to store all of your secrets safely. Most important of all is building a robust security pipeline, so that your application is secure from development until the time you publish it.

Now I would like to discuss some of our future plans. So far our research has centered on Android applications, but in the future we aim to expand our scope to include client-side JavaScript and iOS applications as well. Lastly, I would like to mention a tool, the BeVigil OSINT CLI, developed by us so that the community can leverage the asset data extracted from analyzing millions of mobile apps. If you want to try it yourself and see the security score of any mobile app currently installed on your phone, we encourage you to go to bevigil.com and search for the app there.

As you all know, the first step towards change is awareness. With that thought, we invite any questions the audience might have.
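Editor's sketch: two of the prevention measures mentioned in the talk, environment variables and commit-time checks, might look like this for build and CI tooling. It is a minimal illustration under stated assumptions, not the speakers' tooling: `load_token`, `secrets_in_diff`, `SECRET_PATTERN`, and the default variable name `GITHUB_TOKEN` are hypothetical, and the pattern covers only two well-known key shapes.

```python
import os
import re

# Two well-known key shapes; a real hook would use a much larger rule set.
SECRET_PATTERN = re.compile(r"ghp_[A-Za-z0-9]{36}|AKIA[0-9A-Z]{16}")


def load_token(name="GITHUB_TOKEN", environ=os.environ):
    """Read a secret from the environment instead of from source code."""
    value = environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; refusing to fall back to a hardcoded value"
        )
    return value


def secrets_in_diff(diff_text):
    """Return the added lines of a unified diff that look like secrets.

    Lines starting with '+++' are file headers, not added content.
    """
    flagged = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if SECRET_PATTERN.search(line):
            flagged.append(line[1:].strip())
    return flagged
```

A pre-commit hook could feed the output of `git diff --cached` to `secrets_in_diff` and exit with a non-zero status when the result is non-empty, which blocks the commit. Note that a secret injected into a mobile app at build time still ends up inside the APK, so for client apps this complements, rather than replaces, the scoping advice above.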

[Applause]