
BG - Oops, I Leaked It Again - How we found PII in exposed RDS Snapshots

BSides Las Vegas · 34:00 · 53 views · Published 2023-10
About this talk
Breaking Ground, 18:00 Wednesday

The Amazon Relational Database Service (Amazon RDS) is a Platform-as-a-Service (PaaS) that provides a database platform based on a few optional engines (e.g., MySQL, PostgreSQL, etc.). A public RDS snapshot is a useful feature that allows a user to share public data or a template database with an application, but when misused, it may accidentally leak sensitive data to the world, even when using a highly secure network configuration. We at Mitiga discovered hundreds of databases being exposed monthly, with extensive Personally Identifiable Information (PII) leakage. In this talk we cover the main aspects of RDS snapshots and how easy it is to accidentally expose sensitive data widely to the world. Our research process is based on extensive investigation of the RDS service, its configurations, and its limitations. In the session, participants will get relevant knowledge about RDS snapshots, including real-life examples of the risk of using this service, and recommendations on how to prevent, detect, and remediate the risk of accidentally sharing RDS snapshots publicly. We will share an in-depth description of our automated process, which includes procedures to constantly monitor for public snapshots and remove any that are found.

Ariel Szarf, Doron Karmi
Transcript [en]

Hello everyone, welcome to Breaking Ground. Today we have Doron and Ariel, who will be speaking about how they found PII in public RDS snapshots in their talk titled "Oops, I Leaked It Again". Before we get started, we want to thank our sponsors; without the help of the sponsors, donors, and volunteers, this would not be possible. They are located out in the vendor area, but just to list a few, we've got ConductorOne, Toyota, and PlexTrac. Be sure to stop by their booths. A quick note on cell phones: please make sure that they are silent and not interrupting. And with that, I will hand it over. Thank you to Doron and Ariel.

Can you hear me? Right, cool. So thank you for coming, the last survivors of this conference. Today we're going to talk about how we found PII in public RDS snapshots, or as we called it, "Oops, I Leaked It Again". So this was Britney, and this is us. My name is Doron and this is my colleague Ariel. We are both cloud security researchers at Mitiga.

As the name of the presentation suggests, today we are going to talk about how we found PII in RDS snapshots. We are going to talk a little bit about what RDS is, what the service is, and what the concept of an RDS snapshot is, specifically a public snapshot. We are going to talk about the problem with having public snapshots out there, and also show you how attackers can exploit it. We're going to talk about what we did in our research, how we mimicked an attacker's technique and did it at scale, and also share with you some cool examples of real cases, real DBs that we found out there with real information. At the end, we are going to share some ways to detect and mitigate this risk.

But before we start, we want to show you some of these cool cases that I was talking about. These are some databases that we found out there, and they shouldn't be public. For example, here we can see a table from one of those databases that includes a username, user password, email, the gender, the phone number of
the user, the marital status, a password token, an ID number, and more and more and more.

So let's start. What is RDS? Amazon Relational Database Service, also known as RDS, is a platform as a service that simplifies database management in the cloud. There are many great features. One example is easy database management: the service automates time-consuming administrative tasks such as software patching, hardware provisioning, and more, all of which allows companies to focus on application development. Another example is high availability: by replicating databases across multiple Availability Zones, it ensures that your applications can deal with infrastructure failures without downtime. This service was launched in 2009, and nine years later, in 2018, Stack Overflow published an article about the incredible growth of Amazon RDS. These days, the service is widespread in the cloud.

What is an RDS snapshot? An Amazon RDS snapshot is a point-in-time copy of an Amazon RDS database, and the snapshots are stored in an Amazon S3 bucket. A snapshot can be taken automatically, like every hour, every day, and so on. A snapshot can also be taken manually with a click. An RDS snapshot actually backs up the entire DB instance; it's not like select queries on all the tables. An RDS snapshot also contains the metadata, and you can restore the DB using an RDS snapshot. An RDS snapshot is also a resource that you can share: you can share it inside your account, outside your account, and even publicly.

Why share a snapshot publicly? Maybe because you want to share public data, maybe because you want to share a template DB for an application, or maybe you just want to share a snapshot with someone without dealing with roles and policies. It's so hard sometimes. So a public RDS snapshot is a great feature. What can go wrong?

As all of us already know, databases can of course contain sensitive data. Sensitive data can be personally identifiable information, known as PII. If a threat actor gets their hands on this type of data, it
can be a disaster for your organization. They can publish it, they can blackmail you using this data, and so on. Sensitive data can also be secrets. A secret can be a password, a token, an access key, and so on. With this type of data, the threat actor can exploit your environment. Based on this, making a snapshot with sensitive data public, even for just a few minutes, is a really bad practice.

Now we understand it, but we don't feel it. Think about how easily you can imagine someone in your workplace who publishes a snapshot publicly for a few minutes. They might say, "What's the worst that can happen?" and may even think it's not an issue they need to report. And even after that, if you want to investigate what exactly happened while the snapshot was public, there is a major lack of visibility in AWS CloudTrail, which Doron will describe later in this presentation. Over to you.

Thanks. So now we know what RDS is and what an RDS snapshot is, but let's see how it can be exploited by attackers. Here you can see an illustration of an attacker looking for sensitive data. But really, adversaries can easily clone public RDS snapshots. The only thing they need to do is use two API calls that leave no forensic traces. You won't be able to see it in the logs, but they actually clone the snapshot. Also, consider that traditional scans, like scans for open ports or vulnerabilities, allow the attacker to learn some information about your organization, but do not give actual access to the data. With a public RDS snapshot, if for example you expose it by mistake and the attacker was able to scan your premises and find out that there is a public RDS snapshot, at this point they have actual access to the data itself.

Let's see a demonstration of how easy it is to do. The only thing you need to do is use the DescribeDBSnapshots API call with the include-public flag, and you will get an output of all the public RDS snapshots in the specific region. This
is the information about all of the snapshots that are there. Now the only thing they would need to do is copy one of the DB snapshot identifiers, as they wish (this is a unique identifier for each DB snapshot), and pass it as the input of the next API call, which is CopyDBSnapshot. Here you need to specify the target DB snapshot, "free Britney", and the region, and a few moments later you will have a clone of this DB snapshot in your environment. Here you can see some information about the newly created DB snapshot in your environment. You can see the engine, which is MySQL. You can see the master username, which is root in this case, but sometimes it could be indicative of the organization that this snapshot belongs to. Another thing that is important to know is that it doesn't matter what the owner of the snapshot, or the owner of the instance, the database itself, does with the resource now. For example, if they delete it, it doesn't affect your snapshot. You now own the data; you have the data in your organization.

So what we tried to do was mimic what an attacker does, but at scale. Our hypothesis for this research was that attackers can scan the AWS premises and clone snapshots that were exposed for only a few minutes. So that's what we did. We built an AWS-native technique: we used AWS Lambda functions, Step Functions, and boto3 for the automation of the API calls, and we created a bot that runs every hour and looks for newly created RDS snapshots. The high-level goal of this bot was to scan and clone those newly created snapshots and extract the data automatically.

So let's focus on those processes. This is the overall process. On the left-hand side you can see the process that runs every hour; it's an hourly scan, and it's responsible for scanning and cloning new snapshots. On the right side you can see the process that runs every six hours. This is like the offline process, and it's
responsible for going through all the snapshots that we have copied, creating an instance out of each snapshot, and extracting the data for manual analysis later.

Let's talk a little bit about the first one. This is an hourly scan. As we said, we run an hourly scan for all the snapshots that were created in all the available regions, which is most of the regions, except for four. For this we use the DescribeDBSnapshots API call. Then we iterate through all those regions and clone the snapshots newly created in the last hour, since the last run, to our premises, to our AWS account. At this stage we also maintain a state file that includes all the snapshots that we have cloned, so we make sure we don't clone the same snapshot in the next run. For this we use the CopyDBSnapshot API call. This is an example of the function that we use; this is actually the function that looks for the newly created DB snapshots.

The second process: at this point we have the newly created DB snapshots in our AWS account, so we don't need to run this process every hour. We run it less frequently, every six hours. What we do in this process is, first of all, make a list of all the newly created snapshots that we don't yet have databases for. Then we again iterate through all the available regions and get the unique ARN of each snapshot, which we later pass in to create the instance. In order to create the DB instance, we use RestoreDBInstanceFromDBSnapshot, which is another API call. Once we have the DB instance ready to work with, we reset the master password, because otherwise we cannot access the data inside the DB itself.

Then we move on to deal with the data itself. This step is the analyze and extract step. We automatically extract the DB schema, which is the tables and some information about the tables in the database, and also the DB content. We take this content and the
information about the DB schema and we store it in S3. And of course, once we have the data itself, we delete the database to cut charges.

Let's talk a little bit about what we do with the data itself. We created an automated process that helps us highlight tables that with high probability contain PII. What we do in this stage, as we said, is extract the table name, which could be indicative of what is inside the table; the table schema, which is the names of the columns of the table, which could also be indicative; and the first 10,000 rows of each table. We save everything in S3 as CSV, and then we use PySpark to slice and dice the data before we move on and select the candidates that we would like to analyze manually. We do another step to reduce the number of candidates: we filter for only the tables that are non-empty, which have at least some rows, and we search the column names of those tables against a list of indicative keys. This is an example of the keys we were looking for. These are PII-related keys; they include pass, password, phone, account, IP address, document, secret, and so on. Now we are going to show you some cool examples that we found in the wild.

Cool. Now let's talk about our findings. Before we start, I want to define our research time frame. Our research was conducted over 30 days, from the middle of September 2022 to the middle of October 2022. From now on, I'm going to call this time frame our research month.

To begin, I want to share with you three nice examples of public RDS snapshots we found. The first example is a snapshot that was exposed for the whole research month. The DB was created in March 2022, the snapshot was taken in August 2022, and this DB looks like a car agency DB. This table, for example, looks like a car rental orders table. Each row is an order, and as you can see, each row contains full name, phone, email, car model, date, sales consultant
name, and the occasion, for example father's birthday, marriage, festival, and so on.

The second example is a snapshot that was exposed for less than four hours. Just a few minutes or hours; what's the worst that can happen? This DB looks like a dating app DB. The DB was created in April 2016 and the snapshot was taken in October 2022, more than six years later. This table, for example, looks like the users table. Each row is a user, containing the name, password, email, gender, birthday, ethnicity, a link to an image, a user description, and more and more. Another table in this DB, for example, contains the private messages. Now just take a few seconds to imagine what could happen if this snapshot got into the wrong hands.

The third example I want to show you is an example with technical data. This snapshot was exposed for the whole research month. The DB was created in July 2015 and the snapshot was taken in September 2022, more than seven years later. This DB looks like a mobile phone apps company DB. This table, for example, is the devices table. Each row is a device, containing the device ID, which is actually a MAC address; the user ID, which is actually an email; the device model; the ID of the app that was installed on this device; and the access token. With this data, a threat actor can of course impersonate a user.

So now you can say, "OK, you searched in all of the regions for an entire month and you found three nice examples. It's pretty bad, but it's not a phenomenon." So let's talk about how prevalent this issue is. In our research month we saw approximately 2,800 public RDS snapshots. In this graph you can see how many public RDS snapshots we saw per region. The most common region is of course us-east-1, because this is the default region, but you can also see here that this phenomenon appears in all of the regions. In this graph you can see how many public RDS snapshots we saw per DB engine. We saw Postgres, Oracle, SQL Server, and of course the most common DB engine we saw is MySQL, which is not surprising because of the popularity of this
engine.

But now you can say, "Maybe all of these public snapshots are supposed to be public." So let's talk about that. In our research we tried to deal with this issue and thought about how we could clean the data. To do that, we applied two filters. With the first filter, we filtered out all the public RDS snapshots that were published by accounts that publish a lot of RDS snapshots. Think about it: if an account publishes a snapshot, for example, every week, it might be part of their product or part of their workflow, so it might not be interesting. With this filter we filtered out approximately 2,000 public snapshots, a lot. With the second filter, we filtered out all the public snapshots with a boring keyword in their name. A boring keyword can be test, template, public, and so on. With this filter we filtered out just 70 public RDS snapshots, not much. Now we had 650 public RDS snapshots, and these snapshots are our candidates for containing sensitive data. From now on, I'm going to call these snapshots interesting snapshots.

Now I want to share with you some insights based on the metadata of these interesting snapshots. This graph shows how many interesting snapshots we saw every day. We can see some variation, of course, but we can also see that this is a stable phenomenon; we didn't catch a unique peak or something like that.

This graph shows how many snapshots were public for each number of exposed days, from 1 to 30. As you can see, public RDS snapshots that were exposed for more than two days and less than 30 days are anomalies. On the right side you can see approximately half of the interesting snapshots, which were exposed for the whole research month. It means maybe they are supposed to be public, or maybe someone published them and then forgot about them. On the left side of the graph you can see the other half of the interesting snapshots, which were exposed for just one or two days, or just a few hours. It means maybe someone published them by
mistake, or maybe someone just wanted to share them with someone for a few hours. Across this whole graph there is another case: that the publisher of the snapshot is a threat actor. If the threat actor tried to be as discreet as possible, they published for a few hours, and if not, they published for a long time.

This is my favorite graph. Every snapshot was of course taken from an RDS DB. In this graph we can see how many DBs were created each month. Most of the DBs were created in September or October 2022, and it makes sense; this is our research month. But we can also see that the number of DBs that were created before that is more than a few. Why is that interesting? Let's think about it together. Take for example a DB that was created in 2015. If this DB was created seven or eight years ago and is still in use, seven years later it's still relevant, and an admin took a snapshot of this DB and published it, the probability that this snapshot contains sensitive data is higher than for a snapshot based on a DB that was created a few months before.

Now let's talk about our insights based on the content of the interesting snapshots. As Doron described earlier, we extracted the data from the snapshots to CSV files and stored it in an S3 bucket. After we did that, as Doron described, we built a list of interesting keywords to search for in column names. Here you can see a sample: secret, billing, IP, phone, token, and so on. What we actually did was search for these keywords in the column names, just in non-empty tables, just in tables with data. Here you can see the matches for these keywords. We actually found a lot of matches: all in all, we found approximately 5,800 columns with an interesting keyword in their name and with data. When we reduced it to distinct RDS snapshots, we found 171 public RDS snapshots that with high probability contain sensitive data. Just in one month. Now we can agree that this issue is a
prevalent issue. Thank you.

So, as Ariel said, now we can agree that there is a risk there, even though you didn't know that this could be a risk: publishing an RDS snapshot even for five or 10 minutes could expose the data out there. Now we understand the risk, and you might ask yourself what you can do as an organization to detect this issue or to mitigate this risk.

As I said, you might ask yourself, "How can I know if someone, for example, copied my public snapshot?" Sounds pretty straightforward, right? Well, you can't. During our research we were surprised to find that there are no logs about RDS snapshots when they are public. If someone touches your public snapshot, for example copies it or creates an instance out of it, you will not be able to detect it; there are no log records about it in CloudTrail. This was even more surprising because of what we know from other services in AWS. For example, in EC2, if you create an EBS volume, which is the disk you attach to an EC2 instance, and you publish a snapshot of this EBS volume publicly, and someone copies this snapshot or creates a disk out of it, you will get a log entry about it. You will know that someone from a third-party account is trying to do something with your EBS snapshot, and you will be able to know whether this account is related to your organization or not. If not, it's probably an attacker. But in this case there are no log records, which means you are completely blind once you publish it, whether by mistake or not.

But there are some things that you can still do in order to detect some actions around snapshots that went public. We divided this into two sections. The first one is the current state, to understand whether you have a public snapshot right now, and the second one is a historical check. Let's talk about the current state and what you can do. First of all, you can use the AWS API, like the attacker did. What you can do is use two API calls. The first
one is describe DB