
All right, hello everyone. Great to be here. I'm really excited: this is my first time in Estonia, a place I've heard a lot of great things about, so thank you for having me. As I said, my name is Mackenzie Jackson. A little bit about me: I'm originally from Aotearoa New Zealand, but now I live in the Netherlands, so it's a little bit closer. I work for a French company called GitGuardian. You can find me anywhere on social media with the handle Advocate Mac, and I'm also the host of The Security Repo podcast. It's my mom's favorite podcast, and she recommends all of you listen to it, so you can check that out if you want to. All QR codes in my presentation are scan at your own risk. I promise they won't do anything malicious, but you never know.

All right, if we can just take a moment to pray to the demo gods. I was meant to do a lot of demos during this talk. The demo gods weren't on our side this morning, so hopefully we have it sorted out and everything runs smoothly now, but we will see as we go.

In this presentation we're going to be talking about exploiting secrets, and we'll do it with a bit of a black-hat mindset: understanding how attackers operate, and how they find and exploit these secrets.

So what am I talking about when I say secrets? Most of you probably know this, but when I say secrets I'm generally talking about digital authentication credentials. Typically these are things like your API keys or credential pairs; it could also be a security certificate. These are really the crown jewels of any organization, because they authenticate services, systems, and sometimes people into different systems.
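
To make those categories concrete, here is a minimal sketch of how a few common credential shapes might be recognized in a piece of text. The pattern set and the function name `classify_secret` are illustrative assumptions, not GitGuardian's detectors, and the sample key is the placeholder value used in AWS documentation. Real scanners use many more patterns and then validate what they find.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real detectors
# cover hundreds of credential types and validate matches afterwards.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM header that opens a private key block.
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    # A credential pair embedded in a connection string: user:password@host
    "credential_pair": re.compile(r"\b[\w.-]+:[^\s:@/]+@[\w.-]+"),
}


def classify_secret(text: str) -> list[str]:
    """Return the name of every pattern that matches somewhere in text."""
    return [
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    ]


if __name__ == "__main__":
    samples = [
        "aws_key = AKIAIOSFODNN7EXAMPLE",
        "postgres://admin:hunter2@db.internal:5432/app",
        "just an ordinary line of code",
    ]
    for line in samples:
        print(line, "->", classify_secret(line))
```

A match here only says that a string looks like a credential; whether it is live is a separate validation step.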
Attackers are always after these secrets because they enable them to do two things that all attackers want to do: elevate their privileges and persist their access. Secrets can be used as a way of gaining initial access: if attackers find a secret publicly, or purchase one, they can use it to break into a network or gain access to a system. It can be that first foothold. But a secret can also be something they find along the way during an attack, which they can then use to progress further and dive deeper into these systems. So attackers really, really like these secrets.

One of the things we need to understand about secrets is that they are made to be used programmatically. They're not meant to be used by a human; they're meant to be used by machines and services. That makes them a prime candidate for ending up in places they shouldn't be, because they end up in our source code. The reality is that secrets are absolutely everywhere if you know where to look for them.

So today we're going to go through four parts. These aren't the only ways to find secrets, but we're going to look at four different places where you might find secrets in different systems, how you might go about finding them, and how you might go about abusing them. Then at the end we'll look at what we can do to prevent this.

The first one I want to talk about is abusing the GitHub API. You're probably familiar with this website here, GitHub. This is the most popular code-sharing platform in the world. Tens of millions of developers use it; you probably use it. What's interesting about GitHub is that even though your organization might not use GitHub, the chances are that the developers who work for you do, personally. That means your organization still has exposure to GitHub, and there are still ways of finding sensitive information even if your
organization doesn't use it at all.

Just some stats about GitHub from last year, so 2022. There were over a billion commits made during the year. That's a billion commits made publicly; this is only talking about public information. With Git you can obviously have private code as well as public code, but over a billion commits were made public throughout the year. There were 85 million new public repositories and about 94 million developers using it. What this shows us is that it is an absolute fire hose of data, and when there's this much data you're always going to be able to find really good information.

As I said, I work for a company called GitGuardian, and one of the things we decided to do at GitGuardian was launch a research project to find out how many secrets we could potentially find on GitHub. We decided to scan every single public commit made on github.com. I'll explain how we did that, but we scanned every one of those billion commits in real time to see whether there were any secrets in them.

Does anyone here want to take a guess at how many secrets we found on GitHub last year? Who thinks more than a million? All right. Who thinks less than a million? Right, two hands, two brave hands. We found 10 million. 10 million secrets that were leaked publicly, and I'm only talking about publicly, on github.com.

A lot of people say, okay, but how do you know they're real secrets? How do you know it's not a test credential? How do you know it's not just some generic high-entropy string? How can we be sure that what we're finding is real? Well, we can actually categorize these, and we did a lot of post-validation. In the categories we had, about 20% were for cloud providers. So 20% of 10 million, some quick maths, that's 2 million cloud provider keys that we found, and one of
the things about cloud provider keys is that we can validate them with the services themselves. So we actually found 2 million valid cloud provider keys on github.com, and again, this is all just public. You don't need to be very imaginative to know what you can do with a cloud provider key. You could do some crypto mining at the very least. If that key is for an organization, you can break in, maybe steal some data, maybe try and move into different systems.

We also found lots of other interesting things. Messaging systems, things like Slack tokens: these are really interesting for an attacker, because if I have one of them, even if I can only post a message, I can post a message internally to your organization, and that looks a lot more real. So it's potentially a way I could get a foothold into the organization. And one of my favorites down the list: 3.88% of what we found were for version control platforms. Basically, what this is saying is that we found access keys to your private Git repository that you have put in your public Git repository. It's a bit like putting the password in the subject line of an email.

So there are lots of different things we found in here, and a lot of them are really interesting. They're not always just for individuals. A lot of the time these keys are actually linked to organizations, because someone has mistakenly contributed code to the wrong repository, perhaps they were working on a personal project and didn't realize, or perhaps there's something in the history they'd forgotten about when they made something public.

So what are some specifics? Obviously there are thousands and thousands of credentials, so we have a very long tail, but here are some of the more interesting ones. Google API keys are among the most common keys we find, along with Google Cloud keys, AWS keys, and Google OAuth tokens. A lot of these are really quite sensitive, but we
find so many that one of the questions you might have is: how do you actually find these keys? How would an attacker be able to find these keys and use them? There are actually a couple of quite simple ways. The first one I'll talk about isn't my favorite, but it's the easiest, and that's literally just using the GitHub search feature to try and find secrets. In this case we're looking for a file named credentials, and we're looking for an AWS access key. This is just one example.

The reason this isn't really my favorite way of doing it is that Git keeps track of everything in the history of your source code, and that's where most of your secrets are going to be. When you use the search function, it's only looking at the top layer, and that's a drop in the bucket compared with all the source code that's actually available if you know how to look for it. You are going to find some real secrets this way, but you're also going to find huge numbers of false positives, so you're going to spend all day doing it. There are lots of different types of what we call GitHub dorks, or GitHub dorking, which you can copy to try and find some interesting things.

But luckily there's a much easier way for attackers. Attackers don't have to go through all that effort; they can use the GitHub API. The GitHub API is at this address up here, api.github.com/events. You don't need any authentication to view this; anyone can. What it is, is a real-time ledger of everything that's happening publicly on github.com. So how did we scan every single one of a billion commits? You may think we have some kind of partnership with GitHub. No, we don't. We just use the GitHub public API. You may also think this could be an abuse of that system, but it's not, because there are so many services that do similar things using this API, and we're simply using it in the same way. Now, nefarious actors are also using this.

There are a number of different events that get pushed on this API, and the two that are most interesting are these. The first is the public event, which is fired when a private repository is turned public. Why this is such an interesting event to look out for is that when you make something public, you make all of its history public with it. So if a developer two years ago, working on a remote branch, committed a secret, even just briefly, that secret still exists in the history, and when you make the repository public you make that secret public. The other one is the push event. This is just when code is pushed to public GitHub.

Attackers are often looking at this. What we've started doing is purposely leaking some secrets on public GitHub to monitor what attackers do with them and how long it takes. It generally takes the first attacker less than a minute to discover a secret that you push to GitHub. That's not because someone is looking at it; this is a bot that's probably using some basic regular expressions to identify specific keys, test whether they work, and then move forward later on with an attack. Today I have a workshop on creating honey tokens. These are fake credentials, and what we'll do in that workshop is leak some of them on public GitHub and see what happens. So it
really takes less than a minute for the first ones to start being discovered. So attackers really are using this as part of their method.

Let's talk about a scenario where a key was leaked publicly and actually discovered. There have been quite a lot, but last year we had some news about a company called Toyota, a small car manufacturing company. Toyota has a mobile application called T-Connect. T-Connect can do a lot of things, including starting your car, so it's quite important. But Toyota uses a lot of contractors in their work, and this is where things start to get interesting, because it paints a picture of how keys may leak on GitHub even without your knowledge.

In December 2017, a contractor working on the T-Connect mobile application with Toyota accidentally pushed code to a public repository. For five years that code sat there publicly without anyone really noticing. What was in that code was hardcoded access credentials that gave access to all the data of all the users who were using T-Connect. Eventually, in 2022, a white-hat hacker, a security researcher, discovered it and reported it to Toyota. But that key was accessible and valid for five years on public GitHub, so you can be fairly certain that someone else had found it and started to abuse it. Toyota can't comment on whether or not that actually happened, because going through five years of logs to try and find malicious IPs is really quite a big challenge.

So this is an example of how, even if you're not publicly using GitHub, even if you're thinking, hey, I don't have any public repositories on GitHub, I don't need to worry, there are lots of ways that keys can still leak there.

All right, so that's the first part: looking at public code and trying to find and exploit secrets inside it. The next part that I want to
talk about is private code, because this is far more interesting for an attacker if they can gain access to it. With public code we have to deal with huge amounts of data. We're going to find interesting things, but part of the security there is that there's so much to look at and go through that it's hard to identify what's actually of interest to us. If we can get access to private source code, well, this is much more interesting.

There have been lots of leaks. Private source code is really terrible at staying private. If we have a look, Microsoft had their source code leaked, and so did Nvidia and Samsung, and I would consider all of these companies to have a fantastic security posture. I don't think anyone could say that Microsoft has poor security. Maybe you can; I wouldn't make that call. So how does source code from these massive companies, something we'd consider a high-value target, leak? Well, the Lapsus$ group was really prolific in getting hold of source code and leaking it, and one of their methods was simply buying access from employees. When it comes to private source code, so many people have access to it that if there are secrets buried in there, and I can almost guarantee that there are, then all of those people have access to those secrets. A malicious actor can try to phish them, try to purchase access, or do other things to get to it.

One of the things we also did at GitGuardian: we scan private source code for secrets, so we had a look at how many we would typically find. This next slide involves some maths, so forgive me. In an average company with about 400 developers, so not huge but also not small, we would typically find about a thousand unique secrets. Each of those secrets occurs about 13 times, so we do the maths: 13,000 secrets we
typically find. Now, in a company with 400 developers there are probably going to be four AppSec engineers. Each one of those AppSec engineers is going to have to investigate a leak, figure out what that secret is for, revoke that secret if they need to, issue a new secret, and then redeploy all the code without creating any downtime. If we divide 13,000 by 4, we come to roughly 3,400 secrets that each of those AppSec engineers has to deal with every single year. So that would be their entire job.

So why are secrets in source code such a problem? Well, secrets often end up in source code, and the problem is so large that it's almost impossible to solve. So you do the next best thing, which is ignore it, and then when someone gains access to your source code, you can end up in quite a bit of trouble.

Let's take a look at one of these actual leaks: the source code leak of Twitch. Twitch had all of their source code leaked, which included 6,000 repositories and 3 million documents. It was involuntarily open-sourced, let's say. We scanned this leak and found 6,600 secrets inside that source code. That may seem like a lot, but actually that's pretty good. If we only find 6,600 secrets in your source code and you're that big a company, you're probably doing okay, because that's how big this problem actually is. But in saying that, we still found 194 AWS credentials and 69 Twilio keys, and we even found four Stripe keys inside their source code. So this is a huge problem.

So what does this all mean? It means that attackers are really after your private source code. If they can get access to it, they can get access to a whole bunch of secrets that live inside it, and they can potentially move into different systems and services.

Right, so how is private source code accessed? How do attackers actually get to it? Well, there are a bunch of methods. I've
mentioned a couple of the ones attackers are using. The first one is the least exciting, but it definitely happens, and that's purchasing access. I mentioned the Lapsus$ group. This is a group of teenage hackers who were mostly after clout, and they got access into, as I said, Microsoft, Nvidia, Samsung, all these companies. So how did they do it? Well, if you look at their Telegram channel, they're literally just posting advertisements saying they will pay you if you can give them access to your network or your source code. If you look at who has access to your source code, and how many of them may be having a bad week, you can think of lots of reasons why people might find it attractive to grant access.

We also have phishing, and phishing that's really targeted at developers. Here there was a campaign targeted at developers who make Chrome extensions, trying to get access to their source code. So this is another way developers are being targeted, in phishing campaigns, to get access to source code.

We also have exploiting misconfigurations. One common misconfiguration with Git is accidentally exposing your .git directory. This is a folder that holds all the metadata of your source code, and you can recreate the entire history of a Git project from it. These are quite often discoverable in public places.

The other one, which is quite difficult but definitely happens, is supply chain attacks. A good example was Codecov, where the attackers were able to gain access to private source code repositories by turning this tool, which sits in the CI/CD pipeline, malicious. I'll talk a little bit more about that, but what I really want to dive into for this presentation is exploiting misconfigurations. I mentioned finding those .git directories. What actually happens is that this folder accidentally gets included when
applications are published, and it can then be discovered on networks. A great organization called Cybernews did some wide-scale scanning for this, and they discovered that there were 2 million .git directories exposed publicly. So again, this is a huge, widespread problem.

So then you might ask, okay, how do I actually go about finding these .git directories and starting to use them? I have my first demo here. We'll see if it works, but this one will be a video, where we'll take a quick look at how we can start scanning some of these systems. It's really quite simple. How I like to do it is I use a tool called Subfinder. Here we are to basic