
Any questions? So, I'm pretty interested in this talk. I know managing secrets is very complicated, and doing it wrong can have tremendous consequences for an organization, so I can't wait to see what you've brought us today. The floor is yours.

Cool, thanks very much, Alejandro. So, this is Attacking Secrets in Cloud-Based Applications. Hi everybody, my name is Arjun Sudkov, I'm a security engineer at Skyscanner, and this is my second time at BSides; happy to be here again. I've been spending the last 20 years, almost all my life, hacking to learn, and it never gets old. I do freelance security audits, I have certifications and hobbies, and software-defined radio is a really cool thing that I'm getting into.

Today the agenda looks like this: first I want to discuss secrets, what they look like and what they are; then we'll take the journey to the cloud; and finally we'll see how we can attack and defend secrets. For the purposes of this talk, a secret is defined as a piece of confidential information that is used for programmatic access. For instance, if you have login credentials to an email account and you just use them through the browser, that's not programmatic; but if you were to write a script and put those credentials in the script, that means you're using them in a programmatic way, and so they would fall in scope. A plaintext secret refers to the unencrypted initial form of the secret itself: if you had a base64-encoded password string, you would say that you obtain the plaintext secret by base64-decoding that string. A hard-coded secret means that the plaintext secret is written down directly in a file. Cloud is managed infrastructure with a shared ownership model; it's like a huge data center where you're responsible for some things and the cloud provider is responsible for others. Decoupling is a very important concept that comes into play when we talk about
the cloud. It's not specifically new, but it's definitely a central theme: it refers to the separation of a single system into operationally independent components, and we'll see how that plays out throughout the entire presentation. Infrastructure as code is a relatively new thing that came in with the cloud: machine-readable files used for configuring the cloud. Think of it as a config file that you submit to your provider, telling the provider how to deploy infrastructure, what resources it needs, and so on. Roles and policies, again, are not new, but made an appearance in a big way in the cloud, because these are now the abstractions that you as the client or user define to specify authentication and authorization controls. And finally, the pipeline, once again not new, refers to an automated sequence of steps for building, testing, and deploying applications; when we talk about the cloud it's pretty much a basic concept, because you want to be able to deploy your code in a repeatable and automated way.

Now, the journey looks approximately like this. We start off with one host per application. This was a little bit back in the day, where you'd have your PHP code with an Apache server and MySQL all running in the same place; you'd FTP your files over there, and your
servers would, of course, be on port 80. Everything was hard-coded in code and config files, and that was all fine. Then that shifted into the early signs of decoupling, so to speak: having one host with your web server and another host with your database server. That quickly shifted into using containers, because of course having multiple hosts is very expensive, so you'd have just the one host, but with isolated containers which could be linked together, networked together, and interoperable, and this was good. Then, scaling that up, we saw data centers, the big places with people managing the hardware and the network behind it, linking it all together; you could rent some space in there and deploy your application. But the problem was that the scaling was manual: if you had a lot of traffic and your resources were barely handling it, you needed to go in, make some decisions, and provision more resources to handle that traffic. On the other hand, if you had a lot of resources and very little traffic, you were effectively paying a lot for infrastructure you weren't using, and that's not good. The cloud emerged out of that: apart from just managing the hardware, the cloud was now managing the hardware and some of the software, and it provided some really cool features like auto-scaling. Docker wasn't built to scale, and cloud providers designed solutions that allowed you to scale it; I think this was a real game changer. Overall, there was a secrets manager as a product, all these different products that you could use, and there was infrastructure-as-code syntax that you could use to define which resources you needed. Of course roles and policies made an appearance, because they are the links between the different components and define how they should interoperate. And finally, today we see Kubernetes as a really big thing trending, and so
one way to look at it is that Kubernetes, first of all, is open source and has all these features built in. It's no longer provider features or products you pay for; it's already built in. Kubernetes already knows how to handle auto-scaling and load balancing, it knows how to roll things out and roll them back, it can do software updates at scale, and it has a secrets manager, all built in; all you have to do is define that you want to use it and how you want to use it. What that leads into is that the tendency is application-centric: we're trying to get developers to write code and not worry about the infrastructure, setting it up or managing it. All they have to do now is provide this additional configuration file, the infrastructure as code, that tells the cloud provider or Kubernetes what resources are needed to support the application, the code itself. When we zoom in on the applications themselves, the application code is also decoupled into several components, and typically there are three of them. One is the view, which is generally the front end; it could be graphical or programmatic, and it's what your users see and access. Then you have controllers, which are like the back end: this is your business logic, which knows how to handle requests and data, what to do with it, and so on. And then there's data, which could be a database or a cache, but it provides some sort of structured model for the data that the application expects to use and manipulate. Another dimension I want to make a point about is secrets, and I think this is another dimension that can be completely decoupled from applications. It's important because secrets are always associated with identity and with some permissions; this is a security area. For the purposes of this presentation I'm going to be using this
icon to indicate a secret. I couldn't really find a good icon for a secret, so I use this one to indicate that the moment you disclose some secret information, your heart skips a beat. What you can see from here is that there will always be secrets associated with the connections between the decoupled components; they're like a glue, or a necessary part of the glue, that keeps everything ticking together, and we'll see how that plays out.

Now, to get all of that deployed into the cloud, as I mentioned, we use a pipeline. A pipeline is a series of automated steps, and the loop here indicates that the process is continuous: developers keep on writing code, the pipeline keeps on doing things with the code and feeding information back to the developer, whether it succeeded, whether it failed, whether it was able to deploy, and so on. The process is repeatable and automated. The general steps are as follows. First is downloading the code: once the code repository receives some code, it will use something like a webhook to notify the pipeline that there's new code, and the pipeline will use a token to access the repository and pull the code. Then it will install requirements, which can involve using internal package repositories, and to use them you will always need some kind of token or secret. Then testing: the testing phase generally consists of linting, which refers to code style (think single quotes versus double, how many spaces for a tab, things like that); unit tests, which test the application code function by function; and finally you can also add a security test, where you can throw in your automation tooling, and as you will see it's very useful. At Skyscanner we have worked on this problem, and towards the end we'll see what tools you could use for this. If the tests went well, the pipeline will start building the application image, and when it builds it, it will publish it into an image repository, again using some kind of secret, or a policy associated with accessing that image repository. Finally it will deploy the infrastructure: it will use, once again, some kind of secret to send the infrastructure-as-code configuration to the cloud provider, referencing this newly published image, and the cloud will deploy all of that. The pipeline will then send notifications back to the developer, probably through some kind of webhook, to let the developer know what's going on. Just to give you an example, a pipeline configuration is just a file.
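The flow I just described can be sketched as a toy sequence of steps. Every function here is a hypothetical stand-in (the real steps run as jobs in your CI system), but it shows where the repository, registry, and deployment secrets come into play:

```python
# Toy sketch of the pipeline stages described above; all names are hypothetical.
def download(commit):            return f"code@{commit}"   # uses a repo token
def install_requirements(code):  pass   # may use an internal package-index token
def lint(code):                  pass   # code style checks
def unit_test(code):             pass   # function-by-function tests
def security_test(code):         pass   # e.g. secret scanning, static analysis
def build_image(code):           return f"image({code})"
def publish(image):              pass   # uses an image-registry secret
def deploy(image):               return f"deployed {image}"  # sends IaC to the cloud

def run_pipeline(commit):
    code = download(commit)
    install_requirements(code)
    lint(code); unit_test(code); security_test(code)
    image = build_image(code)
    publish(image)
    return deploy(image)

print(run_pipeline("abc123"))  # → deployed image(code@abc123)
```

Each stage that crosses a component boundary needs a secret, which is exactly why the pipeline is such an attractive target.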
There can be one or multiple files per project, per code repository, but generally they live in the code repository. This one here is from GitHub Actions. In steps you can see the steps I was describing, installing, linting, and so on; they are sequential, so each step waits for the previous one to finish before starting. Here at the end, this step needs a couple of secrets, and what you can see is that the pipeline exposes the secrets as environment variables, through these secret placeholders. It's not the real secret; it's more like a username for a secret. This moment right here is key to decoupling secrets from the application code, and we'll see
how that plays out. To put it all together end to end, this is the diagram. There's a lot of information on it, so work through it with me. Right here in the middle we have the pipeline, as I was just describing. On the left we have the developers who are going to be writing some code, and they're going to be using some kind of token to send that code to the repository. We saw what the pipeline will do, and once it deploys the app, the application itself in the back end may have multiple secrets for accessing databases, external service providers, some kind of integrations, what have you. In addition, the graphical user interface may have multiple secrets: if you're using client-side widgets like a map, a customer service widget, or some kind of contact bot, they will all probably use some kind of token, and this token, even though it's obviously public, being in the GUI, still needs to be restricted. Your users are going to be accessing the application through the graphical interface, and they authenticate themselves to the system using some kind of secret, typically a username and password. You might also have client services that are consuming data from you or pushing data to you, and they will also use some kind of token to authenticate. So in all of this you can see that
there are a lot of secrets. That's not specifically new, but you can see how they have multiplied and spread all over, with different purposes, different scopes, and different types, and there are different ways to attack all of this to gain an advantage and hack the application. There are three general areas of attack, the first one being the repository. To attack it, we can attack developers through social engineering or malware infection, and sometimes people unintentionally disclose their secrets, so using OSINT an attacker can potentially recover them. Then, using a developer secret to read or write to a private code repository, you can exfiltrate information and potentially execute supply chain attacks; we'll see how that works in a later slide. In addition, you can do code injection through compromised automation tooling. Today, more often than not, a repository will have some kind of bot or integration that checks pull requests, maybe notifies people, perhaps comments with a cool GIF; all of these are third-party applications, and they're writing inside your code repository, so naturally they may have access to secrets or just to code in general. A key point here is that if a developer writes code with hard-coded secrets, those secrets are exposed directly in the code repository, as well as everywhere further down the line, in the pipeline and in the runtime in the cloud. There's also stealing the webhook: if you have it hard-coded in your pipeline configuration file, as we saw, the attacker could read that file, get the webhook, and keep triggering builds, effectively denying you your own builds, because your build hosts will always be building something. When we look at defending all that, the first line of defense is due diligence, which refers to evaluating which third-party integrations you're bringing in:
send out security assessment questionnaires, asking providers how they handle security and how they manage data, and do your own security audit, a reality-versus-expectation kind of check. Minimal privilege is a big one: restrict the scope of the secrets. If one developer token is able to read and write all the repositories, that's very different from stealing a token that only has a specific repository in its scope. Use unique, expirable tokens, so that in case of a leak you can identify who was affected and how. Use unique webhooks for triggering builds, a sort of functional segmentation: one webhook in the attacker's hands that can build any repository gives a much bigger advantage. And finally, avoiding hard-coded secrets, as I mentioned, is a huge one, because avoiding them prevents them from showing up all throughout the pipeline. Avoiding them mainly means using placeholders, as I already showed in the pipeline example. In code you can also use environment variables: your app expects the API key to be there, and when it starts running it first checks that the key is set and then works with it; this way you're avoiding a hard-coded API key in the code.
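A minimal sketch of the environment-variable approach; the variable name `API_KEY` is just an assumption, not a fixed convention:

```python
import os

def load_api_key(env=os.environ):
    # API_KEY is a hypothetical name; the pipeline or runtime injects the value
    key = env.get("API_KEY")
    if not key:
        # fail fast instead of running with a missing credential
        raise RuntimeError("API_KEY is not set")
    return key
```

The plaintext value never appears in the repository; it only exists in the process environment at run time.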
If you have a config file, you can also use placeholders there. You can pick any format; it will largely depend on whether your pipeline is capable of automatic substitution in these static files or not. If not, you can write a simple shell script or Python script, what have you, that substitutes these variables for you. Alternatively, if you don't want to use the environment, you can make a request to a secrets service API, like a secrets manager: you request, for example, by key, and the service replies with the secret value, which again avoids having the actual hard-coded secret value in the file.
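A minimal sketch of such a substitution script, assuming a `{{VAR_NAME}}` placeholder format (the format itself is your choice):

```python
import os
import re

def render(template, env=os.environ):
    # replace every {{NAME}} placeholder with the matching environment value
    return re.sub(r"\{\{(\w+)\}\}", lambda m: env[m.group(1)], template)

config = 'db_password = "{{DB_PASSWORD}}"'
print(render(config, {"DB_PASSWORD": "hunter2"}))  # → db_password = "hunter2"
```

The pipeline runs this over the static file just before deployment, so the committed file only ever contains placeholders.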
Moving on to attacking the pipeline. The pipeline can be attacked through a supply chain attack, which might happen through compromised packages or dependencies; in addition there's dependency confusion, which happens at the stage where the pipeline starts building the code and pulling the dependencies it needs to install so the app can run. One really cool case was by Alex Birsan, who came up with a clever trick. Essentially, in this case, PayPal had a project with some dependencies, and the ones highlighted here in red were private: they had them in their own internal repository, and they weren't on the public npm. What Alex did was basically publish a public npm package, cpp-logger version one, and then all this tooling, or rather the code repository tooling that bumps up your versions to keep your repository updated, saw that there was a newer version and pulled it from the public repository. You can imagine that's super critical, because you're pulling code directly into your pipeline. (Moderator: just to mention, you have around five more minutes for the presentation. Okay, hurrying up.)
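The trick works because many resolvers merge candidate versions from every configured index and simply pick the highest one. A toy sketch of that behavior (the package name and versions here are made up):

```python
def resolve(package, indexes):
    # gather (version, index) candidates from every configured index
    candidates = [
        (version, index_name)
        for index_name, packages in indexes.items()
        for version in packages.get(package, [])
    ]
    # the highest version wins, regardless of which index it came from
    return max(candidates)

indexes = {
    "internal": {"pplogger": [(0, 9, 1)]},     # the real private package
    "public-npm": {"pplogger": [(99, 0, 0)]},  # attacker's decoy, huge version
}
print(resolve("pplogger", indexes))  # the attacker's (99, 0, 0) wins
```

Because the attacker controls the public version number, they can always publish something "newer" than the internal package.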
To make things worse, the pipeline normally runs as root, so you're getting arbitrary code execution as root in there. Overriding the pipeline configuration, if possible, is another attack vector, which could allow you to leak secrets from the pipeline. Injection can additionally be done through pull requests: there was another case with Travis CI, where they had masking of pipeline secrets, but a bug ended up disclosing secrets through injection in pull requests, by injecting environment variables in the pull request. And finally, there's command injection through compromised automation tooling: if your pipeline uses any kind of Docker image, or a package that is not managed by you, it's third party, and if somebody compromises that, once again it's very similar to a supply chain attack. Defending this, the first line is once again due diligence, making sure you know what is running in there. Harden the build host: if you can run it as non-root, perfect. Restrict network access, so it can't reach the entire internet. Lock the dependency scope in your packages, to make sure that when you download internal packages they're actually downloaded from your internal repository; in npm you can scope them. Take over names in public repositories, registering the same names to prevent people from hijacking them
in the way that Alex did. And of course, avoid using secrets in pull requests in general.

Finally, attacking the application. Here all the same OWASP stuff still applies. In particular, path traversal or local file inclusion and remote code execution are nice because you can disclose hard-coded secrets, and with RCE potentially call other endpoints to retrieve them. If you have unrestricted API keys, these can be abused. And SSRF, server-side request forgery, made an appearance in a very big way in the cloud because of all these decoupled components: you can abuse the trust between two components. There's the Capital One breach example. There was a role with permissions for accessing all over the place, and there was an S3 bucket with sensitive information. The bucket was protected, the role was protected, and then there was a person who had access to the role, and the trust between the role and the bucket was absolute, because it was supposed to be an internal thing, so the person was able to exploit that trust. It's described as server-side request forgery; I don't think it exactly is, but the idea is there: it's the abuse of that internal trust. In AWS there's an endpoint with instance metadata that you can call.
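As a sketch, the classic IMDSv1 path that an SSRF or a compromised instance can reach looks like this. The 169.254.169.254 link-local address is the real AWS metadata endpoint; the role name here is hypothetical:

```python
# AWS IMDSv1 exposes temporary credentials over plain HTTP with no auth;
# any code that can make a GET request from the instance can fetch them.
BASE = "http://169.254.169.254/latest/meta-data"

def credentials_url(role_name):
    # an attacker first lists roles under .../iam/security-credentials/,
    # then fetches the temporary keys for the role found there
    return f"{BASE}/iam/security-credentials/{role_name}"

print(credentials_url("my-app-role"))
```

Requiring IMDSv2, where a session token must first be obtained via a PUT request, mitigates exactly this one-shot GET pattern.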
If an attacker can compromise your EC2 instance, for instance through a supply-chain injection via the pipeline, they can call this metadata endpoint and retrieve the active instance credentials for the EC2 instance. The defense for this is the typical stuff: do early design reviews and threat modeling of your application, to understand all the different points where your secrets are, what permissions they need, and what the vulnerable points are; conduct security reviews; use automation tooling, as I mentioned, in your pipeline's security test; run static code analysis and static file analysis; and use secret placeholders in your code to decouple the secrets from the application.

At Skyscanner we have of course had the opportunity to look at all of these problems, and Whispers is one of the applications I've been working on. It's a tool you can use to detect static-file credentials. The difference here is that static code analysis tools typically cover dynamic code, like Java or Python, but they don't cover XML or JSON files when looking specifically for credentials; Whispers does that. You can put it in your pipeline for security testing, or set it up as a pre-commit hook to prevent your developers from committing potentially sensitive information in the first place. It's extendable with rules and plugins.
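I'm not reproducing Whispers' own rules here, but a homegrown sketch of the same idea, a check that flags credential-looking lines in static files, could look like this (the regex is a deliberately crude assumption):

```python
import re

# crude pattern: a credential-ish keyword assigned a quoted literal value
PATTERN = re.compile(
    r'(password|passwd|api[_-]?key|secret|token)\s*[:=]\s*["\'][^"\']+["\']',
    re.IGNORECASE,
)

def find_secrets(text):
    # return every line that looks like a hard-coded credential
    return [line for line in text.splitlines() if PATTERN.search(line)]

sample = 'api_key = "AKIA1234EXAMPLE"\napp_name = "search"'
print(find_secrets(sample))  # flags only the api_key line
```

Wired into a pre-commit hook, a check like this fails the commit before the plaintext secret ever reaches the repository.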
It works on non-executable files, including IaC, which is making a big appearance, and it handles things like private keys, certificates, and so on. We have other recommended security automation tooling; check it out. CFRipper is really cool: if you're using AWS CloudFormation, it will tell you whether your roles and policies are correct, which is a huge advantage because they're very difficult to reason about; I think of it as security linting for IaC, and it's super helpful. Then we have LambdaGuard, which is for serverless infrastructure auditing, and a tool for detecting hard-coded secrets in Java and JavaScript, though that one is being phased out because there are now other tools that are better and more efficient; nevertheless, check them out.

To summarize: there are different types of secrets, developer secrets for accessing code, build secrets for accessing repositories, packages, and components, and runtime secrets for accessing services and data. There are different ways to provision secrets: one is hard-coding them, at coding time; a better way is using environment variables; and using a secrets manager with placeholders is also better. In this way we want to decouple secrets from the application. There are many, many problems and questions around secret management that must not only be asked but also answered: where do you store plaintext secrets, how do you share them, how do you handle offboarding when people leave the company, what happens if secrets are stolen or lost, how do you rotate them, and how do you keep track of all these unique, expirable, minimally-privileged secrets? In the end, you have many people using many types of secrets for many different purposes, and you can appreciate that this complexity goes way beyond the password security requirements that the usual best practices revolve around. I think we're a little bit over time, so this is pretty much it. Thanks very much for your time, and
reach out to me if you have any questions you want to follow up on, on the BSides Slack or on my LinkedIn. And just to mention, we're hiring in security, so ping me if you're interested. Thank you.

Brilliant. I have a question, but I'll drop it in the live channel because of the time constraints. Thank you so much for participating in BSides as usual; we've seen your face a few times, and you're always welcome to come again. Thanks so much for your presentation, I appreciate it. Thank you, thank you.