GF - The Art of Letting Go: Secure delegation of permissions in AWS environments

Name: GF - The Art of Letting Go: Secure delegation of permissions in AWS environments
Uploaded: 2023-10-25
Duration: 33 min 38 s
Description: Ground Floor, 17:00 Wednesday This talk will tell the story on how we used SCPs (service control policies), IAM permission boundaries and IAM policies across our AWS Organization to set up the necessary guardrails to allow our engineering teams to use privileged IAM actions in AWS environments, ena

BSides Las Vegas | 33:38 | 42 views | Published 2023-10

Mentioned in this talk

Tools used

AWS CloudTrail, EventBridge

Platforms

AWS Lambda, AWS Organizations

About this talk

Ground Floor, 17:00 Wednesday. This talk will tell the story of how we used SCPs (service control policies), IAM permissions boundaries and IAM policies across our AWS Organization to set up the necessary guardrails to allow our engineering teams to use privileged IAM actions in AWS environments, enabling them to move fast without the need for manual approval workflows for the creation of resources. Additionally, we used an event-based solution powered by EventBridge and Lambda to analyse compliance, perform automated remediations and send notifications, which increased our visibility without adding to our workload. Cloud service providers forever changed how engineering teams work. Many companies have moved, or are starting to move, away from maintaining and operating cold and unforgiving server rooms, allowing that to be someone else’s problem. The time and effort required to have a server up and running went from weeks or days to seconds or minutes. Infrastructure as Code elevated that, allowing teams to have consistent working environments, thus enabling the business to support as many customers or features as they wish to, reliably. Security teams need to find comfort in flexibility to empower engineering teams. Identity and access management is a vital part of that journey. Sara Perez

Transcript [en]

Hello everyone, good afternoon, and welcome to BSides Las Vegas. I'm Hara nikar, and this talk is The Art of Letting Go: Secure delegation of permissions in AWS environments, presented by Sara Perez. Before we begin, I have a few announcements to make. We would like to thank our sponsors, especially our Diamond sponsor Adobe and our Gold sponsors Prisma Cloud, BlueCat and Toyota. It's their support, along with our other sponsors, donors and volunteers, that makes this event possible. These talks are being streamed live, and as a courtesy to our speakers and audience we ask you to make sure that your cell phones are set to silent mode. And if you have any

questions, please use the audience microphone so that YouTube can hear you as well. With that, let's get started. Please welcome Sara. Thank you. Thank you so much, and thanks everyone for being here this evening. As was just mentioned, my name is Sara and I'm a principal cloud security engineer at Okta. I'm originally from Barcelona, in Spain, and I'm a telecommunications engineer by trade, but somehow I ended up in infosec anyway. I spent some time doing penetration testing and delivering training at conferences, but then I moved to the other side, and I've spent the last few years building security solutions in the cloud. Today I'm going to share with you the

journey we followed to allow our engineering teams to use restricted IAM actions. I'm going to dig into why this need came up, why all of a sudden they needed to perform actions they hadn't used before. I'm also going to share with you how those actions were restricted and why, because that's going to give us the tools to understand how we solved these problems. I'm going to deep dive into the two use cases we had, then we'll do a quick recap, talk about future work, and hopefully have time for questions if I don't run over time. So how did this all start? For us, as for any company, we thought it was time for a change. We wanted to refresh the

way our customer environments were created, and we wanted to do that because we wanted a faster time to market, so being able to deliver those environments faster, and we also wanted to improve our operations and the reliability of the environments. And with new infrastructure come new requests. What does it look like? We moved from an architecture where everything was EC2 instances, with everything installed and managed on those EC2 instances, which took a very long time for us to set up, to using EKS and AKS clusters. We also moved from being just a native AWS house to also providing these services in Azure, something that had been

requested from us for a very long time. In this process, as you can see here, we have a control plane that is in AWS, and we are able to create customer spaces, which are different AWS accounts with the customers on them, and also Azure subscriptions with the same kind of infrastructure to provide our services on them. Because these environments took a while to create, they also went through changes. Initially we had used kiam to grant access to our service accounts in Kubernetes, to allow them to use IAM roles, but then we decided it was time to move to the recommended way that AWS

facilitates for it. The project had been going on for quite a while, and in that process new things came up and we wanted to adhere to those best practices. Also, as the project evolved, we developed the need to have communication not only from the control plane into Azure but also from services in those spaces back to our AWS accounts. That's how the initial requests came in. The first one was: from each customer space in Azure, we have a service that is going to need to send data to an S3 bucket. There are different ways of establishing communication between clouds; at the time, using IAM users was

found to be the fastest and best option, so the request was essentially: we need to be able to create IAM users. They were not able to do this before. The second request follows from what we saw of the architecture: we were using kiam and we wanted to move to the way recommended by AWS, which is using an OIDC provider. When you create an EKS cluster in AWS, it comes with an OpenID Connect URL, and you can use that URL to create an OpenID Connect provider that the roles for those service accounts are then going to use. So again the request was: we need to be able to create OpenID Connect providers, which was

not allowed up until then. And this is how it came to be: all of a sudden we, the cloud security operations team, get this request: we need to be able to create IAM users and also OIDC providers. How did we have these actions restricted, and why? The quick answer is AWS Organizations, SCPs and policies, and I'm going to dig into that a little

bit. In AWS, we organize our accounts using AWS Organizations, a service that allows you to have one main account for management and then organize your accounts in different buckets, called organizational units, which are this top layer that we have here. You can name them whatever you want and organize them however you want. We decided to create a new one for the new environments and then separate them per environment as well: development, production, sandbox. Within those we can have our AWS accounts. All of this is managed from a management account that has a different login from the rest of the accounts in the

organization, which means access to it is more restricted. Why is this useful? It allows us to apply guardrails at different levels if we want to, so we could apply some guardrails at the prod level and different ones at the dev level. Which guardrails? Service control policies. Service control policies allow us to specify the maximum level of permissions. We follow the default way from AWS, which is a deny list approach: when you create a new account, that new account is going to have a policy that allows any service in that account to be used. These do not grant permissions; they just allow

services to be used. Then we have other SCPs on top of those that deny or restrict access to specific actions, and that's what we will see later. The next layer we've got is the account layer: identity policies and permissions boundaries, which we will also see during this talk. Those are attached to roles or IAM users, and there are many different types. Among the ones we'll see today are inline policies, which are attached directly to roles or users and are one-off policies: as soon as that user or role disappears, the policy goes away. On the other hand, we have managed policies, which

can be AWS managed (some of you may be familiar with the AdministratorAccess policy, which is created and fully managed by AWS; you can't remove it, but you can use it across multiple principals) or customer managed, which are the ones that you can create, reuse across different users and roles, and update at any time. As for permissions boundaries, they are a specific use of a customer managed policy: you can create a customer managed policy and use it as a permissions boundary, and we will see some examples of this later. Last but not least, the last layer of the cake is resource policies. Those are attached to resources

such as S3 buckets or KMS keys. We will not see these in detail today, because they did not factor into the solution that we came up with. So this is how we had those permissions restricted at different layers. Let's see an example. When I was talking about the deny list strategy for our SCPs, this is what I meant. For the first ask, about the IAM users, we had a statement within one of our SCPs that denied the use of these actions on any resource, as you can see down here, for any role except for the ones

specified here. So it's basically an allow list of which roles are allowed to perform actions such as create user, remove user, et cetera. Similarly, for the creation of SAML and OIDC providers we had exactly the same: a Deny effect for the following actions, for any resource and any role except for the ones specified here. So again we had a restriction at the top level of the chain that said these actions can only be performed by a set of roles. And why did we have these restricted? They were restricted to very specific roles, among them the cloud security

operations team, that is, us, and some automations that needed to perform those actions. No one else. Why? Well, IAM users, if you are familiar with them, have long-term credentials, and they become very costly to maintain. You have to have processes to rotate the credentials in case they are compromised, say they ended up in a Slack channel or a Git repo by just very bad luck. And if someone who had access to them leaves, you are also obligated to rotate them. Last but not least, they were not really conceived to be used by

workloads; AWS recommends you use them as a last resort, if you don't have any other options. As for SAML or OIDC providers, we use OIDC in our accounts, and we wanted to make sure that only approved OIDC providers could be created in those accounts. Otherwise anyone could just create one and grant access to whoever they wanted to the account that the provider was created in. Now that we had a clear idea of what systems we had in place to restrict these actions, and a clear idea of the needs, we needed to build our solution, and for us it was very important to fulfill the following requirements.

The first one was flexibility. We need to be flexible: the cloud security operations team is not very big, and engineering needs to move very fast. So we wanted to come up with a solution that would make them as independent from us as possible and that wouldn't require any manual processes or approvals. We also wanted it to be secure, of course: we wanted to provide flexibility without sacrificing security, so it was important for us to have guardrails that would fulfill our needs and also give us visibility into what's going on. And last but not least, it had to be scalable. We were doing

all this effort to improve our time to market, operations and reliability, so these solutions had to be able to grow with it. With that, I'm going to share with you how the solution for the first use case came about, which was that request of: we need to create IAM users for every single Azure space. How did we solve this? In short: SCPs, identity policies and permissions boundaries. We had a scenario that looked like this: we have a control plane down here, which was that account we had creating spaces on both sides, and we have the Azure customer spaces on this side, which are going to use

those users to then send data into one of our storage AWS accounts to be processed. Our first question was: who needs to be able to create these IAM users? The response from engineering was: a control plane role. This role is used by automation, and on the creation of a new space it needs to be able to create these users to then use those credentials. The downside of this role was that it was created and managed by engineering, so it wasn't under our control; we have specific guardrails for roles we create. But on the bright side, it had a very good inline

policy. It was very tidy, which made the whole process easier for us. By tidy I mean it didn't have star, star on everything; it had only the actions it needed on the resources it really needed. So how did this look? We have the control plane role with an inline policy, and it needs to create a user. That user, for service X, would have its own inline policy to be able to put objects into S3. Now, let's imagine we add the control plane role to our SCP allow list, so it's able to perform these actions. But we want to be

able to control how those users are created. We don't want this role to be able to create users with any kind of permissions; that would be a terrible thing to do. So what did we decide to do here? We created a permissions boundary for the role that would, on top of the inline policy, add restrictions on how those users could be created, and we will see this in practice in a bit. That permissions boundary would force the control plane role to create users with another boundary attached. If the user was created with the correct boundary attached, everything would be successful: the user would be created, and it would have its

inline policy and the permissions boundary. If it wasn't, the creation would simply fail. On top of all of this, we would use SCPs to also protect the resources involved. So we would protect the permissions boundary for the control plane role, because we don't want anybody tampering with that boundary and giving it more permissions or causing any trouble with it. The same goes for the user boundary: that's also protected at the SCP level. By protected I mean it's added to a list of resources that should not be tampered with, except by specific roles, same as we saw before. We're going to try and see this; I

hope it's clear. I'm doing this in an account, in the UI, to make it clearer for everyone. I'm logged in as the control plane role and I'm going to create an IAM user for service X, and I'm going to do so without attaching any boundary, just because I don't want to comply with the exercise. That's going to give me an error: it's not going to allow me to create it, telling me that the permissions boundary does not allow it. So I go back and try to attach a permissions boundary this time, but the one for service Y instead of service X. That's also not going to work, because

the permissions boundary is not going to allow it. So finally I decide to comply, go back, and attach the one for service

X. That allows me to create the user. But what happens now? What if I want to tamper with it, say I want to change the boundary because I really want to apply the boundary for service Y? I go and try to perform this action, and again it won't let me: the permissions boundary says you're not allowed to do this. What happens if I just try to remove it instead of changing it? We find the same problem: it's not possible. So the user can only be created with the correct boundary attached, and it can't be tampered with after the fact. So we

got the solution that we wanted: we wanted to make sure that those users were created with guardrails that we would be confident in. Let me show you what this control plane role looks like, and how the inline policy and the permissions boundary work. We see here that it says customer inline: these are the policies that can only be used once, attached directly to the role or the IAM user itself. Then we see the permissions boundary. We create that boundary and we control it. Here, we asked platform engineering to add the IAM actions on any resource, next to the permissions they already had

for EC2 instances, Secrets Manager and whatnot, and we made sure they added the ones for management of users without specifying a resource. Why? Because we are controlling that at the permissions boundary level. In the boundary we don't care about the permissions they have on those other services: we grant star and let the inline policy deal with that. For list and get we allow them to do anything, but for the creation and management of users we establish a restriction based on a naming convention. So for service X, you're able to create the user as long as it has this boundary attached to it, and that's why, when I was trying to use a different boundary, that was not

successful. If we had a different service, we could use exactly the same approach to apply the same restriction. Then we also added some restrictions to avoid tampering with the permissions

boundaries too. This was almost the most complicated case of evaluation. I also wanted to quickly show you what happens on the user side. We have a user with the inline policy you can see here on your left, and it also has the permissions boundary, and they look extremely similar. This is because there's an evaluation between the inline policy and the permissions boundary to determine which permissions are allowed: a permission has to be allowed by both policies. If anything, the identity policy could be more open, so it could have s3:*,

but because my permissions boundary only has those actions specified, the user would only be able to perform the actions in the boundary. So the boundary sets the ceiling: this is the maximum you can do, even if you have a star here. We kept it very tidy. We let the identity policy set the customer parameter in the S3 bucket path; you can see that in the boundary it's set as a star, because we have no way of knowing the names of all the customers, and it would be impossible to build a boundary otherwise. If someone tries to create a user with a different inline policy, say

ec2:*, then when the user goes and tries to access EC2, that evaluation will fail, because EC2 permissions are not in the permissions boundary. If the permissions boundary had even just an EC2 describe action, that would be allowed, but there's nothing, so the evaluation fails. So at this point, for the creation of IAM users, we were successful in allowing engineering to create those IAM users with specific guardrails and without needing us for absolutely anything, which was the original goal. For the creation of OIDC providers, though, things are slightly different. We were able to solve it using SCPs and identity policies as well, but policies can only get you so far, so we had to add an

architecture of events and Lambda to cover the gaps that the policies were not covering. Again, what does this look like? We have our EKS cluster with different service accounts, and we want them to be able to use different IAM roles to access different services. For that we create an OIDC provider. Again, who needs to create it? In this case, the infra deployer role. The difference from the previous use case is that this role was created by us, by cloud security. The not so good part is that it was way more permissive than the control plane role, so we had to play the game a bit differently. Again, we have a role

with an inline policy that needs to create an OIDC provider. In this case, because we managed that role and it was already protected from tampering at the SCP level, we decided to come up with a statement to add to its inline policy. It had lots of permissions, and we were able to add a statement like this. So instead of adding a boundary or doing anything extra, we just updated its own inline policy, and it looked like so: we have an effect of Deny, and, my pointer is not working, but we have NotResource instead of Resource. NotResource, along with the Deny, allows us to say that these actions are denied

for the creation of any OIDC provider except for the one specified here. So as long as the provider you're creating matches this naming convention, you'll be allowed to do it; if it doesn't, you won't be. This was very useful for us because we didn't have to try to specify all the providers that the role should not be able to create, which would have been an impossible list, but only the ones that it could. Now we get to the point where policies don't cover everything. One of the things they didn't cover is that for an EKS cluster you can create

that OIDC provider in any AWS account you want. So effectively, if you have a cluster in one account, you could create the OIDC provider for that cluster in a different account. We agreed with engineering that this should not happen: if there was ever a use case for that cross-account access, it would be done with role chaining and not by creating many, many OIDC providers in different accounts. That's something we could not cover with policies, and that's where we created the architecture of events and Lambda. We also decided to use this solution for other providers, not just EKS, so the whole solution

includes other validations, not just this one. So what's going to happen? I'm going to create the OIDC provider, and that's going to trigger an EventBridge rule; we created an EventBridge rule in every single account. When it triggers, the event is sent to a centralized bus in a centralized account, which in turn triggers a Lambda. Let's zoom in on that right-hand side. Our Lambda receives the event and performs a set of checks. Some of these checks are related to the EKS providers, and some are related to other OIDC providers that we decided to cover with this solution too. So it's

going to check the type of provider, whether the principal, the role that is creating it, is correct or not, the thumbprint, the audience, and whether it's created in the correct account or not. If any of those evaluations fails, we go ahead and remove the provider and notify the team so we can review the activity. If all of them are successful, everyone's happy and everything's working as it should, but we still have logs to confirm that everything's working. And same as before, we use SCPs to avoid tampering with the infra deployer role and the different event rules. Otherwise someone could go into an account, just disable the EventBridge

rule, and render the whole solution unusable, but with this we are able to protect it. The auto-remediation role used by the Lambda is also protected at the SCP level. Now we're going to see it in action; hopefully it's clear. We have a cluster in one account, and this is the OIDC provider URL that it has. I'm logged in to a different account as the infra deployer role, and I'm going to try to create an OIDC provider for the cluster that is in the other account. This is one of the cases that we don't want to happen. So we just go through the

process. As far as the policies are concerned this is legitimate, so nothing is actually going to prevent it from happening: the provider is created, because it's for an EKS cluster. Now let's have a look at the logs of that Lambda. We see that it receives the CreateOpenIDConnectProvider event, which is one of those events from CloudTrail, and we can see that the provider was created by the infra deployer role and that everything theoretically looks good, because it's an EKS OIDC provider. But then we see that one of the validations failed, because the Lambda was able to identify that the provider and the cluster are in different accounts, and that's not allowed. Later on we can see that

there was a delete event, our Lambda performing the remediation, and it sends an alert to Slack for us to see, so we can go and find out what happened and review it with the team, or trigger an incident if necessary. Now we refresh to check that the provider was actually removed, and it was: the Lambda did what it was supposed to do. So at this point we had also been able to facilitate the creation of those OIDC providers without us having to approve anything. So we met the first point: we wanted it to be flexible, and it became flexible for everyone. Engineering were able to

move as fast as they wanted to, and we didn't receive a bunch of tickets every day. Security: we set up all the guardrails that we needed to feel comfortable with the solution, and we also gained visibility, because we now have this framework reviewing every single creation of providers. In terms of scalability, though, not so good. Why? Because there is a hard limit on the number of IAM users you can create per AWS account. I think the limit right now is 5,000. It seems like an unreachable limit, but when we started all of this we had one service in Azure that needed to speak

to AWS; by the end of it we had three. So if you start counting three IAM users per customer in Azure, that can grow very quickly. So for future work, and this is actually happening now, we decided to replace the creation of IAM users that you saw with IAM Roles Anywhere. That's a service that came out mid last year, if I recall correctly, and it allows external workloads to use IAM roles as if they were running in AWS. We are replacing all those IAM users with IAM roles, which will save us from having long-term credentials and will make this more scalable. I have added a bunch of references

here. This is basically the AWS docs, but these are the most useful ones I found. Sorry that was so quick. That's pretty much all from me. This is my first ever talk at a conference, so thanks for being here at this time, and if anyone has any questions, I'm happy to take them. No questions? Sure. I'm curious: for those IAM users, was there any consideration for the keys that were created, the access keys and secret keys? We did have a rotation in place: there was a mechanism rotating those keys every X amount of time. That was, well, it was a demand from us to

the platform teams. We also wanted to add further protections at the policy level: we wanted to restrict the source IP for those credentials, but because we started replacing the users with IAM Roles Anywhere, we didn't get to implement that. That was the next step for us, restricting the source, which is something quite attainable. No more questions? No? Well, thanks everyone. Cheers.
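
The first use case in the talk (IAM users for Azure services) combines three ideas: an SCP that denies user management to every role outside an allow list, a boundary on the control plane role that only permits creating users when a specific boundary is attached, and the rule that a user's effective permissions are the overlap of its identity policy and its boundary. The slides are not part of this transcript, so the following is only a minimal Python sketch under stated assumptions: the account ID, role names, policy names, user naming convention and action lists are all hypothetical, and the small evaluator ignores resources, conditions, explicit denies and SCPs.

```python
import json
from fnmatch import fnmatchcase

# Hypothetical identifiers, used for illustration only.
ACCOUNT = "111122223333"
USER_BOUNDARY = f"arn:aws:iam::{ACCOUNT}:policy/service-x-user-boundary"

# SCP statement: deny IAM user management to every principal
# except an allow list of roles (role names are assumptions).
scp_deny_user_management = {
    "Sid": "DenyIamUserManagement",
    "Effect": "Deny",
    "Action": ["iam:CreateUser", "iam:DeleteUser", "iam:CreateAccessKey"],
    "Resource": "*",
    "Condition": {
        "ArnNotLike": {
            "aws:PrincipalArn": [
                "arn:aws:iam::*:role/cloud-security-ops",
                "arn:aws:iam::*:role/control-plane",
            ]
        }
    },
}

# Statement for the control plane role's boundary: users following the
# (assumed) naming convention can only be created or given policies
# while the expected user boundary is attached.
boundary_create_user = {
    "Sid": "CreateServiceXUsersWithBoundary",
    "Effect": "Allow",
    "Action": ["iam:CreateUser", "iam:PutUserPolicy"],
    "Resource": f"arn:aws:iam::{ACCOUNT}:user/service-x-*",
    "Condition": {"StringEquals": {"iam:PermissionsBoundary": USER_BOUNDARY}},
}


def matches(patterns, action):
    """True if any IAM-style pattern (with * wildcards) matches the action."""
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def effective(identity_actions, boundary_actions, action):
    """An action is usable only if the identity policy AND the boundary allow it."""
    return matches(identity_actions, action) and matches(boundary_actions, action)


print(json.dumps(boundary_create_user, indent=2))
for action in ("s3:PutObject", "s3:GetObject", "ec2:DescribeInstances"):
    # Identity policy is wide open on S3; the boundary only lists s3:PutObject.
    print(action, effective(["s3:*", "ec2:*"], ["s3:PutObject"], action))
```

With an identity policy of `s3:*` and `ec2:*` but a boundary that only lists `s3:PutObject`, only `s3:PutObject` comes out as allowed, which matches the behaviour described in the demo: the boundary sets the ceiling regardless of how open the inline policy is.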

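The second use case (OIDC providers) pairs a Deny statement using NotResource with an event-driven check for what policies cannot express, namely that an EKS provider must live in the same account as its cluster. The sketch below is a hedged illustration rather than the speaker's implementation: the ARN pattern, the role name `infra-deployer`, the trimmed event layout (loosely modelled on a CloudTrail event delivered through EventBridge) and the `issuers_by_account` lookup are assumptions, and the thumbprint and audience checks mentioned in the talk are left out.

```python
from urllib.parse import urlparse

# Inline policy statement: deny provider creation for anything that does NOT
# match the approved pattern (the pattern itself is a hypothetical convention).
deny_unapproved_oidc = {
    "Sid": "DenyUnapprovedOidcProviders",
    "Effect": "Deny",
    "Action": "iam:CreateOpenIDConnectProvider",
    "NotResource": "arn:aws:iam::*:oidc-provider/oidc.eks.*.amazonaws.com/id/*",
}


def issuer_of(url):
    """Normalise an issuer URL to host + path, without scheme or trailing slash."""
    parsed = urlparse(url)
    return (parsed.netloc + parsed.path).rstrip("/")


def check_provider(event, issuers_by_account, expected_role="infra-deployer"):
    """Return the list of failed validations for one provider-creation event."""
    detail = event["detail"]
    account = detail["recipientAccountId"]
    issuer = issuer_of(detail["requestParameters"]["url"])
    creator = detail["userIdentity"]["sessionContext"]["sessionIssuer"]["userName"]

    failures = []
    if creator != expected_role:
        failures.append(f"unexpected creator: {creator}")
    # For EKS issuers, the cluster must exist in the account that holds the provider.
    if issuer.startswith("oidc.eks.") and issuer not in issuers_by_account.get(account, set()):
        failures.append(f"cross-account provider: no cluster with issuer {issuer} in {account}")
    return failures


def make_event(account, url, role):
    """Build a trimmed, hypothetical event for the examples below."""
    return {
        "detail": {
            "eventName": "CreateOpenIDConnectProvider",
            "recipientAccountId": account,
            "requestParameters": {"url": url},
            "userIdentity": {"sessionContext": {"sessionIssuer": {"userName": role}}},
        }
    }


CLUSTER_ACCOUNT, OTHER_ACCOUNT = "111111111111", "222222222222"
ISSUER_URL = "https://oidc.eks.eu-west-1.amazonaws.com/id/EXAMPLE"
# Assumed to be collected elsewhere, for example from each cluster's description.
known_issuers = {CLUSTER_ACCOUNT: {issuer_of(ISSUER_URL)}}

print(check_provider(make_event(CLUSTER_ACCOUNT, ISSUER_URL, "infra-deployer"), known_issuers))
print(check_provider(make_event(OTHER_ACCOUNT, ISSUER_URL, "infra-deployer"), known_issuers))
```

The first call returns an empty list and the second reports a cross-account failure. In the design described in the talk, a non-empty result is what would lead the Lambda to delete the provider and send a Slack alert; that remediation step is deliberately not part of this sketch.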