
So, we're going to be speaking about how to secure cloud machine identities. My name is Komal, and I am a founding backend engineer at P0 Security. And I'm Nathan; I'm one of the co-founders and the vice president of engineering. At P0 Security we are building tooling to help secure access to the cloud, and as part of our work we uncovered a lot of complexities and challenges surrounding machine identity management. That's what inspired us to give this talk.

So who is this talk for? It's going to be pretty practical, so it's for security practitioners. Also, although the topics covered apply to all cloud platforms, we're going to be focused
specifically on AWS and GCP. This is also going to be pretty intro level, so it's for people who might be relatively new to machine identity security. What will you get out of it? We're really hoping that you leave today understanding the risks of mismanaged machine identities, and feeling empowered to take concrete steps to mitigate those risks in your environments. [Music]

I want to quickly walk through the agenda for today. We're going to begin by motivating why you should care about machine identity security. Then we'll go through a quick refresher of AWS and GCP IAM terminology. The meat of this talk is going to be four main security best practices and how to
implement them in AWS and GCP. Those are going to cover two categories, credentials and permissions, and we will end with a Q&A.

Cool, so let's talk about what a machine identity is. A machine identity is the identity that a service uses in order to execute some action, and for the purposes of this talk we're really talking about actions in your cloud service provider. Think of these services as processes running on VMs; they can be Kubernetes workloads; they could also be other managed services like Cloud Functions or Lambda functions. Yeah, there we go. And the heart of why we're giving this talk now is that
they're getting to be a bigger problem than the classic problems we have with managing human identities. There tend to be a lot more of them, so it's more of a scale problem, and they often have access to pretty sensitive data in your system, because they're built into your applications.

So let's take a look at what can go wrong when machine identities aren't properly secured. We're going to look at the Okta 2023 breach. During this attack, an attacker gained access to an employee's account where they had saved a credential for a service account. This service account had access to some sensitive files in Okta's customer support system. So at the center of this
breach was this unsecured service account credential.

Now, before we dive into our best practices, I want to review some IAM terms that we're going to be using. In GCP, our human identities are users and groups, and our machine identities are service accounts. Authorization is done by mapping identities to roles via bindings. In AWS, human identities are also users and groups, and our machine identities are IAM roles. Authorization is done by mapping identities to allowed and denied actions via policies.

Cool. And since this is really a talk about how to do security at scale, we're going to give you some recommendations throughout. The recommendations we're going to give you
are kind of ClickOps, or console-ops, recommendations, but we also want to give you the ability to take these home and execute them in a real security program, across lots of projects and accounts. So for any sort of read action we're going to give you scripts or commands you can run, and for any sort of write action we'll give you the Terraform that you'd run. We're not going to talk about it; it's just going to be in these green boxes throughout the talk.

Cool, so let's start with the actual best practices. We're going to start out with authentication, that is, how do we know
that the identity that claims to be your service identity, or your machine identity, really is that identity. And we'll start with the first problem, which is the danger of using long-lived credentials.

So I'm going to show you what can go wrong if you're improperly using long-lived credentials. Here we have a service account key that's saved in Google Secret Manager. I could be a developer who has access to Secret Manager for a variety of reasons; maybe I'm working with API tokens. Once I have access, I can just view this service account key. So I'm going to copy this service account key, and I'm going to save it
right here on my machine. Sorry. And now I can authenticate as that service account, and I have access to whatever the service account has access to. So let's look at what I can do. Here I'm going to access some sensitive Cloud Storage data. I've been able to list files, and I'm actually going to be able to read this secret file. Now let's look at how these actions are logged in Google. Here is the action that I took to list storage files, and if we go into the authentication information we see the service account as well as the service account key, but there's nothing linking this action to my identity as
Komal. So we see two problems here. Now that I've downloaded the key onto my machine, I can use it whenever I want; I could leave the company and still use it. And in the logs, there's no way to link this back to me. This brings us to our first best practice, which is: don't use long-lived service account keys.

Okay, great, thanks Nathan. So really the first best practice is that we're going to show you some alternatives you can use to avoid creating these long-lived keys. First, in GCP, we're going to avoid service account keys and instead use two things: service identities and workload identity
federation. I'll talk about both of those. Let's start with a service identity. A service identity is a machine identity that is given to you by the cloud service provider itself, by GCP, which assigns it to a service that runs inside GCP. In this example I have a Cloud Function. This Cloud Function does something, whatever it is, and I've created a service account and attached it to this Cloud Function. Whenever this Cloud Function runs, Google will automatically figure out all the credentials for you. You don't have to figure out how to get keys or secrets or anything into the Cloud Function; it'll just work. And there's some Terraform if you
want to set that up on your Cloud Function.

Now, this is great when your cloud service provider is actually managing the thing you're running. But what if it's some third-party service? For that you want to use something called workload identity federation. This is saying: take that same concept, a service identity in your third-party system, and use it in Google. In this example I have a GitHub Action; this is my CD system. GitHub will actually create a service identity for every single run of this action, and that service identity will have a unique identity for the branch it's being run on and the
job itself, the repo, and your org. You can then map that to a service account inside Google and give that account just the permissions it needs to do that deployment. It's going to be a lot more Terraform, but it's possible.

In AWS we have a similar kind of thing; it just has different names. Instead of service account keys, we're going to try to avoid using IAM users to represent service accounts, and we're going to use service roles and OIDC federation. A service role in AWS is basically the same thing as a service identity in Google. The thing that's nice in AWS is that
for a lot of services, AWS will actually set this up for you automatically. If I create a Lambda function in AWS, it will create, out of thin air, a service role just for that one specific Lambda function. OIDC federation is actually the same OIDC federation you use to federate users, say from Okta, so you set it up exactly as you would set up authentication from Okta. In this case it's a pretty similar process: in GitHub you tell it how to present itself as an OIDC login to AWS, and then in AWS you set up the same kind of identity provider that you would use for something like Okta and assign that
service identity that's in GitHub to your service role. I guess the Terraform got lost, but that's okay. It's a little more complicated; just be aware of that.

Now, if you're lucky enough to get all of these in place, you can then deny the creation of static credentials altogether. In Google you can do this with an organization policy. Despite the name, you can assign an organization policy to the organization as a whole or to individual Google projects or folders. The organization policy you want for this is "disable service account key creation"; there's the Terraform. And in AWS you'll use a service control
policy, which is a little more Terraform.

And that takes us to our second best practice. In an ideal world we would never use long-lived credentials, but unfortunately that's not really practical. You're probably going to encounter a use case where you simply can't get around it. So what do we do in that case? We want to make sure we regularly audit and rotate our credentials. This is important because it limits the likelihood that a credential is compromised, and auditing also allows you to discover and remove any unused credentials. So let's look at what we can do on Google Cloud. You can enforce service account key expiration with another organization policy, so this
is going to set the same expiry duration for any service account keys created in any of the child projects of whatever you've configured it on. For auditing purposes we can use the Cloud Monitoring API. This tracks service account key authentication events, and includes details about each event, such as the APIs that were used. In AWS this is actually pretty straightforward. The IAM service allows you to just download a credential report, which contains all of the information you need: information on all the credentials in your account as well as their usage. [Music]

Great. So let's say you've solved this problem, right? I made sure that
my machine identities are who they say they are. Well, as we know, nothing in security is certain. At some point some vulnerability is going to be exploited that allows some entity to adopt a machine identity that's not the identity you think it should be, or the service behind that identity will itself get compromised. In this case you want to limit the damage of such an exploit by restricting the authorization of that machine identity, and by authorization we mean what actions it is allowed to perform in the system. Let's talk about what can go wrong. The danger here is excess privileges, that is, things that the machine identity is given permission to
do that it doesn't actually need to do.

So let's look at another example, of what can happen when machine identities have excess privileges. Here I have a VM instance in Compute Engine, and I'm a developer with permissions to SSH into it, so I'm going to go ahead and do that. What we're going to observe is that the service account attached to this instance has a ton of permissions on the Google Cloud project: it's granted Editor. This is actually the default configuration for the Compute Engine service account when you make a new Google Cloud project. So once this SSH succeeds, I'm going to be able to take some perhaps surprising actions in our Google Cloud
environment. I'm once again going to list our sensitive bucket here, and I can once again read this secret file that we have stored. So as you see, if our machine identities are over-provisioned, then giving a developer access to, say, SSH into a VM instance often grants permissions that you don't intend.

So what can we do about this? This brings us to our third best practice, which is ensuring that machine identities have least-privilege permissions. Least privilege is just jargon for "can only do the things that it actually needs to do." In GCP we want to avoid assigning what are called basic roles, that is, Editor, Viewer, and Owner, and instead make
sure grants to that machine identity are only on the resources that the machine identity needs to work on, use IAM Conditions to restrict what the identity is allowed to do, and create custom roles that consist only of the privileges that identity needs. Let's talk first about how you would create custom roles. You can use a tool called Policy Intelligence, which is built into Google Cloud's IAM. If you go to the IAM page of your Google Cloud project, it will give you security insights, and these will tell you which permissions are used and unused. You can
then go and craft a custom role for the identity in question that contains only the permissions it actually uses. Now, caveat emptor: this does require a paid Security Command Center Premium tier subscription to be used at scale. If this isn't available to you, or you also want to layer in which resources this identity needs access to, you can use logs to track what it's actually using, then go back, parse those logs, and figure out the resources and permissions that you should grant. By default you get Admin Activity logs, which cover any sort of configuration change the identity makes, but you'll also need Data Access
audit logs. By default these are disabled; you can turn them on like so.

In AWS it's a pretty similar set of recommendations. We want to avoid any sort of wildcard grants for actions or resources, so we don't want to say action star, resource star. Instead we want to specifically enumerate which actions and resources the identity should have access to, and we can also use conditions to further restrict what the identity can do. And finally, because AWS has this layered policy system, we can furthermore add deny policies for things we really want to make sure it doesn't do. For instance, in this case we could just restrict
all access to S3 from a service. In AWS you can use IAM Access Analyzer to figure out what access is unused; this will tell you which actions aren't used. By default this isn't created, so you would need to go and make an unused access analyzer yourself. You can also have AWS automatically generate policies for you based on CloudTrail events, so you don't even have to do the log analysis yourself. And finally, there's an open source tool called Repokid you can turn on, which will continuously automate the least-privilege role creation and assignment for you.

So even if you've given all of your
machine identities least-privilege permissions, oftentimes they're still going to need access to perform sensitive operations or read sensitive data. In that case we want to make sure we really limit user access to machine identities. This is important because an attacker who has compromised a user account can often escalate their privileges through machine identities. It's also really hard to audit the access a user effectively has, because their access is not only the stuff granted to them but also the stuff granted to any machine identities that they can access. So how do we do this? In Google Cloud, this means we want to restrict permissions on service accounts. So you should only grant
resource-level permissions on specific service accounts when your developers need access to service accounts. You want to avoid granting service account permissions on the project, folder, or organization, because these will apply to all service accounts, including any new service accounts that might be created in the future. The permissions that I'm referencing are on this slide; they include permissions to act as a service account or to get that service account's access token. You can also use an open source tool called GCP IAM Privilege Escalation, which will scan your configuration for permissions that allow privilege escalation, including these permissions.

You also want to monitor for any suspicious authentication events on your machine identities. The Cloud
Monitoring API again has a metric for service account authentication events, and these will also include the original principal if the identity was assumed via IAM permissions. To get more details you can view the usage logs for the service account, which contain more information, including the actual contents of the request that was made.

So what do we do in AWS? In AWS we want to restrict the access that users have to assuming roles. That means we want to avoid using wildcards or ":root" in the role's trust policy, because this can allow any identity in an AWS account to assume that role. We also want to avoid granting the permission iam:PassRole, because this allows a user to pass a role to a service and then use the permissions of that role, and it bypasses AWS CloudTrail's event logging for AssumeRole events. What should you do instead? You can explicitly specify the principals in the trust policy. You should also configure AWS to set the source identity; if you're using federated access, this can be done in the identity provider. This allows AWS to track the source identity that assumed a role, so you'll be able to see the original identity that did the assumption, even through role chaining. Lastly, you can also use deny statements in trust policies.
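
The trust-policy advice above (name principals explicitly, avoid wildcards and ":root") can be checked mechanically. The following is a minimal Python sketch, not something shown in the talk: it takes a trust policy already loaded as a dictionary and lists the AWS principals in Allow statements that are a wildcard, an account root ARN, or a bare account ID (which AWS treats as shorthand for the account root). The function names, the example account ID, and the example role name are all hypothetical.

```python
from typing import Any, List


def _as_list(value: Any) -> list:
    """IAM policy JSON allows either a single value or a list; normalise to a list."""
    return value if isinstance(value, list) else [value]


def risky_trust_principals(trust_policy: dict) -> List[str]:
    """Return AWS principals in Allow statements that trust everyone or a whole account."""
    risky: List[str] = []
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if principal == "*":
            # "Principal": "*" means any principal at all.
            risky.append("*")
            continue
        if not isinstance(principal, dict):
            continue
        for arn in _as_list(principal.get("AWS", [])):
            # A wildcard, an account root ARN, or a bare account ID all
            # delegate trust to more than one explicitly named identity.
            if arn == "*" or arn.endswith(":root") or arn.isdigit():
                risky.append(arn)
    return risky


# Hypothetical trust policy used only for illustration.
example_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
            "Action": "sts:AssumeRole",
        },
        {
            "Effect": "Allow",
            "Principal": {"AWS": ["arn:aws:iam::111122223333:role/deployer"]},
            "Action": "sts:AssumeRole",
        },
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        },
    ],
}

print(risky_trust_principals(example_trust_policy))
# prints ['arn:aws:iam::111122223333:root']
```

This sketch ignores Condition blocks, so a flagged statement may still be narrowed by conditions; treat the output as a list of statements to review rather than a verdict.
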
With deny statements in a trust policy, you can block most users from assuming sensitive roles. There's also an open source tool called PMapper that lets you visualize, as a graph, the identities that have access to roles in your AWS account. For auditing purposes, you can monitor AssumeRole events using AWS CloudTrail, and if you've configured your source identity, these events will actually include the source identity of the AssumeRole event. [Music]

Cool, awesome. So that was our whirlwind overview of, say, getting started with a machine identity security program. In summary, these are the four best practices we would urge you to adopt. Wherever possible, use alternatives to any sort of long-lived static credentials. Where you have to use long-lived static credentials, regularly audit and rotate those credentials. Ensure that any machine identities you have have least-privilege permissions, that is, only permissions to do the actions they need, and only on the resources they need to perform those actions on. And finally, limit and monitor user access to machine identities.

Oh, go, sorry. So we're going to open it up to questions now, and you can find the link to the slides either by scanning this QR code or by following that link. Thank you. Thanks. [Applause]
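
For the "audit and rotate" practice in the recap above, here is a minimal Python sketch of one way to work with the AWS IAM credential report once it has been downloaded as CSV. It is not from the talk's slides. It assumes the report's usual column names (`user`, `access_key_1_active`, `access_key_1_last_rotated`, and the `access_key_2` equivalents), timestamps with an explicit `+00:00` offset, and "N/A" for missing values; the function name, the sample rows, and the 90-day threshold are hypothetical.

```python
import csv
import io
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


def stale_access_keys(
    report_csv: str, now: datetime, max_age_days: int = 90
) -> List[Tuple[str, str]]:
    """Return (user, key slot) pairs for active keys last rotated before the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    stale: List[Tuple[str, str]] = []
    for row in csv.DictReader(io.StringIO(report_csv)):
        for slot in ("access_key_1", "access_key_2"):
            if row.get(f"{slot}_active") != "true":
                continue  # inactive or absent key
            rotated = row.get(f"{slot}_last_rotated", "N/A")
            if rotated == "N/A":
                continue  # no rotation timestamp to compare
            if datetime.fromisoformat(rotated) < cutoff:
                stale.append((row["user"], slot))
    return stale


# Hypothetical, trimmed-down report: a real report has more columns.
sample_report = "\n".join(
    [
        "user,arn,access_key_1_active,access_key_1_last_rotated,"
        "access_key_2_active,access_key_2_last_rotated",
        "ci-deployer,arn:aws:iam::111122223333:user/ci-deployer,"
        "true,2023-01-10T00:00:00+00:00,false,N/A",
        "reporting,arn:aws:iam::111122223333:user/reporting,"
        "true,2024-04-20T00:00:00+00:00,true,2023-06-01T00:00:00+00:00",
    ]
)

audit_time = datetime(2024, 5, 1, tzinfo=timezone.utc)
print(stale_access_keys(sample_report, now=audit_time))
# prints [('ci-deployer', 'access_key_1'), ('reporting', 'access_key_2')]
```

Passing `now` explicitly keeps the function deterministic; in a scheduled audit you would pass `datetime.now(timezone.utc)` instead.
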
Folks, if anybody has any questions, you can post your question in Slido. You can scan the QR code that's outside of every theater, or you can go to bsidessf.org/qna (that's Quebec, November, Alpha). Also remember that Komal and Nathan are available at their booth today for further questions. They're also going to be up in City View shortly if you have any in-person questions you wish to ask them. We want to thank them both for doing such a lovely job. You are the best; here are some gifts to thank you for presenting. Of course, we couldn't do this without our sponsors and all of our volunteers and staff. So our lovely
sponsors are here, but the gifts are actually from Socket Security, so thank you so much; we appreciate that. We would also love your feedback on the BSides website. We always want to hear how we're doing: good, bad, all the things. Also, lunch is happening now and is open from 12:00 to 1:30 on the fourth floor, City View, which is where they'll be shortly. Please go and eat some lunch. We want to clear these rooms during the lunch hour, so if you could please do that, we would appreciate it. We also have headshots available for you to take for your LinkedIn profiles, and that's right outside of
the talk tracks. All right, thank you all so much for coming to BSides San Francisco 2024, and we look forward to seeing you again soon, next year.