Cloud's Dirty Little Secret: It Was Misconfigs All Along

BSides NYC · 2025 · 27:27 · 20 views · Published 2025-12
About this talk
Cloud misconfigurations remain the leading source of data breaches across enterprise environments. Karl Ots examines four categories of cloud security mistakes—legacy mindsets, insecure defaults, overpermissive access controls, and publicly exposed infrastructure—with concrete examples and remediation strategies. The talk provides actionable recommendations for both red and blue teamers, covering policy-as-code enforcement, identity management, and runtime monitoring across AWS, Azure, and Kubernetes.
Transcript [en]

At least I like them when it comes to also enjoying them on that side of the table. This is really a session about cloud misconfigurations, specifically coming from my own experiences both as a builder and as a breaker of clouds. So let's get started. Just very briefly, my name is Karl. I live here right in... maybe this one. Oh, okay. Both of them. All right. Double mic'ing. Love that. I'm a consultant. I'm heading cloud security at a consultancy called EPAM. We mostly build things, but I represent the part that loves to break things apart. On that side,

mostly I'm working with enterprise customers, mostly within the Azure space. I've been working with Azure since it's been called Azure. But of course, I do dabble with lots of other clouds, and all of the things that I'm going to talk about today are very much applicable to all clouds. I do have some examples at the end that are specific to multiple different environments as well. So, very quickly, this is about misconfigurations. I know some of the other cloud presenters have talked about this before as well, but I really wanted to set the scene a little bit here. And I really don't

want to talk about misconfigurations anymore. But if you look back at what happened 10 years ago, it was misconfigurations that were the biggest source of all kinds of data breaches. It wasn't any type of new exploits, the fancy stuff, the zero days, anything like that. Five years ago, exactly the same thing. And now it's still the same thing. I wish we would actually move away from that and start looking at something else. But, as we just talked about, it's still the human in the loop. It's still the human that is making those mistakes. We are still building those solutions. We are

using the out-of-the-box configurations. We are using the defaults. Like it or not, the defaults are not secure for many of these vendors. And even if the individual defaults were secure, putting these building blocks together can actually turn this into a potential attack path. If you were checking out that earlier Azure exploitation session as well, we started building these attack paths not from the perspective of misconfigured environments, but from individually, properly configured environments. Only when you started to put those together did it actually become a problem. Just to get us on the same page

on misconfigurations, I'll just open each of these up here. Misconfigurations can be the classic one: I accidentally open up a storage account to the public internet. It can be that I have secured it, but I've secured it using a static key, this kind of legacy authentication, and I accidentally publish that key to GitHub. It can be a Kubernetes cluster that I expose publicly to the internet so that Shodan can pick it up. Anything like that can be classified as a misconfiguration as well. So, we're going to do a little bit of a story here. We're going to

take a look at four of these harbingers of doom in terms of misconfigurations. We're going to see a couple of examples of each of these and how to fix them. And I'm going to tie it up at the end with some examples and calls to action for blue teamers and red teamers alike. I did write this for a slightly longer version of the session, so bear with me. If the RPM gets a little bit higher, you can always slow it down later on the recording. So, let's get started. So really, these four horsemen, or these harbingers of

security misconfigurations: from what I see, these are very broad categories. I really think that with these categories I'm going to cover 90%, or even more, of the typical misconfigurations that you'll see. The very first one that I'm going to show here is this legacy mindset. This is the approach where we used to have a specific control that worked very well in our previous environment, but we lifted and shifted not just the application itself but also the security controls, the mindset, or the security architecture as it is, from the on-premises world to the cloud world. You can see

this very well in different access control environments. You can see this in the network environment. Typically a network engineer or network architect is someone who is very specialized, and if they need to be brought into the cloud discussion, the same set of skills is not always applicable there. Having that additional context, such as what a cloud access control configuration can do to override my network environment, is fundamental for not just lifting and shifting those network controls into the cloud, but also for other types of controls as well. So this can also be stuff like if you have

a static set of firewall rules, or if you have a very heavy process. Even if you go for the fancy, shiny network security tools that are out there, even the ones where, by their very name, you put down the most money, the ones named after the fanciest and most expensive neighborhoods in the world, even then your safety is not guaranteed. You need to make sure that those are actually applicable, that they are catching your ephemeral workloads, and that they are adapting to that cloud environment. So, a couple of examples on that side. How can we actually start

fixing this part? Really it's all about understanding that it's not a static environment anymore. Even if you, or your enterprise counterparts, don't think that you have a fancy environment with hundreds of deployments per day or anything like that, even if it's a somewhat static environment from a workload and development perspective, there's still a lot of noise, a lot of things happening, workloads being provisioned on and off very quickly compared to our previous world and existing controls. So we really need to adapt to that ephemeral nature of those

cloud workloads. We need to get away from that agent-based mode. We need to get event-driven, which means that we need to be able to tap into the control plane of the environment that we are working on, whether it's cloud, whether it's Kubernetes, or whether it's some of these other orchestration layers that I can put on top of my cloud environments. That's where we need to get into the brain of our operating environment. And there's a ton of identity-based access control that we can put in place. Identity is typically not ephemeral. It's the network and the actual workloads

themselves, the compute, that are usually ephemeral. The storage and identity typically are not, or at least not as ephemeral as these other parts. And if you are doing your network controls in a very heavy configuration, if you're using Kubernetes or other types of clusters like that, take a look at some of the service meshes that do a lot of that heavy lifting for you. They tap into that nervous system with stuff like eBPF or some of these sidecar proxies. So that's the first one, that legacy on-premises mindset. Moving on to this other one that I like to call: just give them

more and more access until they get their job done. Very often, especially in these lift-and-shift environments, but also in environments where there is just a bunch of devs trying to do their best, they are hitting limitations of the new cloud environment, and they may not know what the actual limitations are. They usually end up asking for just more and more permissions to do that one action that they are blocked on. And if you're not smart about it, they may get stuck with those most permissive permissions. And that's what we want to avoid, of course,

and this means a couple of different things when you're looking at user access versus workload or machine access. But in principle this is a creep of different types of credentials that I'm getting in the cloud. It's additional workload identities that are not really getting used, and it's this additional set of permissions that I'm getting pre-provisioned, even if I actually use only a small percentage of those roles. And you can really get into analysis paralysis with cloud credentials, because in Azure there are like 1,200 roles that are built in, you can have your own custom roles, and you can have multiple levels of those role

controls. It's very complicated. If you want to be prescriptive about that, if you want to design that up front, you would rather have a machine looking at this. There is lots of different tooling, both commercial and open source, that actually takes a look at the actual use of our permissions compared to what's actually been provisioned. So that's one main thing there. Whenever you have access control in the cloud, there are still a lot of environments that have their own disconnected access control that is key or connection string based. Get rid of all

of those. There is configuration that you can enforce to actually get that out of your system. Always go for a centralized IdP. On the Azure side, Entra ID. On the Google side, use your Google identity. On AWS, just use a centralized identity. Pick one. Any one is better than that disconnected environment. And whenever you can, instead of looking at the specific role that you are giving to your principals, whether those are users or machines, put that analysis paralysis effort into looking at the scope: where are you

provisioning that scope, rather than what exact set of calls is needed up front. You can do that later. You can do that based on the analytics of what's actually being used. And once you know what you're doing on that side, start enforcing that. Start actually building all of your identity and access controls through identity and access as code. Build that into a prescriptive language as much as you can. I don't want to install an update right now for this Lenovo device. So let's see how much we have. Here's a quick example of this kind of identity and access as

code. I know it's a dark environment there, but this is an open source tool called AirIAM. This is from the same people that built Checkov and some of these other tools, nowadays part of Checkmarx. It basically just dumps all of your existing AWS roles and parses those as machine-readable analytics, looks at your permissions, looks at your logs, and pinpoints: here's a root user called booth admin that hasn't been used for 900, sorry, 192 days. So that's probably something that we want to get rid of.
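The idea of flagging identities that have not been used for a long time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how AirIAM itself works: the `Identity` record, the `find_stale` helper, the 90-day threshold, and the sample inventory are all hypothetical, and a real tool would build the inventory from the cloud provider's access logs.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Identity:
    """Hypothetical record for one principal, e.g. parsed from an IAM export."""
    name: str
    last_used: Optional[date]  # None means no recorded activity at all
    is_admin: bool = False


def find_stale(identities, today, max_idle_days=90):
    """Return (name, idle_days) for identities idle longer than max_idle_days.

    Identities with no recorded activity are reported with idle_days=None.
    Admin identities are listed first, since they are the riskiest to keep.
    """
    stale = []
    for ident in identities:
        if ident.last_used is None:
            stale.append((ident, None))
            continue
        idle = (today - ident.last_used).days
        if idle > max_idle_days:
            stale.append((ident, idle))
    # False sorts before True, so admins come first; the sort is stable.
    stale.sort(key=lambda pair: not pair[0].is_admin)
    return [(ident.name, idle) for ident, idle in stale]


# Hypothetical inventory, fixed date so the example is reproducible.
today = date(2025, 12, 1)
inventory = [
    Identity("ci-deployer", today - timedelta(days=3)),
    Identity("old-report-job", None),
    Identity("legacy-admin", today - timedelta(days=192), is_admin=True),
]
stale = find_stale(inventory, today)
for name, idle in stale:
    print(name, "never used" if idle is None else f"idle {idle} days")
```

Sorting admins first is a design choice for this sketch: an unused, highly privileged identity is usually the first candidate for removal.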

We want to get rid of those overly permissive users, and also those users who are not actively using this environment at all, in order not to create those attack paths. There is commercial tooling in this space called CIEM, cloud infrastructure entitlement management, but you'll see similar capabilities built into all of the CNAPP and other tools as well. And this talk about identity and access control is not only about what we just discussed, what the least permission is in the context of Azure or

in the context of the cloud world. This is really all about what happens when we break this mindset of that centralized identity that I was talking about. In Azure, for example, you can create those connection strings. You can have those legacy authentication methods available for your storage accounts, for your keys. All of those services that host your data have an alternative method of authentication. On the storage side, you typically need to actually change the default configuration to either get rid of them or default to those centralized identity providers. And exactly the same thing if you are

building in other environments, like a Kubernetes-based environment, where by default you have just a set of built-in, cluster-local credentials, essentially. Just very briefly, these are those classic misconfigurations: whenever you have any of those storage accounts, buckets, you name it, based on the environment, just remove that default access. Even if you have a case where you genuinely require a specific environment to be used with key-based authentication, there's always an alternative. You can lock it down from future keys. You can make sure

that you have compensating controls available. Most of the time you really don't need key-based access, and you should use data-plane-level role-based access control, or attribute-based access control if you're in GCP or AWS, and make sure that you are enforcing your network and other levels of controls in your role assignments as well. All of this should be enforced using policies, and I'm going to talk about policies in a moment. Here's just an example of the classic thing: by default, accessible by the public. Yes, you need a key, but this is already one level closer. And then the last one of

these horsemen is really the classic one. Again, as I said in the beginning, 10 years ago I started talking about misconfigurations, when we coined the term as an industry. This is still one of the things that we had back then. This is really all about accidentally exposing ourselves to the public. Here's what happens in five minutes when you have a container instance running in the public cloud. It's going to get scanned. There's application-level scanning. There are pings happening all the time. If you have anything that has a port open, of course, you're going to get bombarded. You are

eventually going to get logged into, for sure. Just do yourself a favor and go disconnected by default in those environments. So there's a bunch of different things that we can do about this. It depends on the life cycle of how we are building this. If you're coming in later, there's a limited amount of stuff that we can change. But if we are starting from scratch, from a cloud buildout perspective, then of course securing something before we actually deploy it is typically seen as the cheapest or the easiest way to do that. How do we do that at

scale? It's by enforcing PRs and enforcing scanners for our infrastructure as code. If we get to a point where we actually have all of our deployments as infrastructure as code, then we can start enforcing tools like Terrascan or other IaC security tools. It's static application security testing, but just for our IaC. That's perfect for something that is prescriptive. We can also shift that left. Tools like Checkov build those same policies and the same guidance all the way to the level of my VS Code, so that I can actually teach my developers about this, not just break their builds in

there. If you are doing Kubernetes, like many of us are, there's a bunch of other tooling for that, like Kubesec and others. And if you are building this in an environment that has more than one developer, then likely you're going to use some sort of version control. So go ahead and trigger those scans and analyses based on some of your GitHub commit hooks. From a toolset perspective, when we are moving away from the deployment, when we have actually deployed this, now this ephemeral stuff happens. We deployed some configuration, but now we start actually using that. If

you are in a cloud environment overall, the current hype term that you use is CNAPP, cloud native application protection platform. This really just means that it covers more than just virtual machines. They talk about being event-based. There are many different vendors out there. Some are allegedly worth 30 billion. Some are free, and there's everything in between. But basically, use the built-in one that you have in your cloud environment. Typically that is at least closest to free, until you start adding some premium features. At least that gives you visibility into how my posture is doing. Do I

have some of those well-known misconfigurations in there? Again, if you're doing containers, you can get a little bit more detailed. You can add stuff like Falco. Falco actually goes in on the cluster level and does eBPF-based analytics. So anytime there is a call from your container application to the node itself, all of that gets logged. All of that is something that we can analyze from a security behavioral perspective. If you're more in that inspection space, go ahead and look at tools like ScoutSuite and others that let you run a little bit

of red teaming, but also good posture management for yourself. Getting to know how my environment is configured can also act as a good double check for your CNAPP tooling. Does it actually check what I'm doing? And again, enforce all of this stuff using policies. Policy as code is hopefully the last as-code that I'm going to mention. It's really something that you want to use to enforce all of your security controls, not just for the existing ones, to audit your policies, but also to prevent this stuff from happening again, so that you can actually focus on the

next big things that you have. There are the native policy tools on each cloud provider. If you are paying for your Terraform, you can also use HashiCorp Sentinel. If you are on Kubernetes, you can use tools like Kyverno that give you that same policy-as-code approach to be prescriptive about what's allowed, what is not allowed, what is audited, and how we provide that audit log as proof of our behavior in that environment. So, a couple of calls to action here. So from a red teamer

perspective, really, just do the simple scan. You may think that this is something that everyone knows, but there's always an exception: this is only for five minutes, or this is only for dev. No one cares if it's only for dev. When the press hits, no one really cares whether you deemed it out of scope or not. So just scan that basic stuff: public storage accounts, public container registries, public dashboards from your Kubernetes, all that sort of stuff. And specifically look at those legacy connection methods, connection strings that are not tied to centrally managed

IdPs. And getting a little bit deeper, look at your service accounts for Kubernetes. Look at your service principals. Look at your highly powered, and especially overpowered, machine identities, and start looking at how often they are used, how they are provisioned, et cetera. And definitely, as a red teamer, your golden nugget is if you are able to find some of those Kubernetes components exposed to the public, especially if you identify that it's not one of those cloud-provider-managed environments, like EKS Auto Mode or AKS Automatic. The more

manual those builds are, the more things can go wrong. On the blue teamer side, do me a favor: just deploy at least one policy out there. It doesn't need to be something that prevents stuff from happening. It doesn't need to break anything. Just deploy at least some basic-level policy for something which you think no one in their right mind would be doing. You'll be surprised. If you're using Kubernetes, again, I really do recommend using stuff like Falco. It really gives you a good set of tooling from that runtime analytics perspective. Pair that up with something like Kyverno. They have

audit-only policies that you can put in place. You don't need to prevent anything, but you get a very good list of things. There's always something surprising, even if you are using those cloud-provider-managed environments. The first thing my clients always ask is: why is this cloud provider's own monitoring agent running as root in here? And then we get to the good discussion. And start logging. If you're on the Azure side specifically, start turning on this Graph activity log. It's not the same thing as Graph logs. It's not the same thing as the activity log. It's a thing of its own that actually prepackages all of

the different activities from all of those different endpoints, from all of the tooling. It's a really great source of tenant-level monitoring for your environments. And do audit those service principals. I really do think most of those are unused. Nowadays we have a maximum of two years for their lifetime, but maybe turn it down a little bit more. And, all of us, this is really something that we can all do: do check if I actually have my logs turned on in my cloud environments, on the Azure side diagnostic settings, et cetera. Do check

those container images, whether you have something like credentials in code. Do check if you are running any of those containers as root. And just before you head home, or before you head to the after party, maybe log in to the mobile version of your dashboard and check if you can rotate one of those credentials. What happens if I do this on Saturday night? Does something break? And yeah, maybe don't do it Saturday night, but once you're back on Monday, go ahead and rotate

at least one set of creds for me. That's really it. Misconfigurations are really bad. Cloud is not the same thing as anything that I see here in person with me, and policy as code is your friend. That's really it. So thank you so much, and if you have some time, then it's a Q&A.
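The blue-team advice from the talk, deploying at least one policy in audit-only mode before blocking anything, can be sketched as follows. This is a minimal illustration assuming a made-up resource shape; the `Policy` class, the `evaluate` function, and the field names `public_access` and `run_as_user` are hypothetical and do not correspond to Kyverno, Azure Policy, or any other real policy engine.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Policy:
    """Hypothetical policy: a name, a check, and an enforcement mode."""
    name: str
    violates: Callable[[dict], bool]  # True when the resource breaks the rule
    mode: str = "audit"               # "audit" only records, "deny" blocks


def evaluate(resource, policies):
    """Return (allowed, findings) for one resource against all policies."""
    findings = []
    allowed = True
    for policy in policies:
        if policy.violates(resource):
            findings.append(f"{policy.mode}: {policy.name}")
            if policy.mode == "deny":
                allowed = False
    return allowed, findings


policies = [
    Policy("storage must not allow public access",
           lambda r: r.get("type") == "storage" and r.get("public_access", False)),
    # Assumption: a container with no run_as_user set is treated as root (uid 0).
    Policy("containers must not run as root",
           lambda r: r.get("type") == "container" and r.get("run_as_user", 0) == 0),
]

# Hypothetical resource: a publicly accessible storage account in dev.
bucket = {"type": "storage", "name": "dev-scratch", "public_access": True}
allowed, findings = evaluate(bucket, policies)
print(allowed, findings)
```

In audit mode the deployment still goes through and the violation is only recorded, which matches the point that a first policy does not need to break anything. Switching a policy's `mode` to `"deny"` is the later step, once the audit findings are understood.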

Questions? >> Until they kick me out. >> Yep. All right.

>> There's like users and service accounts, right? With this whole push for AI and cloud, are AI identities treated differently, or are they pretty much the same sort of ideas for identity as code? >> Great question. For the recording, I'll just very quickly recap, and that also helps me: if I understood wrong, then you can correct me. So, are service accounts any different when it comes to AI? Not really. It's the same problems when it comes to this chain where there is a system that uses a system that uses a system. If you do any of those parts custom, if you have some token exchange that

you do yourself, those are the parts where you can do things wrong. If you have an API gateway in front of wherever you host your model, Bedrock, AI Foundry, any of these other places, it's the same problems of tagging that identity, like who is the actual caller and what are the permissions in my actual application, and making sure that we disconnect the control plane activities, which is stuff like, hey, I can create a completely new database or a completely new cluster, or delete all of these things from this application, from my AI application's specific set of permissions. If you just go, automatically, I have all

of the permissions in the world, it's going to make you owner, and you're going to reuse those same machine identities for creating your cluster, for creating your users, and for letting your AI agent use that same identity. That's not what you want to do. Great question. Thanks. Do we have time for one more? >> No. >> Oh, no. No. All right. Uh,

>> We can talk about tooling as I'm getting kicked out. Happy to name-drop some others here as well. But I think ScoutSuite is the one that I use for multicloud, just because, while it doesn't have the most coverage or the most depth, it does cover the most clouds. So that's a great one to get started with. Yeah.

Yeah. Stop.