Secure on the Whiteboard; Broken in Production

Name: Secure on the Whiteboard; Broken in Production
Uploaded: 2026-04-27
Duration: 30 min 1 s
Description: Cloud security architectures often fail in production not due to code bugs, but because design assumptions don't account for operational reality: asymmetric routing, shared services, temporary fixes becoming permanent, and multiple traffic paths bypassing controls. This talk explores five common gap

BSides Charlotte · 2026 · 30:01 · 8 views · Published 2026-04

Speakers

Srija Allam

Tags

Category: Technical

Topic: Cloud IAM Container Security Threat Modeling

Style: Talk

Mentioned in this talk

Tools used

kubectl

Platforms

Kubernetes

About this talk

Cloud security architectures often fail in production not due to code bugs, but because design assumptions don't account for operational reality: asymmetric routing, shared services, temporary fixes becoming permanent, and multiple traffic paths bypassing controls. This talk explores five common gaps between whiteboard designs and runtime behavior, demonstrates how attackers exploit these gaps in Kubernetes environments through service spoofing, and shows how zero-trust principles—cryptographic verification at every hop, east-west segmentation, and workload identity—can close them.

Original YouTube description

Srija Allam presented her talk "Secure on the Whiteboard; Broken in Production" live at Bsides Charlotte on March 28, 2026. https://bsidesclt.org/ "Most cloud security designs look solid on a diagram. Traffic flows are clean, controls are layered, and everything feels well thought out. Then production happens — scale kicks in, exceptions pile up, routing gets weird, and suddenly the system behaves very differently than anyone intended. In this talk, we’ll look at how cloud security architectures quietly break once they meet real traffic and real operations. Not because teams are careless, but because production introduces things diagrams don’t show: asymmetric routing, shared services, temporary fixes that become permanent, and control paths no one remembered to threat model. Using familiar cloud scenarios and a short demo, I’ll show how attackers take advantage of these gaps simply by following the actual runtime behavior instead of the designed one. No zero-days, no malware just understanding where security controls stop being in the path. This session is about learning to think beyond the diagram. Attendees will walk away with a better instinct for spotting architectural drift, asking the right “what actually happens?” questions, and designing cloud security that holds up when things get messy because they always do."

Transcript [en]

Hello everyone. I'm Srija Allam, and I'm a cloud security architect at Fortinet. I do cloud security with enterprise, medium and small customers every day. Given my experience, I wanted to talk about how systems look secure on the whiteboard but are broken in production. Modern applications today are no longer monolithic. We all know that: because of AI, microservices, Kubernetes and several of these, they are all distributed, dynamic and highly interconnected systems, usually built on microservices and APIs. Cloud applications, for example, are in my experience very ephemeral. Virtual machines come up, they get deleted, they get restarted, and so on. So everything is constantly changing. And in these

environments we rely heavily on gateways. It could be a web application firewall, it could be a next-generation firewall, it could be any gateway that is built and sitting at the edge. But we also use identity layers when we design an architecture, and sometimes we do segmentation to enforce security. In a Kubernetes environment it could be pod-to-pod communication; in a cloud environment it could be virtual machine to virtual machine; in a serverless architecture it could be application to application, say a Lambda function; in AI it could be agent to agent. That's where we enforce segmentation, right, to secure

these environments. But here is the problem I always see, and the one I really want to explore today: systems that look secure on the whiteboard or in the architecture diagram often fail in production. The reason is not bugs in the code; it's really our assumptions. That's what I call the whiteboard lie. Just a simple agenda here. First, we'll walk through the assumptions behind what we call the secure design. Next, I would like to bring in five gaps between what the design architecture is and the reality when we deploy those architectures in the real

world, or in production. I have also prepared a small demo with a Kubernetes architecture, and I'm going to show how internal trust can be abused. Finally, we'll end with some zero trust points: how zero trust changes that model. The five gaps I'm mentioning today are just from my experience, so there might be others that exist beyond these five. But these are the ones I wanted to show and present to you all, based again on my experience. So, usually, what the illusion of secure design is, is that when we design

systems, everything looks clean and controlled, right? When we have an architecture diagram, everything follows a path; it looks controlled. We design gateways at the edge. We always say something is internal, and internal is always secure in our opinion. That means we assume that controls are enforced consistently, right at the ingress, right at the egress. But production systems do not usually follow diagrams. What they follow is what is reachable. That means there might exist multiple paths, or multiple places where the traffic can go. The other thing is that, if we have to authorize something, we check if the

authorization exists, but we do not check where it is coming from, or, we could say, who is sending it. It could be an attacker who is sending that authorization, impersonating the system, but we usually do not check that. So that's the illusion of a secure design. All of us here might have designed something secure, but we might also have seen something fail. Imagine we are running some firmware updates. There might be failures during the firmware upgrade. We designed it to upgrade, come back up and restart in a secure way, but there might be ways where during the upgrades the

security is not enforced, or it might be flawed, and those few seconds might be enough for an attacker to take control. So the real gap is where attackers are not assuming control because of some breached environment controls; it's more because of the assumptions that we make. So in this diagram, this is just the perfect architecture example. And again, please bear with me, this is just a simple perfect architecture. This could be an AI system. Of course, production is not this simple, because we have several components, but for this talk I wanted to keep it simple so that you are all aligned with what I'm sharing here today. So

we have users. They use a client, like a mobile or a laptop, whatever the client is, and the traffic is enforced through a gateway. Again, this gateway could be a Kubernetes API gateway, it could be an MCP tool, it could be a next-generation firewall or a web application firewall, give or take, any router, any of these sitting at the edge. The internal side is where an API is sitting; it could be an API management tool, an API that is built. And then there is a database, where the API retrieves its information from. So here the API is basically a trusted source or trusted component

whereas the database is an internal component. Everything looks secure because it flows through a path and the controls are applied at the edge. However, if I go to my next slide here, there might be an unintended path, meaning the gateway can be bypassed. The internal service might be bypassed. There are good users and bad users, right? A bad user might be able to get control of this database and retrieve sensitive data. So there is always an unintended path that exists. And this is not just limited to an unintended path. It could be something where we fail to

enforce controls, or we fail to verify continuously where a request is coming from. So that's the reality. And what is the whiteboard lie? It's not only the unintended path that I've shown in the diagram. We assume traffic follows intended paths. We assume controls cannot be bypassed. We always assume internal is safe, and internal equals trusted. Also, when we design an architecture, we start with something small and we scale it eventually. We sometimes assume that scale doesn't introduce risks, or we overlook how scale might, you

know, introduce security flaws. So, again, these controls and paths are usually not guaranteed; they are simply design assumptions. Now, what I found is that every architecture usually relies on assumptions, like all traffic goes through the gateway or internal systems. And I just want to reiterate that attackers don't break systems; they break assumptions. Once an assumption fails, the entire trust model collapses. That means a breach happens, and then the architecture can be controlled. Now, the gaps. There are a few gaps, like I mentioned. The first difference between the reality and the actual architecture when we

move it into production is that identity is validated once, at the edge. We don't check it continuously once it's internal. We are only doing it once, at the edge. Most of the time there are tokens that we generate: it could be an API token, it could be a username and password, it could be a service account password. Then we reuse these across multiple contexts, and that's a problem, because once a token or a credential is compromised, an attacker can get hold of every system where it is being used. And also the context thing: usually the tokens are not tied to a

workload identity or context. This is something we fall back on again and again: these are not context-based tokens. We also allow identity to be used across multiple paths. There might be multiple ways of communication or traffic flow to a certain database, but we might enforce security across only one. We assume that the security carries over from one path to another, but not really; that again is an attack surface. Authentication will tell us who, but we usually do not verify whether the request is coming from a certain source, or where the source originates. So if someone will assume their role as

a certain identified or trusted boundary, the system will allow it. What we are not doing here is cryptographically verifying where the request or the identity is coming from, from which context. So that's the first gap. The second gap: imagine a Kubernetes environment again. There are multiple paths. The traffic can be sent through an ingress, the traffic can flow through a ClusterIP, and there also exists pod-to-pod communication. If we apply security on one intended path, it doesn't apply to all the paths, because again multiple paths exist beyond these diagrams for, given this

microservice, how to get to this microservice. And if a path exists, it becomes an attack path. Think of the lines on a diagram: there may be a solid line showing a communication path, and there are also dotted lines. We usually assume the security controls are the same over a solid line and over a dotted line, but actually they are not. So we need to make sure that all of the communications are secured, irrespective of how they are configured. If a back path exists, again, it becomes an attack path, and that's something which we need to

secure. The next thing is controls enforcement. Usually security is applied only at the gateway. Like I mentioned, a web application firewall is a great example, where we enforce protection profiles at the entry. Once the traffic leaves the gateway and is internal only, we might not be checking the source of that traffic continuously, or we are not applying authentication continuously. So security is applied only at entry points. And sometimes, again in this web application firewall example, the backend might be directly reachable. That means someone can bypass this WAF

and then get access to that database in the back end. This doesn't exist everywhere: we do lock down the back end to only listen to, or only accept, traffic forwarded from the front end, which is the web application firewall. But sometimes we fail again, because there are multiple paths that exist. The other thing is that if traffic bypasses the gateway, the controls never run, because we apply the controls at the web application firewall level. What if someone is bypassing the gateway? That means we do not have anything enforced at the destination level, which in this case is a database.
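The difference between a control that depends on the path and one enforced at the destination can be sketched in a few lines of Python. This is a toy model, not the speaker's setup: the `Request` type, the `entry` labels and the single injection string standing in for a WAF protection profile are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    entry: str    # "gateway" or "direct"; hypothetical labels for the two paths
    payload: str


def waf_allows(payload: str) -> bool:
    """Toy stand-in for a WAF protection profile: reject one obvious injection string."""
    return "' OR 1=1" not in payload


def path_dependent_backend(req: Request) -> str:
    # The check only runs for traffic that entered through the gateway.
    if req.entry == "gateway" and not waf_allows(req.payload):
        return "blocked at gateway"
    return "query executed"


def destination_enforced_backend(req: Request) -> str:
    # The same check runs at the destination, whatever path the request took.
    if not waf_allows(req.payload):
        return "blocked at destination"
    return "query executed"


bad = "id=1' OR 1=1"
print(path_dependent_backend(Request("gateway", bad)))
print(path_dependent_backend(Request("direct", bad)))
print(destination_enforced_backend(Request("direct", bad)))
```

In the first function the malicious payload is stopped only when it arrives through the gateway; sent directly, it reaches the query. The second function gives the same answer on both paths.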

We are just allowing that to be forwarded, so we are basically opening up an attack surface again. Security becomes path dependent in this case instead of systemwide. That's something we need to keep in mind when we are designing an architecture. The next thing is that scale breaks security. With AI, the world has taken a turn, right? And with cloud applications, everything is changing, everything grows in time: very scalable architectures. In cloud, for example, because that's where most of my day-to-day is, what I see is that we start with something small, we solve a problem for something small, and then eventually we increase

this to scale. We assume the security that we apply to a small part of this architecture will also apply to the greater, scaled architecture. But that might not be true, because whatever security applies to a single virtual machine, we may not be able to apply when that virtual machine is part of a bigger context or a huge landing architecture. So controls do not scale with infrastructure. That means we cannot overlook controls when we scale the architecture; as we scale, we also need to scale the security and apply controls. And one more

problem is that when we start with an architecture and design something small, the security works great, but as we grow there might be places where the visibility is out of control. Of course, there are many tools, like cloud security posture management tools or other posture management tools, but visibility in cloud environments is really hard, and even more so in Kubernetes, where things are very ephemeral: the pods get created and deleted. The visibility decreases as systems grow, so security is not consistently applied at scale. We need to keep in mind that we design security for the bigger picture rather than just

concentrating on that small problem-solving architecture that we start with. The next thing is the human gap. Humans break everything. An interesting fact from an IBM study is that 95% of attacks or breaches happen because of human decisions. A human decision is basically: do we click a link or not? Most of the time, humans make that decision. So the study shows that 95% of breaches happen because of human decisions, and if we could avoid that somehow, 19 out of 20 breaches could be avoided. So what happens again here

is that sometimes temporary access becomes permanent. Sometimes we are troubleshooting or debugging a system failure, or a system is down, and over time there is permission sprawl. No one goes back and clears out those permissions. Or sometimes we just leave them because we might come back and reuse them in two or three weeks, and we forget. So there is privilege creep, which increases over time. I wanted to share an example here. In one of my jobs, and this is a simple example again, I was part of a team and then I

moved to a different team. I was on the email distribution list, but I was never removed from it even after I moved to a different team. Two years into the new team, I was still seeing the emails from the previous team. This is just a simple example of how privilege creep happens and how permission sprawl can grow over time. It's not just about employees changing teams, but also employees moving from one job to another. And sometimes misconfiguration and temporary access, all of this becomes permanent because of, again, debug paths, troubleshooting,

etc. So these small changes accumulate over time into significant exposure. When we design the architecture, we do not account for how humans might break everything. So this is another gap that we can think about. All right. I prepared a very small demo here, and hopefully the demo can be completed. Again, this is in no way how production systems are, but it is to show how trust can be abused. And this could apply or scale to a bigger architecture, or it could be in an AI system, or it could be an on-prem application that is

sitting in a data center, or wherever your application is, in the cloud, etc. The demo setup is this: I have users, and I have a client, which is my laptop. Then there is a gateway. This is a Kubernetes gateway, and everything is running in a Kubernetes architecture. I have a pod for the gateway, and this is exposed through a deployment. Then I have a service A and a service B. Service A is a trusted service. That means traffic can be sent from the gateway, through a service, to service A. Service B is an internal-only service. Again, I'm stressing here: internal-only service. That means

only service A can reach service B, nothing else. So if a user wants to access service B, the request has to flow from the gateway, through service A, to service B. However, with the unintended path that I was mentioning earlier, users might be able to access service B because of that breaking of trust. So let's go back, and I'm going to show my slide here. Let me get these terminals up. I prepared this demo already. If I do kubectl get pods, like I mentioned, there is a gateway, there is a service A, and there is a service B. Service B is the trusted, internal-only service.
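One way to check whether the designed path is the only path is to enumerate everything that is reachable. The sketch below models the demo topology as a small graph in Python. The node names are taken from the talk, but the extra runtime edge is an assumption: it supposes that no NetworkPolicy restricts traffic, in which case Kubernetes allows any pod to reach any other pod, so the gateway pod could also talk to service B directly.

```python
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]

# The path drawn on the whiteboard: user -> gateway -> service-a -> service-b.
DESIGNED: Graph = {
    "user": ["gateway"],
    "gateway": ["service-a"],
    "service-a": ["service-b"],
    "service-b": [],
}

# Assumed runtime reachability: with no NetworkPolicy in place, the gateway pod
# can also reach service-b directly (hypothetical extra edge for illustration).
RUNTIME: Graph = {**DESIGNED, "gateway": ["service-a", "service-b"]}


def all_paths(graph: Graph, src: str, dst: str,
              seen: Tuple[str, ...] = ()) -> List[List[str]]:
    """Enumerate every loop-free path from src to dst with a depth-first search."""
    if src == dst:
        return [[dst]]
    visited = seen + (src,)
    paths: List[List[str]] = []
    for nxt in graph.get(src, []):
        if nxt in visited:
            continue  # skip nodes already on the current path
        for tail in all_paths(graph, nxt, dst, visited):
            paths.append([src] + tail)
    return paths


for name, graph in (("designed", DESIGNED), ("runtime", RUNTIME)):
    print(name)
    for path in all_paths(graph, "user", "service-b"):
        print("  " + " -> ".join(path))
```

Any path that shows up in the runtime graph but not in the designed one is a candidate attack path, and a place where the controls need to be checked.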

Service A is the trusted service, and the gateway is sitting at the edge, and we are exposing all of this through a deployment here. I'm just going to do some port forwarding to that gateway service on port 8080. If I start it here, it is listening on port 8080, and everything gets forwarded to that gateway. Now, on my right side, I'm going to run an API call. This API, by the way, is protected with some kind of authentication, in this case a bearer token. So what my system expects

or a gateway expects is that if I make an API call without that authentication header, it's going to forbid that traffic. But if I add that authentication header, it's going to allow the traffic through to the backend API, and it will retrieve the information from my database. So if I run this, of course I get an error, unauthorized. We can also see that there was this "Handling connection for 8080", and it's unauthorized again. But if I run it again with the authorization header, which says bearer valid token, I should be allowed. So I can see some sensitive data, and it says internal data.
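The gateway behaviour shown at this point in the demo can be approximated with a minimal Python sketch. The token value, the function name and the response strings are assumptions reconstructed from what is said on stage; the real gateway's implementation is not shown in the talk.

```python
import hmac
from typing import Mapping, Tuple

EXPECTED_TOKEN = "valid-token"  # hypothetical value standing in for the demo's bearer token


def gateway(headers: Mapping[str, str]) -> Tuple[int, str]:
    """Return (status, body) the way the demo gateway appears to behave."""
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    # Constant-time comparison avoids leaking how much of the token matched.
    token_ok = hmac.compare_digest(token.encode(), EXPECTED_TOKEN.encode())
    if scheme != "Bearer" or not token_ok:
        return 401, "unauthorized"
    return 200, "internal data"


print(gateway({}))
print(gateway({"Authorization": "Bearer valid-token"}))
```

The check answers one question, whether the token is valid. It says nothing about who is presenting the token or where the request came from, and it only runs for traffic that actually passes through the gateway.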

I'm able to retrieve that secret here. Now imagine: this is great, right, we have authentication headers. But somehow, let me call this the foothold, say service A is compromised, meaning I was able to compromise the gateway and I got into service A. So I'm going to run this kubectl command to get into the service A pod. If you see here, deployment service-a, sh: this is a command to log into the pod, and I'm at the pod level here. Now let me bring my API call here again. So if I run to service B because

only service B expects that it can see the traffic from service A, right, and I'm an attacker now. I got hold of the service A pod, and I try to retrieve the data. I see error, forbidden. This looks great. This is exactly how it's designed: if we do not supply the authentication header, it breaks at the gateway level, and at the same time service B is only allowed to receive traffic from service A. It's seeing that the traffic doesn't come from service A, so it's giving us that error, forbidden. Now let me say I as an

attacker run this with X-Service-Name. This is a header that I'm using to impersonate service A, and then retrieve the data from B. Then I'm getting that internal data again, so I'm able to retrieve that data. I actually didn't exploit any vulnerability here; I'm literally breaking the trust. Service B is allowed to receive traffic from service A, and that's what I impersonated, and I got access. Imagine if there are multiple service Bs, multiple microservices holding sensitive data. If you assume your request is coming from a certain place, the trust can be broken. So what I'm seeing here is I am not verifying the

context. I'm only verifying what it looks like, or where it says it's coming from; service B is not checking who it is actually coming from. We are not cryptographically verifying whether the request is actually coming from service A or whether someone is impersonating service A. So that is the problem. It looked secure on the whiteboard. In this architecture diagram, if I didn't have this dotted line here, everything looks great: there's a gateway, service A, service B, and A-to-B communication goes through the gateway. But what I did is I broke that trust. I compromised service A and I was able

to get to service B, and service B didn't verify whether the request was coming from the gateway through service A, or whether someone was directly impersonating service A and sending the traffic to service B. Now, just to reiterate: the gateway enforced authentication, and the backend enforced internal-only access. Both controls were working as designed, because we did see a forbidden when we made that API call from service A. But we accessed the backend through a different path. With the X header, the trust signal that we used, we impersonated service A, and it worked. We were able to

get that data. Now, how do we solve this problem? You can see in the NIST blogs or NIST white papers that they talk about zero trust. The real meaning of zero trust is: verify every request that is coming in, and also bind it to a context. So we are also checking where the request is coming from and who it's coming from. There needs to be an identification of where the request is actually originating. We move from, you know, trust but verify, to no implicit trust. We are basically checking every request that is

coming in; we are authenticating every request that's coming in. East-west segmentation, or east-west inspection, is also highly important when we implement zero trust. So zero trust is the way to go in order to overcome these kinds of challenges. It's a lot to implement in production, but once we have a real baseline, we can improve over time and get to zero-trust-based architectures. So what I want to talk about is security that survives reality. Identity is always more important than network. Like I mentioned, in the case of Kubernetes for example, we can do mTLS or

we can create service accounts, and there needs to be a context-based workload identity, so we are not just relying on IPs or headers. We do need mTLS or workload identity in order to overcome these kinds of trust-based challenges. And every hop has to be verified. We don't limit the controls to the edge or the entry point; we authenticate and authorize every service as the traffic flows through the internal side as well. Any internal-to-internal request, pod-to-pod communication, or in AI agent-to-agent communication, has to be authenticated and authorized. The next

thing is path-based security. We need to apply those controls, and we need to verify that the controls exist on every path by which we can reach the backend destination, whatever we would like to protect, such as sensitive data. We need to make sure that every path is controlled and controls are applied. The next thing is east-west inspection. We need to do agent-to-agent inspection, virtual machine to virtual machine inspection, application to application inspection. We also need to inspect the traffic. So that's again a way to enforce

policies and get that zero trust. Because, just like we've seen, if an attacker gets hold of that assumption and gets into the internal system, they can collapse the entire system, because they can get access to all of these internal services in stealth mode. And we can eliminate the implicit trust. In my demo, if I go back, we are relying on some headers, and we are relying on DNS, because in Kubernetes service A can access service B. So we are basically relying on that implicit trust, and we are also relying on trusted services; that needs to change. So we are not trusting

anything, even if it's internal-only traffic or inbound and outbound external traffic. We are not just trusting blindly; we are basically eliminating that implicit trust. That goes back to my point that we need to authenticate and authorize every request that's coming in. We can assume breach when we design, right? We design to limit the blast radius. If there is a breach, we also need to think about how big the blast radius is, what the impact of this attack could be, and what the length of this attack could be. And then we

also need to make sure we prevent lateral movement. In my demo, again, if service A is hacked and somebody was able to access service A, service B should not be directly reachable. And if there is a service C, D or several other services, we need to prevent the lateral movement so that we can reduce the blast radius. So what I want to conclude is that security is not just what we design on the architecture diagram; we need to think about what we enforce continuously, how we improve, how we can do it at scale and

also how we can enforce it when systems change, when these ephemeral systems get deleted, added, etc., and also how we can limit human decisions, or how we can reduce the impact when humans make bad decisions. Thank you so much for watching, and have a nice day.
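The header-based trust that the demo abuses, and the per-request cryptographic check the talk recommends, can be contrasted in a short Python sketch. Several things here are assumptions: the header name `X-Service-Name` is reconstructed from the transcript, `X-Signature` is a hypothetical header, and an HMAC over the request body with a key held by service A is a simplified stand-in for the mTLS or workload identity mechanisms named in the talk. It also assumes the caller can send requests to service B but does not hold service A's key.

```python
import hashlib
import hmac
from typing import Mapping, Tuple

SERVICE_A_KEY = b"key-held-only-by-service-a"  # hypothetical secret for illustration


def service_b_header_trust(headers: Mapping[str, str]) -> Tuple[int, str]:
    # Implicit trust: whoever claims to be service-a is believed.
    if headers.get("X-Service-Name") == "service-a":
        return 200, "internal data"
    return 403, "forbidden"


def sign(key: bytes, body: bytes) -> str:
    """HMAC-SHA256 of the request body, hex encoded."""
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def service_b_verified(headers: Mapping[str, str], body: bytes) -> Tuple[int, str]:
    # Verified identity: the caller must prove possession of service A's key.
    presented = headers.get("X-Signature", "")
    expected = sign(SERVICE_A_KEY, body)
    if hmac.compare_digest(presented.encode(), expected.encode()):
        return 200, "internal data"
    return 403, "forbidden"


body = b"GET /data"
spoofed = {"X-Service-Name": "service-a"}
genuine = {"X-Service-Name": "service-a", "X-Signature": sign(SERVICE_A_KEY, body)}

print(service_b_header_trust(spoofed))
print(service_b_verified(spoofed, body))
print(service_b_verified(genuine, body))
```

The spoofed request passes the header check but fails the signature check, because a name in a header is a claim while a valid signature is evidence. In a real cluster the proof would typically come from a client certificate or a platform-issued workload identity instead of a shared key.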
