BSidesSF 2026 - Making WAF Mainstream: From Static Defenses to... (Roy Weisfeld, Surya Pentakota)

Uploaded: 2026-05-12
Duration: 42 min 44 s

BSidesSF · 42:44 · 43 views · Published 2026-05

Mentioned in this talk

Tools used

AWS WAF

Vendors

Akamai, Cloudflare

About this talk

Making WAF Mainstream: From Static Defenses to Living, Learning Protection Roy Weisfeld, Surya Pentakota Discover how to modernize WAF operations with data-driven insights, AI-assisted enforcement, and safe rollout playbooks—featuring a TikTok-scale case study. Leave ready to build dynamic, provable, and cost-efficient web defenses that scale with your org. https://bsidessf2026.sched.com/event/17f8e979fa47cd5029550146b3a47ce5

Transcript [en]

Welcome to BSides SF. This is theater 10. We are going to listen to a talk titled "Making WAF Mainstream: From Static Defenses to Living, Learning Protection." The talk will be presented by Surya. Surya is a cybersecurity leader currently serving as the enterprise-to-edge security lead at ByteDance/TikTok, where he oversees security engineering, automated controls validation, and independent security testing. The talk will also be presented by Roy. Roy is an entrepreneur and the co-founder and CTO of Huskies, an early-stage startup building a new digital ecosystem. So, with a round of applause, let's welcome Surya and Roy. Take it away. Thank you.

All right. Good afternoon, BSides. Really excited to be here. First time presenting, so a bit excited, I'm not going to lie. I'm Roy Weisfeld. I'm the co-founder and CTO at Huskies, where we're building an edge security management platform. My background is many years as a web hacker and in network, DevOps, and infrastructure XA200. I've been doing this for a few years, bypassing WAFs from the other side of the field, and now I'm rebuilding. One of the things I like most is taking broken platforms and broken systems and making them work. You don't always need to replace; sometimes you just need to fix it. And that's relevant to WAF. Joining me on stage is Surya.

Yeah, thanks, Roy. Hey everyone, Surya Pentakota. I represent TikTok. I've been there for four and a half years now, and WAF is one of the crucial areas of security for TikTok because we live and breathe at the edge. This is a really interesting challenge that we have solved together with Roy and the Huskies team, and I'm really excited to introduce the new standard with all of you today. Looking forward to the presentation. Thank you.

All right. My bad. So, we're going to cover a few things today. We're going to start with why WAF frustrates everyone, because it basically sucks today: why it sucks for engineers, for executives, and how the legacy approach is currently a pain. Then, what if WAF didn't suck? That's the next step, and our goal is to make sure you understand how it can actually work well. How AI actually fits in, because you can't talk about anything without AI in 2026, but not as a buzzword; hopefully as something that's actually useful. Some real-world examples of using this unified framework and what you can achieve with it. And some clear, practical takeaways, because our goal is for you to actually learn from this talk and see what you can implement, whether you're a small company, a large company, or whatever it is that you have. Let's make it better.

All right. So, to start: like I mentioned, nobody likes WAF. That's the problem, which is also the title of this section, but also the actual reality of it. In today's world (2026, keep in mind), huge companies like Anthropic and McKinsey are being affected by things that could have been stopped by a WAF, by rate limits, by working DDoS protections, by something that isn't just a static rule set. And even people just building out web servers at home: if you go on Reddit or look at Cloudflare, you're going to see lots of people having issues with traffic spikes, DDoS, and rate limits. And this would be fine, if "fine" were a building on fire. Which isn't fine.

Now, here we have a particular production log. A random POST request: we're literally blocking a checkout API because of a rule originally created by some random person called John, maybe for testing. It gets a 403 Forbidden error, from a random IP address. Looking at this, it's really impossible for us to understand what the intent of this log is. There's no business context, and there's definitely no rule information that helps us understand why this is being blocked. It's just a random log. This has been a pain for a long time, and we'll walk through how we solved it with the help of a new framework and how we integrated it into better protective mechanisms.

So, why don't engineers like WAF? I've been complaining about this for a long time now, and I'm going to do it one last time in front of you. First, static rule sets. A WAF comes with its own predefined static rule sets, which are not easy to map against your infrastructure or against your business. That brings its own challenges: legacy rule sets keep blocking live traffic, which impacts the business. Revenue loss, right? There are also visibility gaps, because WAF is a multi-threaded platform with a tremendous amount of telemetry, millions of logs, and it's really impossible for us to study those logs and come up with better detection and prevention mechanisms. With this new framework, we have gone a step further: we also did predictive analysis and came up with a predictive security approach that we're going to walk you through.

It's always at fault. Everyone wants to blame firewalls; everyone wants to blame WAF. It's common in the technical world: if there's an issue with the code, I've seen developers come and rail at WAF. It's quite common.
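The checkout log described above fails because it carries no owner, no business context, and no record of who added the blocking rule or why. A minimal sketch of joining a block event against that context, assuming a hypothetical asset inventory keyed by URL path prefix and a hypothetical rule registry; the field names (`path`, `rule_id`, and so on) are illustrative and don't match any particular vendor's log schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssetContext:
    owner: str
    business_unit: str
    revenue_critical: bool


@dataclass
class RuleContext:
    author: str
    purpose: str


# Hypothetical asset inventory: URL path prefix -> business context.
ASSETS = {
    "/api": AssetContext("platform-team", "shared", False),
    "/api/checkout": AssetContext("payments-team", "e-commerce", True),
}

# Hypothetical rule registry: rule id -> who added the rule and why.
RULES = {
    "custom-1042": RuleContext("john", "testing"),
}


def lookup_asset(path: str) -> Optional[AssetContext]:
    # Longest-prefix match, so "/api/checkout/v2" resolves to "/api/checkout", not "/api".
    matches = [prefix for prefix in ASSETS if path.startswith(prefix)]
    return ASSETS[max(matches, key=len)] if matches else None


def enrich(log: dict) -> dict:
    """Attach ownership and rule provenance to a raw WAF block log."""
    asset = lookup_asset(log["path"])
    rule = RULES.get(log["rule_id"])
    return {
        **log,
        "owner": asset.owner if asset else "unknown",
        "business_unit": asset.business_unit if asset else "unknown",
        "revenue_critical": bool(asset and asset.revenue_critical),
        "rule_author": rule.author if rule else "unknown",
        # A test-only rule blocking a revenue-critical path is a prime false-positive suspect.
        "suspect_false_positive": bool(
            asset and asset.revenue_critical and rule and rule.purpose == "testing"
        ),
    }


blocked = {"method": "POST", "path": "/api/checkout", "status": 403,
           "client_ip": "203.0.113.7", "rule_id": "custom-1042"}
print(enrich(blocked)["owner"])  # payments-team
```

The point of the join is that the same 403 now says which team to notify and whether it is likely a bad rule rather than an attack.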

When there's an outage or an infrastructure issue, the first team to get hit is obviously networking, and then network security, WAF, and firewalls. Application context is also missing. With this huge volume of log telemetry, it's really impossible for us to map it back to the systems and the endpoints, tag it to business units, and understand revenue flow and revenue generation. That's a bigger risk, because we see tons of attacks coming against our infrastructure, and it's really hard to decide which attack to prioritize. Knowing the application context and having business awareness makes it much easier to find the right approach and the right prioritization, and to make the right call.

Lastly, and most importantly: when we work at scale, with multiple WAF technologies from the market, it's really impossible to orchestrate everything and fit one policy for all. That's been a challenge for a while. Traditional networking and traditional firewalls solved this with a management plane, but WAF is definitely hard to solve because there are multiple layers of WAF: one at the CDN edge, one at the infrastructure edge, one right next to the application, one right next to the load balancer. So, for a defense-in-depth strategy, it's really important to ensure that we have the right policy deployed at the right place at the right time, so that we don't cause any chaos. These challenges are, obviously, chaos.

Now let's speak from the other side of the world: why don't executives like WAF? Obviously, revenue impact. In the previous example I shared, a business's checkout API was blocked because some random John added a policy, and that's definitely a risk. If an executive learns that an ex-employee named John added a policy that blocked live traffic, that's a shame on how WAF has been operated. Blast radius: not knowing the exact endpoint where an attack started or what it impacted, and what could go wrong, is definitely one of the biggest challenges we face. Ownership: finger-pointing is always a pain, and we end up paging the wrong people at the wrong time because we don't know the context of a particular log or the exact rule. It's quite common to wake up the wrong engineers at the wrong time, and leadership is never happy about involving more and more engineers who aren't needed. Another important factor executives dislike: are we blocking live business traffic? This has happened many times, where false positives cause business loss and hit revenue-generating streams. That's another reason executives definitely don't like WAF, and probably the first thing we hear is "just shut down the WAF." For a true security practitioner, that's definitely not a practice we want to see.

Lastly, and importantly, this one is more along the lines of CFOs, or whoever manages the company's finances: leadership always wants to know what the financial risk is and how we will mitigate that risk going forward. So, along with the technical chaos, there's also executive chaos that we have to solve together. This is a typical challenge the industry faces today, because WAF is at the edge. For big tech companies offering services, the WAF is probably the first thing customers reach; it's the first technology out there in front of them.

Yeah. So, we've covered why it's a problem, what's happening, and the missing visibility. This is what it looks like today, whether you have one CDN and one WAF or multiple of them. You start with the traffic on the left: your Cloudflare zones, CloudFront on AWS, Akamai Edge DNS, all the different CDN-facing services that are the first thing to actually receive the traffic. Behind that, you have the WAF layers: rate limiting and DDoS protection shields. Behind that, you have the load balancer side: VPCs, ALBs, security groups, whatever can actually allow or deny traffic besides the WAF. And the last layer is the virtual machines, where your web server actually sits: the compute layer. This is chaos just looking at it. And this doesn't even take into account that these are four different dashboards you need to look at, each with completely different context. The assets and configurations aren't as nicely mapped as they are here, and this isn't even that nice a mapping.

And that doesn't even take into account the different kinds of traffic. You have legitimate customers accessing services, you have automation, and in today's world, you have AI agents that might need to access your apps. Maybe you have an e-commerce app and you want someone using ChatGPT to be able to use an agent and buy a product. That can be fine, but it gets lost in this context: what is blocked, where is it blocked, and which layer do you even need to look at when you investigate it? Which is chaos. So, what are we suggesting? We're suggesting looking at this not as a WAF layer but as an edge security layer, and combining it into a world where WAF doesn't suck, which is new, because that's not the case today.

So, what does that look like? If you take into account the different layers (sorry for the eye chart if this is hard to read), the goal is to ask: what if we could actually understand the different layers? What if we could look at the configurations across the CDN, the WAF, the load balancers, and the compute, and map them into something that actually makes sense? To see, in one field of view, what's happening and where. Are there any issues? Are there any gaps? Are there any exposures happening right now?

What the WAF sees today is not this. The WAF sees that one log, that one API request that was blocked and that you can't really explain. That's all it sees, but edge security is way more than that, and you can't protect what you can't see. What you're not seeing is the user who made that first request. Was it a bot? An agent? A person? A mobile device or a desktop? These are all different scenarios. Then, did it hit DNS? Is the DNS working? That's a whole other can of worms that usually doesn't work. The CDN: how are you actually routing this? Does that make sense? Were they hitting it from the same endpoint? That context is missing. And then the app itself, the business logic. For example, if you're blocking 3,000 requests a day, you're not going to see any of that context if you're only looking at the WAF layer.

What if it was demystified? What if we could see all those configurations in one layer and understand, all right, there's a potential drift here, because it doesn't make sense that two different CDNs are accessing the same load balancer, or that different CDN records are exposing the load balancer directly and allowing a bypass, or that some configurations are still in log mode.
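The drift cases just described (two CDNs in front of the same load balancer, records exposing the load balancer directly, configurations left in log mode) can be expressed as simple rules over a normalized config inventory. A minimal sketch, assuming the CDN routes, DNS records, and WAF policies have already been exported into the hypothetical `CdnRoute`, `DnsRecord`, and `WafPolicy` shapes below; real vendor exports look different and would need their own adapters, and the names `cdn-a`, `cdn-b`, and `alb-checkout` are made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CdnRoute:
    cdn: str        # which CDN serves this hostname
    hostname: str   # public hostname on the CDN
    origin: str     # load balancer the CDN forwards to


@dataclass(frozen=True)
class DnsRecord:
    name: str
    target: str     # what the record resolves or points to


@dataclass(frozen=True)
class WafPolicy:
    name: str
    mode: str       # "block" or "log"


def find_drift(routes, dns_records, policies, origins):
    """Return human-readable findings that need review (not necessarily bugs)."""
    findings = []

    # 1. One origin fronted by several CDNs: sometimes intended (multi-CDN), often drift.
    cdns_per_origin = defaultdict(set)
    for route in routes:
        cdns_per_origin[route.origin].add(route.cdn)
    for origin, cdns in sorted(cdns_per_origin.items()):
        if len(cdns) > 1:
            findings.append(f"origin {origin} is fronted by multiple CDNs: {sorted(cdns)}")

    # 2. A public DNS record pointing straight at an origin lets clients skip the CDN and WAF.
    for record in dns_records:
        if record.target in origins:
            findings.append(f"DNS {record.name} points directly at origin {record.target} (possible WAF bypass)")

    # 3. Policies left in log mode observe traffic but do not enforce anything.
    for policy in policies:
        if policy.mode == "log":
            findings.append(f"WAF policy {policy.name} is in log mode")

    return findings


routes = [CdnRoute("cdn-a", "shop.example.com", "alb-checkout"),
          CdnRoute("cdn-b", "m.example.com", "alb-checkout")]
dns_records = [DnsRecord("origin.example.com", "alb-checkout")]
policies = [WafPolicy("checkout-sqli", "log"), WafPolicy("default", "block")]
for finding in find_drift(routes, dns_records, policies, origins={"alb-checkout"}):
    print(finding)
```

Each check is deliberately dumb; the value comes from having all four layers in one normalized inventory so that cross-layer questions like these become one-liners.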

Being able to see that in one layer makes a lot more sense than the previous view we showed you, where you can't even understand what the issue is or where it is. So, let's get to why AI matters, because obviously we have to talk about AI. Take one scenario. You detect an anomaly: 3,000 requests are being blocked, some customers are complaining, and you want to understand what's happening. So, you search through the logs and try to understand what happened. Then you want to enrich it, because it doesn't only matter that requests are being blocked, but what is being blocked. It's a booking endpoint. That sounds important, and we're not going to see that just from the logs. So, we want to take the business context into account as well. And within that business context, what exactly is being blocked? What in the header is actually being hit or not? How do we classify whether it's a false positive or not? How do we build a confidence score on top of that? So, we take into account the configurations, where the traffic is coming from, and what is actually being hit. Say we find it's actually related to an ad campaign or to cookies. Then we understand there's business criticality not only in the endpoint, but in what is being blocked and in the payloads being blocked.

Up to this point, these are just simple AI context-enrichment steps. They're not really complex; they're more deterministic, and they produce a confidence score. Step four is creating an agentic flow, because now you want something that's not deterministic and has a few different steps. Let's say we already understand what change needs to happen. Now we want to generate the remediation itself and backtest it against past traffic, to make sure we affect as little legitimate traffic as possible. We continuously test against that traffic, see how much of it is surfaced, and check whether there's good confidence in the change. We iterate on that until there's a good chance that only what we want is allowed or blocked. After iterating on that loop, we've basically created an agent that can both create WAF rules and assess past traffic, and give a verdict on how confident it is in the change.

So, we finally have our remediation, very specific and fine-tuned, and we want to deploy it to whatever WAF we have, say, Azure Front Door. We want to validate that it worked across all the different zones, because maybe we have more than one deployment, and we want to log that decision and notify whoever owns these apps. And we want this to happen automatically, because you probably don't have just one WAF. You probably have multiple applications, and as your business grows and becomes more popular, you're going to have more than that. So, how do we make sure this happens at scale? This is just one example, and these happen all the time, across different CDN providers, different zones, and different rules. Some of the companies we're protecting have 10,000 rules.
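The generate, backtest, and iterate loop described above can be sketched without any AI at all if the candidate rules are supplied up front. In this minimal sketch the LLM-driven generation step is replaced by a fixed, hypothetical list of progressively narrower candidates, past requests are assumed to carry a `malicious` label, and the gates (`max_fpr`, `min_recall`) are illustrative values, not recommendations. The cookie name `campaign_id` is likewise made up.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Request:
    path: str
    cookies: dict
    malicious: bool  # label from earlier triage; assumed to exist for past traffic


Rule = Callable[[Request], bool]  # True means "this rule would block the request"


@dataclass
class Backtest:
    blocked_malicious: int
    blocked_legit: int
    total_malicious: int
    total_legit: int

    @property
    def false_positive_rate(self) -> float:
        return self.blocked_legit / self.total_legit if self.total_legit else 0.0

    @property
    def recall(self) -> float:
        return self.blocked_malicious / self.total_malicious if self.total_malicious else 1.0


def backtest(rule: Rule, history: list) -> Backtest:
    """Replay past requests through a candidate rule and count what it would have blocked."""
    total_malicious = sum(r.malicious for r in history)
    return Backtest(
        blocked_malicious=sum(1 for r in history if r.malicious and rule(r)),
        blocked_legit=sum(1 for r in history if not r.malicious and rule(r)),
        total_malicious=total_malicious,
        total_legit=len(history) - total_malicious,
    )


def refine(candidates, history, max_fpr=0.001, min_recall=0.95) -> Optional[tuple]:
    """Try candidates from broadest to narrowest; return the first that passes both gates."""
    for name, rule in candidates:
        result = backtest(rule, history)
        if result.false_positive_rate <= max_fpr and result.recall >= min_recall:
            return name, result
    return None  # nothing safe enough: keep a human in the loop


def has_double_hyphen(req: Request, skip=frozenset()) -> bool:
    return any("--" in value for name, value in req.cookies.items() if name not in skip)


# Hypothetical candidates, standing in for what a generator (e.g. an LLM agent) might propose.
candidates = [
    ("block '--' in any cookie", lambda r: has_double_hyphen(r)),
    ("block '--' in cookies except campaign_id", lambda r: has_double_hyphen(r, {"campaign_id"})),
]
history = (
    [Request("/api/bookings", {"campaign_id": "spring--sale"}, False)] * 50
    + [Request("/api/bookings", {"session": "1' OR '1'='1' --"}, True)] * 5
    + [Request("/api/tickets", {"session": "abc123"}, False)] * 45
)
print(refine(candidates, history)[0])  # block '--' in cookies except campaign_id
```

The broad candidate is rejected because it would have blocked all 50 legitimate campaign requests; the narrower one keeps full recall on the labeled attacks with no legitimate requests blocked.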

It doesn't make sense to do this one by one. This is why AI is actually necessary: to reach that scale for something that isn't exactly deterministic.

Okay, coming to TikTok's case study. For those who are not aware, TikTok is the world's largest short-form media platform, and essentially the platform the young generation is learning AI from. So, it's definitely an interesting case study you're going to hear from me. It's a global platform, with users connecting from almost every corner of the world. There are around 1 billion monthly active users, with billions of videos being processed. We have thousands of domains to protect globally. And last but not least, we have a multi-CDN infrastructure. The goal of multi-CDN is to ensure we provide the best video quality and the best content to our users, so that users are empowered. WAF has always been a problem; we've solved it, at least in recent days, and we're trying to enhance it and drive things forward.

So, the challenges. These are the three main challenges I've put together. First, obviously, the multi-CDN architecture. To be resilient and to bring the best content to users, we have to rely on multiple CDNs, and multiple CDNs come with their own challenges: each comes with its own platform, and operating them at scale has always been a challenge. Second, global scale: we see traffic from almost every country and every time zone, and the stability and uptime of the back-end systems are crucial to our success. We've done a great job investing in our infrastructure and our resiliency measures, and also in our security defense posture. Lastly, a dynamic attack surface. With diversified traffic coming from different regions, the attack surface is definitely dynamic, and we see almost every new attack pattern the industry sees, whether new API attacks or zero-day attacks. It's pretty wild out there, day in and day out, and it's a critical challenge for us to fight these attacks. Recently, we've been seeing a lot of AI attacks, and we've been leveraging AI as a force multiplier to fight them.

With this unified approach, I'm going to walk you through a few challenges we were able to solve, before and after. As I mentioned, we have multiple CDNs, and for this particular use case, we had around four CDN consoles that had to be opened to block an attack where we had seen something like 10 TR files, and we had to share a lot of telemetry between the operators of these four CDNs and also ensure a notification was sent out. Now, with the change, we were able to automate the whole policy deployment: one change fixes all. We were able to make our AI learn the policy structure of those multiple CDNs and automatically deploy a policy in under 3 minutes, where previously it took more than 30 minutes to handle those multiple CDNs.

Second, we were able to reduce error rates. Every live production rule deployment comes with its own error rate; false positives are pretty common, and false positives bring business challenges and business loss. That's another area we were able to tackle. We didn't just automate the whole policy deployment; we were also able to run it in log-only mode first: read from the traffic, run a local AI agent over it to identify the patterns and signals of false positives, and eliminate them before enforcing a policy in production.

Another interesting area: as I said, we receive diversified attacks, and a lot of bots are involved, and bots definitely need rate limits. With the help of this unified policy, we were able not just to tackle the OWASP Top 10, but also to limit a lot of bot activity. We enhanced our playbooks by learning about new bots in the market and new scraping techniques, and by identifying drift in detection, because bots are wild out there. If you block bot A with detection technique A, it's going to come back with technique B. So, being context-aware and learning from live traffic is another area where we achieved success with this unified framework.

Last and very important: I'm sure most security practitioners get questions from executive leadership, and each executive has their own set of questions. We expect a lot of financial-loss questions from CFOs and a lot of attack-pattern questions from CSOs. So, while tackling all the security challenges through AI, we were also able to build a custom AI report for each attack we mitigated, in a unified fashion, where the report is quickly customized for whichever executive asks the question. That's another added advantage we got from the unified approach.

I want to walk you through one attack timeline. This is a JS challenge for a fingerprint cluster. Detection happened in less than a minute: we identified a pattern of attacks on one of our domains. Within 2 minutes, we were able to map it to the exact assets affected. Not just mapping: we were also able to quickly gather similar hosts that could be vulnerable to such attacks, get that list ready, and identify a lot of infrastructure dependencies, all within 2 minutes. Within 5 minutes, the whole analysis was completed by an agentic AI engine, which also came up with a mitigation action: whether we should rate-limit or do a targeted block on a particular JavaScript challenge. So, the analysis is done and the policy has already been tailored. Within 3 minutes, we deploy the policy, and not just deploy it: we first deploy it in log-only mode, which is really crucial, and study the traffic to make sure there are no false positives and no business impact, doing a quick but thorough analysis and backtesting the traffic. And last but not least, the important step where we enforce the policy. After 3 minutes of thorough traffic analysis and finalizing the enforcement, we deploy the policy across multiple CDNs through a strict change management process, to ensure there are no false positives in the deployment as well. So, within 12 minutes, the whole WAF deployment was done for this particular JS-challenge fingerprint attack. This is definitely one of the best numbers in the industry, and we're proud of it; we continue to learn from it and drive maturity across our AI systems and infrastructure.

So, I want to give a keen insight here. If you look at this, we are not replacing WAFs. We are not replacing CDNs. WAFs and CDNs have been around for nearly 30 years, and they're here to stay; they're definitely not going away. What we did is add a layer of AI. AI helped us be agentic in deploying policies and learning traffic patterns, and reduce a lot of human intervention. One thing I keep telling my team is that AI is not here to replace you; all of us should be using AI as a force multiplier, because traffic keeps increasing and the business keeps growing, and using AI will help us mature our detection, response, and predictive analysis. That's essentially what we're trying to bring to your attention here. At the end of the day, all of it comes down to business value, right?
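The log-only-then-enforce step in the timeline can be thought of as a promotion gate. The sketch below is a hypothetical illustration, not TikTok's pipeline: it assumes each CDN reports how many requests a log-only rule would have blocked and how many of those came from traffic independently judged legitimate, and the thresholds (`max_fp_ratio`, `min_samples`) and CDN names are arbitrary placeholders.

```python
from dataclasses import dataclass


@dataclass
class LogOnlyObservation:
    cdn: str                     # which CDN the log-only rule ran on
    would_block_total: int       # requests the rule matched while in log-only mode
    would_block_known_good: int  # of those, requests we independently consider legitimate


def promotion_decision(observations: list, max_fp_ratio: float = 0.01, min_samples: int = 100):
    """Return ("enforce", []) only if every CDN's log-only data looks safe, else ("hold", reasons)."""
    if not observations:
        return "hold", ["no log-only observations yet"]
    reasons = []
    for obs in observations:
        # Too little data means we can't tell a quiet rule from a safe one.
        if obs.would_block_total < min_samples:
            reasons.append(f"{obs.cdn}: only {obs.would_block_total} matches, not enough data")
            continue
        ratio = obs.would_block_known_good / obs.would_block_total
        if ratio > max_fp_ratio:
            reasons.append(f"{obs.cdn}: {ratio:.1%} of matches hit known-good traffic")
    return ("enforce", []) if not reasons else ("hold", reasons)


decision, reasons = promotion_decision([
    LogOnlyObservation("cdn-a", 1200, 3),
    LogOnlyObservation("cdn-b", 800, 40),
])
print(decision, reasons)  # hold ['cdn-b: 5.0% of matches hit known-good traffic']
```

Requiring every CDN to pass, rather than an average across them, mirrors the "one change fixes all" idea: a policy is only promoted once it is safe everywhere it will be enforced.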

Yeah. So, before we look a bit under the hood and explain the framework more, it's important to note that most of you in the audience probably don't have the same scale, the same multi-CDN infrastructure that TikTok has, or the exact same pain points, but you probably have something similar on a smaller scale. You probably have these issues even if you have one CDN and one WAF. Our goal in this talk is to explain how to start approaching this, because what we're seeing in the wild is that many companies have given up on their WAF or just don't do much with it, and there's so much more to be done with it. So, this is relevant to almost every company: even if you're a small SMB e-commerce shop that just has a web server, this is relevant for you.

So, let's pop a bit under the hood and look at the framework we're suggesting. It's composed of three main components: data ingestion, the AI core, and outputs and actions. Together, that's the intelligence architecture. We start by ingesting context from five different sources. First, external intelligence: what attackers are doing today, how they can see your current exposures, and what your digital footprint is. Second, the WAF and CDN layer: how you're currently protecting your assets, your configurations, whether that's AWS, Akamai, Cloudflare, or something similar, and how you're routing your traffic today. Third, the cloud: where your business applications and infrastructure are, meaning what you're actually protecting. Fourth, the traffic sources, because, as mentioned, there are different ones: mobile traffic, desktop traffic, agentic traffic, AI agents, bots, automations, maybe even internal services accessing your services across your different clouds. Each of these is relevant context for understanding how to handle that traffic. Fifth and finally, the security context: maybe you have a bug bounty program, penetration testing, or a CNAPP tool, which enriches your existing assets and configurations. It's important to understand all these different contexts to make an informed decision.

Because this is a lot, and these sources look completely different, we want to normalize these disparate streams into something actionable. This is where we start using AI: to correlate the different sources and produce something that makes sense. Not only the CDN, the WAF layer, the routing, and the business logic, but how do we connect these layers? The second step is analyzing that with an intelligence engine. Okay, we have the context and a normalized layer. Now what? How do we make an informed decision? How do we analyze these different signals and produce a result, or at least start to? That brings us to the third section: the signal fusion and verdict engine. Now that we have the context and the intelligence engine, and we're raising concerns and findings, how do we create confidence scoring? These sources can carry different information, and not everything is 100% confident.
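The talk doesn't specify how the verdict engine scores or deduplicates findings. One simple, common way to do it is noisy-OR fusion, shown below as a swapped-in technique, not the speakers' method: findings are keyed by (asset, issue), each source contributes only its strongest claim, agreeing independent sources raise the combined confidence, and the result is weighted by a hypothetical per-asset criticality map. Source names, assets, and numbers are all made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Finding:
    source: str        # which context produced it, e.g. "waf_config", "external_intel"
    asset: str
    issue: str
    confidence: float  # 0..1, how much we trust this source's claim


def fuse(findings, criticality, default_criticality=0.5):
    """Deduplicate findings per (asset, issue), combine confidence, and rank by priority."""
    grouped = defaultdict(dict)
    for f in findings:
        # Dedup: keep only the strongest claim per source, so one chatty source can't vote twice.
        best = grouped[(f.asset, f.issue)]
        best[f.source] = max(best.get(f.source, 0.0), f.confidence)

    verdicts = []
    for (asset, issue), per_source in grouped.items():
        # Noisy-OR: treats sources as independent, so agreement raises confidence.
        combined = 1 - prod(1 - c for c in per_source.values())
        priority = combined * criticality.get(asset, default_criticality)
        verdicts.append({
            "asset": asset,
            "issue": issue,
            "sources": sorted(per_source),
            "confidence": round(combined, 3),
            "priority": round(priority, 3),
        })
    return sorted(verdicts, key=lambda v: v["priority"], reverse=True)


findings = [
    Finding("waf_config", "alb-checkout", "origin exposed", 0.6),
    Finding("external_intel", "alb-checkout", "origin exposed", 0.5),
    Finding("external_intel", "alb-checkout", "origin exposed", 0.4),  # duplicate, same source
    Finding("traffic", "blog", "rate spike", 0.7),
]
for v in fuse(findings, criticality={"alb-checkout": 1.0, "blog": 0.2}):
    print(v["asset"], v["confidence"], v["priority"])
```

Noisy-OR assumes the sources are independent; if two feeds are really the same upstream data, they should be merged into one source before fusing, or the combined confidence will be overstated.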

So, we want to be able to deduplicate and prioritize to understand what needs to be looked at now and create these different outputs. So, what are these outputs? The first thing is the unified security management, which is most of what we're talking about today is looking at this at the more holistic approach. The second is the inventory and ability to investigate something in deeper way because take for example the map that we showed previously between these different layers. If you know there's an issue in DNS or with the app, how do you get to the WAF configuration today? How do you get to the routing configuration? How do you understand which of these can lead to which other place? So, we need

to be able to take this correlation that was done with the AI layer and translate it back to a human way to create a real investigation. The third one being the posture and insights. What we're seeing across the board is that in some companies there are just some really specific business apps that are the most important ones and are protected in the best way, but this doesn't automatically get translated to the different zones, to the different applications. How can we get some sort of assessment and insights on our best protected assets and learn from that to our least protected assets and kind of translate it away. How can we create some sort of scoring because

there really is no assessment engine for WAF today? And the fourth thing is automation. So, attackers are increasing in their way to attack and I just saw a metric a few days ago on the the quickness of CVEs and zero days is down to minus 2.6 days, which is just crazy. So, we need to enable automation to make changes. We can't rely on these static rule sets. And in order to make an automation happen in WAF, which is a business critical asset because it can actually block your bottom line and legitimate customers, we need to safely deploy this. We need to backtest. We need to log first, get some context and understand what this rule may affect in

the future to enable safe automation and not just automation for automation's sake. And the final thing, which is kind of a happy side effect of this, is enabling migration because if you think about it, we've created a layer that consolidates the different configurations. We can understand that in a in a more global sense, in a more unified way, that means we could translate from anything to anything and allow migrations which wasn't really possible till now. And we'll touch on this in a future slide. Okay, so uh bear with me. Up till now, you've mostly heard about the concept of what we're of what we've been building so far about this framework and some of TikTok's current pain points. So, here

are some real examples of using this unified framework in the wild, and how you can try it yourself. The first example: one of the companies we protect was getting reports from customers who couldn't access their services and application. They weren't sure where or what was happening; they just knew there was an issue, which is a common situation with WAF. They knew it was somehow related to one of their apps, but not the scale of it. Our engine, with this unified view, was able to pick up on what was actually happening. And these are

the steps. Let me break it down for you. First, there was a spike in 400 responses on a booking endpoint. So far, so bad. It wasn't correlated to any specific attack pattern, because then it would be easy: we'd stop at step one, block it, and be done. So the second step is querying the actual logs. Here's an example of a query you can use on Azure Front Door to understand which requests are being blocked in this specific case, and we could see that a lot of requests were being blocked in this scenario. The third is to

isolate the pattern, because we want to understand how to create a specific mitigation or exclusion for it. What our engine showed was that the blocked requests were on the bookings and tickets API endpoints, so these are business-critical. Recall that our goal is to also assess the context of the business applications, not just look at the WAF logs. The requests also carried cookies containing two hyphens, which, for those of you familiar with SQL injection, are usually used to comment out the rest of the line and inject some sort of SQL into

that. So the block makes sense, but you don't want to block two hyphens everywhere. When we enriched that specific case, we saw that it came from a Meta pixel on an ad campaign that generated a campaign ID, and that ID can include two hyphens. Taking all this into account, you can easily build a remediation that excludes two hyphens only in that cookie. This can be done in minutes, not hours or days, assuming you even realize the WAF is the issue. That's the goal with this framework. The second example, and sorry for the eye chart if it's hard to see what's going on here, but the goal

is to say: this is the traffic path from the ingress to the application, and these are the different endpoints it passes through. In this case, it's Cloudflare to AWS. Inside that, we can pretty easily see there's an issue; we'll dive into what it is and how we found it. But the first goal is surfacing things quickly and easily so they're explainable, so it's not as scary as WAF is today. Step by step: first, assess the assets and configurations, correlating the CDN, the WAF, and the cloud. This is why we collect from different sources, mapping upstream to downstream. The second step is the network flow itself. All right,

which parts of this are publicly accessible, from the ingress to the application? Because there's the intended path, and then there's what is actually happening. The third step is the desired path: what do we actually want to happen? If we go back to this graph, we probably want it to flow left to right: the CDN, the WAF, the rules, then the load balancer and the application itself. But what's actually happening is that you can reach that load balancer directly from the internet, bypassing all this WAF protection. That's easy to see here, and much less easy to see across different dashboards. The fourth step, which is optional, is asking whether we can rebuild this path with different configurations, different

assets? Can we remove some of these parts and make it slimmer? That's not really possible when you don't know what you're looking at. Now, the last point is the vendor lock-in problem. As promised, we'd talk about this in a later slide, and this is that slide. The reality of multi-CDN WAF management is this: maybe your company just acquired another company, or you're going to be acquired in a few weeks, and suddenly your one CDN turns out to be two, because you now have to take responsibility for a completely new framework. Either you're dealing with this today, or you're going to deal

with this in the future. Then you run into things like this WAF rule on the top left: the AWS WAF rule for blocking a single admin route is eight lines, while the exact same rule on Cloudflare is two lines. Does that make sense? Probably not. Now think about scale. These are just two CDNs and WAFs. What if it's three? What if it's four? What if it's 500 rules that have to be rewritten by hand to actually do this migration? By creating this unified framework, we basically created a way to migrate and get out of vendor lock-in.

We can extract the configurations from, say, Akamai, translate them into a unified WAF rule, and then translate that to a different provider, say, Cloudflare. The result is that some of these rules, let's say half of them, can be translated automatically, and the rest need review. But that's a lot better than going back to 500 rules by hand. You can actually manage this; you can make it happen. When we talked about this with other companies, we saw projects to do this that span years. It doesn't make sense today that this takes so long. And there's so much more. There's so

many issues today with WAF, with CDNs, with load balancers, with just seeing and understanding what's going on, with cost management. There are so many issues we're not dealing with today because we can't see them; we can't understand what's going on. So if you take only a few things from this talk, let it be this section: the key takeaways, and what you can actually implement today, beyond understanding that something about WAF needs to change. These are the things to remember, so this is an important part. Just making sure you're with me. All right, the key takeaways. If

you're an engineer, try this. First, look at every rule you have that's currently on skip or bypass. Can you explain why each one is there? Can you remove them? Second, what are the top 10 routes being blocked today, if you look at the traffic breakdown? Do they make sense? What we're seeing in the wild is that some of these will usually surprise you. Third, before moving to block, test against previous traffic: what would potentially be blocked, and how do we move from log to block mode? Coming to the CISO, which is definitely the most critical job here:

three questions that definitely need to be asked. First, how do we measure the effectiveness of our edge? The days of WAF are gone, and edge is the new term, so if you're a CISO, please make sure this question gets asked. Second, how do we analyze traffic patterns to identify possible business losses? The more critical the business, the stronger the WAF protection needs to be, so this is an important question to ask. Third, and very important, how do we improve efficiency within our teams without bringing in more headcount? Through AI, automation, and agentic AI platforms. It's really possible for

us to protect ourselves well without bringing in more and more people and making things more chaotic. If you're head of infrastructure, or just in charge of infrastructure (you don't have to be the head of it), these are some controls you should try. First, do you need all your existing rules? Not sure if you're aware, but on some platforms you actually pay for each rule, and the tier you're currently on may not justify what you're doing with it. Second, related to this: when did you last assess your WAF? You should probably do that. You should

probably look at it, and not just once or twice a year, because again, this is a business-critical platform. Third, are you creating rules manually today? Are you moving them from log to block mode in a process that takes weeks or months? You should automate this. More than should, you need to automate it, because attackers won't wait, even for 2 days of log mode. Attackers are going to attack you in minutes, so you need to make changes happen faster. That's it for our talk. A few minutes for questions, right? Cool. Thank you. Thank you so much, amazing talk on WAF. Yes, questions.

So with today's vendors, such as Akamai and Cloudflare, as long as you're behind their service the rules seem to be in force, but once you remove that layer, you're no longer protected. Are you doing more in this case? Are you actually letting customers keep their rules as they move around? Is that what you meant by getting out of vendor lock-in? I'll answer this in two layers. The first layer is that you're protected if you're using their services. That's true, but are you actually using them correctly? Like in our example, are people able to access your services by bypassing Cloudflare? Because your compute is probably not on Cloudflare or

on Akamai itself, so you need to understand whether that layer is protected or not. Even if you buy Cloudflare, that doesn't mean it's actually doing its job. As for moving between vendors, in one case we saw, a company was moving between two vendors, from AWS to GCP, and they had to copy over all their rules. During that time, they were completely unprotected. You need to copy those rules over in a managed way, and again, there's just no tooling for this. So you have a gap in your protection, even if only for a short time.
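The first layer of this answer, whether people can reach your services without passing through Cloudflare, can be checked offline before any live test. Here is a minimal sketch that assumes you have already exported the origin load balancer's inbound allow-list and your CDN's published egress ranges as CIDR strings. The function name `ranges_outside_cdn` is hypothetical, and the ranges in the check below are RFC 5737 / RFC 3849 documentation addresses, not any vendor's real ranges.

```python
import ipaddress
from typing import Iterable


def ranges_outside_cdn(allowed_cidrs: Iterable[str], cdn_cidrs: Iterable[str]) -> list[str]:
    """Return inbound allow-list entries that are not fully inside some CDN egress range.

    Each returned entry lets traffic reach the origin without going through the CDN,
    and therefore without going through its WAF.
    """
    cdn_nets = [ipaddress.ip_network(c, strict=False) for c in cdn_cidrs]
    exposed = []
    for cidr in allowed_cidrs:
        net = ipaddress.ip_network(cidr, strict=False)
        # subnet_of() raises TypeError when mixing IPv4 and IPv6, so compare versions first.
        covered = any(net.version == cdn.version and net.subnet_of(cdn) for cdn in cdn_nets)
        if not covered:
            exposed.append(cidr)
    return exposed
```

For example, with an allow-list of `203.0.113.0/25`, `0.0.0.0/0`, and `2001:db8::/48` against CDN ranges `203.0.113.0/24` and `2001:db8::/32`, only `0.0.0.0/0` is reported. That is the "load balancer reachable directly from the internet" case from the second example. The check only inspects allow-lists; it sends no traffic.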

Attackers don't care that you're in the middle of a migration; they're still going to attack you. Does that answer your question? Also, just to add a point here: we can't put all our eggs in one basket, so defense in depth is encouraged. Obviously we have CDNs and commercial vendors out there, but it's truly important to have our own protections in our infrastructure, so we keep hold of our security posture rather than relying on vendors, waiting on SLAs, and calling them. That's one of the important lessons I've learned through the years. Wonderful. We have online questions.

Again, if you want to post a question online, go to www.bsidessf.org/qna. Here we have two questions. Number one: in step three of the verdict, how is the confidence score calculated? Can you share which factors influence the calculation of this score? All right, sure. It's based on the different context points we have and on the traffic itself. We assess the amount of traffic: is a lot of traffic being blocked by this rule? Is it a custom rule or a managed rule set? We see that with managed rule sets, in cases where you have an application that's a

bit more complex, the confidence is a lot lower, because it's not something you built; it's something you're just applying to an endpoint. We also look at whether the rule is scoped to a specific endpoint. For example, a managed rule set on the whole application leaves a lot more room for shenanigans than a custom rule on a specific endpoint with just a small number of log entries. So we take these parameters into account, plus the freshness of the rule and configuration: when was it last changed? Is someone actively making changes, or was this done a few years ago? These different signals allow us to

create a confidence score: how relevant is this rule, how much is it affecting traffic relative to the overall traffic breakdown, and where is that traffic coming from? Wonderful, thank you. Question number two, again online: how many engineers are required to make AI-assisted WAF functional and deploy it in production? That depends on the scale: the number of CDNs and configurations, and how custom they are. You can start out by creating skills and CLI tools to access the different services, which is something I've been working on a lot in my personal time as well. Not just the MCP server provided

by that vendor. It really does matter how complex your infrastructure and application are. Again, the whole point of this talk is that WAF needs more care and more specifics, because just using whatever the vendor calls best practice does not take into account your infrastructure, your business logic, or your context, and those are things only you know. So you can start with one engineer, build some tooling and some understanding around it, and start with visibility, because if you can't see it, you can't protect it. That's something you can do pretty easily as

a start if it's just one CDN. If it's multiple, that takes time. I don't think it's about the number of people so much as time, understanding, and expertise, because for some reason there just aren't a lot of people who understand how WAF works and what is actually needed from it. So if you have one engineer who really understands WAF and knows your business context, you can probably start building this out. Thank you. Do we have any other questions? We still have a few minutes, so if you have a question, please make it known. Okay, awesome. So

again, a round of applause for Roy and Surya. Wonderful. Thank you again for this talk. Amazing. Thank you. Just one shameless plug: if anyone in your network is looking for an impactful job or a change, TikTok is always hiring. Please stay connected; I'm more than happy to help you navigate the process. Thank you. Thank you, Surya. Thank you.
