
So hello everyone. Thanks for the kind intro. My name is No Calu, I'm a principal researcher at Cato Networks, and today I'm excited to talk about what I like to call the contextual agentic garbage collector. Think of it as a smart sweeper that helps us clean up both minor and major misconfiguration crumbs before attackers ever get the chance to exploit them. We will dive into how we identify these contextual misconfiguration issues inside very critical systems like firewalls by analyzing the context around them, as a complementary layer to the traditional, well-known configuration analysis we are all familiar with. We will start by introducing the problem domain, we will move into the detection layer, and then finally the first-ever
introduction of the complete agentic framework, which will get somewhat technical. So buckle up and let's start. A little bit about us, only for the sake of context, no marketing, one short slide I promise. Cato is a company in the SASE space, basically the convergence of security and networking. Our product essentially takes all of our customers' network and security infrastructure and moves it into a global private cloud with full inspection capabilities, all of it managed by us. Essentially, this lets our customers enjoy a single unified platform that covers all of their organizational needs instead of juggling a collection of point solutions. Right? So let's talk about the problem space we
are trying to solve. According to Gartner, 50% of cyber security incidents in organizations will stem either from lack of talent or from human failure. And this isn't some distant prediction, it's already happening today. And if that number alone isn't concerning enough, add to it almost 3.5 million unfilled cyber security roles, across IT, CISOs, AppSec, security engineers, you name it. The result is that the pressure on IT and security teams, which are often already understaffed and very small, is skyrocketing. And that rising pressure creates a vicious circle that both explains and fuels the surge in security incidents we are seeing today. So when we are talking about AI and
automation, it's no longer a luxury or just a proof-of-concept experiment. It's becoming a real organizational business need, to close the gap that has been building up for years for all sorts of reasons. In our platform, the users, the IT managers and the CISOs we referred to earlier, define the desired behavior, they apply the company's policies in the areas of security and networking through the policies they create. These policies cover a wide range of products such as WAN firewall, LAN firewall, CASB, DLP, and many more. And essentially a policy is no more than a set of rules, with the flexibility, though, to define
multiple dimensions of the desired behavior. As powerful as this mechanism is, it also plays a critical role in the system, and the margin for error there can be very significant. That's why proactive oversight of this specific mechanism can play a key role in minimizing the potential future damage to the organization. So we have decided to bring together two worlds: the world of policies and the world of AI. AI capabilities allow us to build an autonomous system that continuously monitors user configurations and can highlight potential failures early in the process. And this convergence of worlds opens the door to some very powerful new
types of insights, starting with the detection of contextual misconfigurations, which is a whole new set of security issues that live in the semantic dimension, all the way to the ability to suggest optimizations, not only at the level of an individual rule but across an entire policy, which in some cases might consist of thousands of rules. In this session we will be focusing on the first type of insight, the contextual misconfiguration. When we step back and look at the bigger picture, it's clear that operational security teams need help. That's why we built a new agentic AI engine that layers directly on top of the policy mechanism we referred to earlier, like the organizational firewall which
we provide at Cato. This engine works proactively and continuously to improve the organization's overall security posture. So what does that policy mechanism actually look like? At its core, a policy is a list of rules where the order matters, and each rule defines a desired behavior. So let's take a rule, a firewall rule for example. We can see the rule name, the rule description, and a range of rule configuration and definitions. I'm not sure you can read it, but let me make it clear. The rule name for this one is allow remote access to Ireland, and the rule description is: this is requested by the finance team for the
due diligence efforts. And there are many other technical configurations for the rule: the service, the ports, the applications. We can imagine that if it's a remote connection, maybe we would like to allow some VNC or RDP services. And we have the action: whether we would like to allow it, block it, monitor it, you name it. Now, assuming a customer might have hundreds or thousands of these kinds of rules, what does this engine actually do? The main feature is applying semantic analysis on free text to identify contextual configuration issues that usually wouldn't be caught by any traditional configuration analysis. Why, you ask? Simply because it's not possible.
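To make the rule anatomy concrete, a rule like the one on the slide could be represented roughly like this. The field names are my own assumptions for illustration, not Cato's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class FirewallRule:
    """Simplified firewall rule record (hypothetical field names)."""
    name: str
    description: str
    services: list[str] = field(default_factory=list)
    action: str = "block"  # "allow" | "block" | "monitor"

rule = FirewallRule(
    name="Allow remote access to Ireland",
    description="Requested by the finance team for the due diligence efforts",
    services=["RDP", "VNC"],
    action="allow",
)

# Traditional analysis only sees the structured fields (services, action);
# the contextual engine also reads the free-text name and description.
print(rule.name, "->", rule.action)
```

The free-text fields are exactly where the "due diligence efforts" context lives, which is why they are the input to the semantic layer rather than the structured ones.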
We are doing that by examining data such as the rule name, the rule description, the associated tickets for that specific rule, contracts, wiki pages, Confluence pages, and basically any other relevant documents, if there are any. Ultimately, we are taking a huge burden off our customers' shoulders by introducing this agentic engine as a true force multiplier in the organization. So the semantic and behavioral engines we have developed can generate many types of insights, which we will cover shortly. The first one is the detection of temporary rules. So what is a temporary rule? Temporary rules are firewall rules that we have identified as being created as a short-term solution, mostly to
address an immediate organizational or business need. Let's see some examples of rules that we have found and flagged as temporary. As a first example, a temporary rule granting the R&D team open network access during an internal hackathon event. Again, an open network policy, so I bet this one shouldn't be there at all, or at least it needs to be removed as soon as possible. Another example is a temporary rule allowing sensitive protocols or applications, like RDP, for the sake of a PoC, for a third-party contractor within the organization to complete some project they are currently running on the system. So let me
emphasize: sensitive protocols or applications that were meant to be allowed only during the PoC phase. Think how insecure and dangerous it is if that PoC has ended but the rule is still there. Another example is a temporary rule blocking or allowing access from a specific geography due to some global event. And another one is a temporary rule which is actually an exception for a user, let's say his name is Ethan, that conflicts with a company policy, like enabling Telnet, which I assume we all agree is insecure. And why is the rule there? Because Ethan asked IT to allow it so he could complete some project in the organization. So as you can see from these examples, detection can range from simple semantic cues, like spotting the word "temp" or "temporary", all the way to more complex context-based scenarios, such as understanding a PoC as something temporary, ad hoc contractors, the semantic meaning of the rule name, the rule description, and every aspect on top of them. Another insight type is testing rules. The detection of testing rules is slightly different from temporary rules. These are firewall rules that we have identified as being created mainly for validation, for debugging, or to experiment with some specific feature or organizational scenario. Let's see some examples.
The first example is a testing rule created to help debug a network issue. And we all know that while debugging we are very generous with our non-restrictive approach. Seriously though, this might end with a major security gap in the organization. Another example is a testing rule created to explore and get familiar with some certain policy configuration, like an allow-any rule, which might be legit at first. We see that with customers: they don't know how to configure the firewall at first, so they want to learn, they want to allow everything and then scope it down. But as much as this might be legit at first, it must be dropped as soon as possible.
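Conceptually, the semantic detection of temporary and testing rules boils down to classifying a rule's free-text fields. A minimal sketch, with a tiny keyword heuristic standing in for the LLM so the example is self-contained (the cue lists, the `triage` helper, and the prompt wording are my own illustration, not the actual engine):

```python
# Coarse triage of firewall rules based on free text. A real system would
# send the harder, contextual cases to an LLM; the keyword layer here is
# only a stand-in for illustration.

TEMP_CUES = ("temp", "temporary", "poc", "hackathon", "until")
TEST_CUES = ("test", "testing", "debug", "validate", "allow any")

def triage(name: str, description: str) -> str:
    """Return a coarse label for a rule based on its free-text fields."""
    text = f"{name} {description}".lower()
    if any(cue in text for cue in TEST_CUES):
        return "testing"
    if any(cue in text for cue in TEMP_CUES):
        return "temporary"
    return "unclassified"

def build_prompt(name: str, description: str) -> str:
    """The kind of prompt an LLM would receive for the contextual cases."""
    return (
        "Classify this firewall rule as temporary, testing, or permanent.\n"
        f"Rule name: {name}\nRule description: {description}\n"
        "Treat hackathons, PoCs, contractors and debugging as signals."
    )

print(triage("Open access for hackathon", "R&D event this weekend"))
print(triage("Allow any", "learning the firewall, will scope down later"))
```

In practice a keyword layer like this could only pre-filter; the contextual cases, such as the allow-any rule discussed above, are exactly where the LLM is needed.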
And we see many cases where customers actually forget about it, and this is a real major issue in their network. Another example is a testing rule created to reproduce and validate some past incident on the network, such as an old IoT device that communicates between the IP cameras and the DVRs, which a legacy configuration analysis would have a really hard time detecting as no longer relevant. And of course, all of this is inherently multilingual, depending on the LLM model in use. We have seen testing rules written in multiple languages: for instance, a Chinese testing rule for verification on some old
domain controller, a French testing rule for demoing the RDS feature, or a Spanish testing rule disabling the block of risky GenAI applications, and why? Only for the sake of an evaluation and risk assessment being done in the organization. Well, this might be legit for a certain period, but it significantly increases the probability of this configuration being left forgotten as it is, which in turn might result in a security breach in the organization. The third type of insight, and in my opinion one of the most unique ones, is the detection of expired rules. These are firewall rules that we identify as having a very interesting pattern in the data. Customers tend to create their
own visual or textual indication to signal some desired cutoff date for the rule. This cutoff date could be tied to an exact date, a period, or even be conditioned on some event. And this is actually pretty fascinating, because it's something that is very hard, even impossible, to capture with traditional heuristic methods. So for example, a rule allowing a marketing employee TikTok access until a marketing course is finished at some wishful point in time. Another example is rules where we have detected an expiration sentiment embedded within the text, in the rule name itself. In one very interesting case, we have identified rules marked with "EXP" as a signal for expiration or
expired, together with some free text of a wishful expiration date, with a high degree of variance. This is a very individual convention we have found among thousands of rules in our customer base, which only contextual analysis can spot. One customer might use "EXP", another might call it something different. This is the true power of leveraging AI and LLMs to spot that kind of issue. Another example is rules that were likely to become irrelevant once some specific target date was reached, for instance when patching efforts were completed or when some particular issue was resolved. And here the agentic framework, the agent, can increase the detection confidence
by dynamically acquiring additional knowledge about the specific patching effort from Jira tickets, from wiki or Confluence pages. Think about that; we will see it in a second. And on top of that, a very interesting challenge we have tackled here, specifically for this kind of insight, are rules that might be conditioned on a holiday or event, where the holiday or event is textually included in the text, such as Thanksgiving, Fourth of July, Black Friday, or even the Super Bowl. Ultimately, the semantic engine should both detect and resolve them into a date in order to pinpoint them on a timeline and determine whether they are expired or not. So as you can see, the semantic and the
behavioral engines can actually generate multiple types of insights: the temporary, testing, and expired rules we covered during the previous slides. But there are also some other very interesting types that are currently out of scope for this session, yet still worth mentioning. The first one is contradicting rules, where two or more rules have conflicting intentions, potentially causing a misconfiguration versus what the overall applied policy wishes to be. A mismatched rule is a rule mistakenly configured with the wrong action compared to its semantic meaning, its intention. Let me give you an example. A CISO or IT can define that they are not allowing VNC in
the organization. So they create a new firewall rule, saying "block VNC" in the name, "according to policy" in the description, blah blah blah, define the VNC port, define some applications, and then mistakenly choose "allow" in the action. This is something that is very hard to detect. But using the semantics, the LLMs, the power of AI to do semantic analysis and compare the real intention versus the actual action can help us a lot, and we are digging up a lot of misconfigurations that way. Another one is over-permissive rules, which are rules configured with too broad a scope, unnecessarily increasing the attack surface of the organization.
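The mismatched-rule check described above compares stated intent against the configured action. A minimal sketch, with a small verb table standing in for the LLM's intent extraction so it runs on its own (all names are illustrative, not the actual engine):

```python
# Intention-vs-action check for mismatched rules. A real engine would
# extract the intent semantically with an LLM; the verb table here is a
# simplified stand-in.

INTENT_VERBS = {
    "block": "block", "deny": "block", "disallow": "block",
    "allow": "allow", "permit": "allow", "enable": "allow",
}

def extract_intent(name: str, description: str):
    """Guess the intended action from the rule's free text, if any."""
    for word in f"{name} {description}".lower().split():
        if word in INTENT_VERBS:
            return INTENT_VERBS[word]
    return None

def is_mismatched(name: str, description: str, action: str) -> bool:
    """Flag rules whose configured action contradicts the stated intent."""
    intent = extract_intent(name, description)
    return intent is not None and intent != action

print(is_mismatched("Block VNC", "according to policy", "allow"))  # True
print(is_mismatched("Block VNC", "according to policy", "block"))  # False
```

The interesting cases are, of course, the ones where no literal verb appears and the intent has to be inferred from context, which is where the semantic analysis earns its keep.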
So what do all of these new AI-driven insights have in common? They bring improvement across two key dimensions. The first is improvement in detection quality. We are talking about rules that would be very hard, almost impossible, for traditional analysis to identify as a security threat to the organization. And remember the stat we mentioned earlier: 50% of security incidents in organizations stem from this kind of issue. The second is improvement in detection speed, the MTTD, the mean time to detect. In a standard scenario, the rules we saw would probably only be discovered during an audit, maybe once a year, in the best case
once a quarter; or worse, they could be found too late, after they've already been used for reconnaissance or exploited by threat actors. So up until now we have gone into detail on the different types of contextual misconfiguration and how they are hidden within the policies: temporary, testing, and so forth. These are the crumbs attackers are eager to find, and they invest endless resources to do so. What we are doing is delivering the sweeper, delivering the sweeper to clean these crumbs away in advance. So in the next slides I will present the agentic framework that makes all of this magic possible. Let's get to know the building blocks of this framework.
So when building multi-agent architectures, there are several ways to connect the agents in the system, like was mentioned in the earlier talk. We have network-based connections, the single-agent option, hierarchical setups, or the supervisor approach. For our use case, we have found the supervisor approach to be the best fit. The supervisor is the core agent in the system. All the other agents communicate with it, and the supervisor decides which agent should be called next. Its role is very similar to a traffic controller, orchestrating the flow, deciding who should execute and when. The supervisor can trigger the process either by a scheduled timer or by detection
of changes in the policy. It also includes tools that assist, for example, in account lookups, pulling valuable information such as whether the account is active or not, which policies are in use, and so on. The supervisor can assign, abort, or retry tasks for each of the agents you will see in a second. And in cases where human judgment is required, the supervisor knows to request human-in-the-loop input. Finally, it is responsible for storing and publishing the insights that have been found back to production environments. So this is the supervisor. The next building block is the data retrieval agent, which is responsible for the data gathering
from any given policy in the system, and it does that not only for internal data but also for relevant enrichment that might be external as well. This agent has several tools, and I'm not drilling down into how the tools communicate, but just to give some sense, it could be via MCP, for example. MCP, the Model Context Protocol, is something that is very hyped right now, and it is one way the LLM can ask for a specific tool to be used. So this agent, the data retrieval agent, can fetch data using the GraphQL queries that we are using in our backbone. So this is why GraphQL,
and it can also check and pull enrichments from third-party services, for example contextual ticket enrichment from ServiceNow or customer interactions from Zendesk. These are loaded with contextual text that can help us a lot when we want to decrease the false positives and increase the accuracy of our system. We don't want to create alert fatigue; we are very cautious about that, and gathering data for the analysis of the context is very important to us. The agent also leverages an LLM, obviously, for several enrichment use cases, such as distilling the most relevant parts from long text like tickets, Jira issues, Zendesk conversations, whatever, performing named entity recognition,
and generating summaries. The next piece: the core detection functionality lies within the detection agents. There are several detection agents, as you will see next, each assigned to detect and investigate a specific insight domain. You can think of each one of them as an expert in its field. One example in our framework is the temporary rule detection agent we have seen earlier, with the temporary rules. Its goal is very simple: to detect and investigate temporary rules within a given policy. For the LLM backbone in all of the detection agents, we chose to leverage Claude Sonnet 3.5 through Amazon Bedrock, which in our testing,
consistently delivered the most reliable results and long-term stability. And this agent is equipped with a range of tools that allow it to further investigate a potential temporary rule. Its purpose is essentially to confirm or disprove the relevance of the finding. Some of the tools we have equipped this specific agent with are contract management platform tools, like DocuSign for example, which are responsible for adding context around PoC-related rules and cross-validating them against an actual PoC contract that is happening right now in the organization, minimizing the false positive potential. Another set of tools is the ticketing system tools, like Jira, which are responsible for validating employees'
requests for temporary capabilities, like we saw earlier with Ethan asking for Telnet; we do that by analyzing the Jira ticket that was sent to the IT or AppSec team. And lastly for this one, internal wiki tools, like Confluence for example, which are responsible for gathering additional knowledge about internal projects, internal pipelines, internal RFCs, and this is very important for the contextual aspect we are trying to analyze. Another detection agent is the expired rule detection agent, and the reason I'm showing it is just to clarify that each one of them has a different purpose, a different set of tools, a different capability,
system prompt, you name it. They are all customized to perform some very specific action. So, as we all know, LLMs can sometimes struggle with tasks that involve comparisons or time-based operations. This is why we equipped this agent with a set of specialized tools: time function tools, which are responsible for determining the current time, handling different time zones, or even performing accurate date and time comparisons, which LLMs are pretty bad at. Another set of tools is the holiday and event tools, which are responsible for resolving nondeterministic dates such as the ones we mentioned before, Easter, Black Friday, the Super Bowl, each of which, by the way, falls on a different
date every year, and we need to spot that, define it, and pinpoint a specific date. An LLM couldn't do that on its own. And a very important agent, the last one, is the judge agent. This agent is essentially responsible for the decision making around the insights generated by the other detection agents. The judge receives all the insights and must decide whether they are relevant for the user or not. Its decision-making process relies on three pillars. Noise reduction tools, which are responsible for handling and filtering out high-ambiguity insights; this is a very difficult problem. Conflict resolution tools, which are responsible for investigating conflicting insights and making sure that we are
making a very informed decision between them. And prioritization tools, which are responsible for defining the prioritization of the insights and assigning a confidence level to each one of them. So now that we have seen all the building blocks, let's connect them together. We have the supervisor, we have the data retrieval agent, we have the detection agents, in this specific case you can see there are three detection agents, and we have the judge agent. Now let's see how it's all placed together. It all starts with the supervisor agent, triggered either by a predefined schedule or by observing a change in the policy. The supervisor agent initiates
the data retrieval agent, which in turn gathers all the relevant policy data along with external third-party enrichments. The data retrieval agent returns all the collected data to the supervisor agent, which in turn initiates only the detection agents found to be relevant. In this case, the supervisor decided to initiate all of the detection agents, meaning the temporary, contradiction, and expired rule detection agents. These detection agents can each take their own time; they run simultaneously, but they each have different tasks and different comparisons, logic, and heuristics. When each one of them is done, it returns the found insights to
the supervisor, and then the supervisor agent initiates the judge agent to review all of the found insights. This review includes the noise reduction, the conflict resolution, and the final prioritization we just referred to. Once the judge review is done, the final insights are reported back to the supervisor agent, and ultimately, based on the judge's result, the supervisor might trigger a specific detection agent for a retry, or it gets a green light to publish the insights into the production environment. Finally, after publishing to production, the supervisor returns to idle mode and waits for the next schedule or for changes to be made. Okay. So as we moved from the problem to the solution,
we saw how contextual misconfigurations can open the door to many new risks, and how AI can step in to help us find them, spot them, and fix them early, before they escalate into a major issue. In that way, it's like having a contextual garbage collector quietly and systematically sweeping away those hidden misconfigurations before attackers even know they are there. This is very important, and the journey that we're taking today is just the starting point, a glimpse into a smarter, more proactive way of thinking about security in the new semantic era. Thank you.
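The end-to-end flow described in the talk (trigger, retrieve, detect in parallel, judge, publish or retry) can be sketched roughly in code. The agent classes, method names, and the toy detection logic below are my own illustration under stated assumptions, not the actual framework:

```python
# Rough sketch of the supervisor orchestration cycle. Agents are stubbed
# with trivial keyword logic; every name here is illustrative only.

class TemporaryRuleAgent:
    def run(self, rules):
        return [r for r in rules if "temp" in r["name"].lower()]

class ExpiredRuleAgent:
    def run(self, rules):
        return [r for r in rules if "exp" in r["name"].lower()]

class Judge:
    def review(self, insights):
        # Noise reduction, conflict resolution and prioritization would go
        # here; this stub simply deduplicates insights by rule name.
        seen, final = set(), []
        for insight in insights:
            if insight["name"] not in seen:
                seen.add(insight["name"])
                final.append(insight)
        return final

class Supervisor:
    def __init__(self, retrieval, detectors, judge):
        self.retrieval, self.detectors, self.judge = retrieval, detectors, judge

    def cycle(self):
        rules = self.retrieval()              # data retrieval agent
        insights = []
        for agent in self.detectors:          # detection agents (parallel in reality)
            insights.extend(agent.run(rules))
        return self.judge.review(insights)    # judge agent

sup = Supervisor(
    retrieval=lambda: [{"name": "temp open access"}, {"name": "EXP old VPN"}],
    detectors=[TemporaryRuleAgent(), ExpiredRuleAgent()],
    judge=Judge(),
)
print(sup.cycle())
```

In the real system the detection agents run concurrently and the supervisor can also retry a specific agent based on the judge's verdict; the stub collapses that to a single synchronous pass for clarity.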
>> Have you looked at integrating it with any vulnerability scanners, like Tenable or something, and then being able to do a risk score for rules? Because that's one of the things we struggle with: it's fine today, but I don't really know how secure it will be in six months. >> Yeah, I mean, that's actually a great point. I think these are two solutions running in parallel. I mean, the vulnerability aspect, Tenable, Qualys, Rapid7, will tell you that you have a vulnerability, with a CVSS of 7.5 and maybe even some EPSS score. It's a different kind of security issue than understanding that you do have
something that is not relevant in your policy, because Rapid7 or the vulnerability assessment scanner wouldn't be able to spot that. Maybe they can see it, but they don't run the analysis on that kind of context. So this is definitely a different issue; they run in parallel. >> Sure. Yeah. >> I got the impression it's mostly firewalls at this point? >> Okay, so the mechanism can run on any policy architecture in our product. The policy could be firewalls, could be a DLP policy, a CASB policy; every security product essentially includes some semantics in the rule name and the description. Firewalls are the best case
because they are the most populated. A policy can have hundreds of thousands of rules; it really depends on the customer, and this is a very low-hanging fruit to start with, because we are seeing misconfigurations there; we have some policies with like 150,000 rules. We know the customer can't be sure whether every one of these rules is relevant or not. And this is why the firewall is the best case to start with, but we are taking that approach to basically every policy that we deliver. >> But are you just relying on the words, or do you actually source evidence, like say the firewall logs, or the Windows logs if you're auditing identity or something? >> Yeah, it's a great
question. But I think that traditional configuration analysis has been there for years, right? Trying to dig into these services, trying to understand whether this is bad or good, that is the past. We are taking this to a different layer. The traditional analysis is still there, but we are going above it. We are not trying to inspect the traffic, because we already do that; we are trying to give another perspective, to solve a blind spot that has never been tackled before. >> Sure. >> Thank you. >> I have a question for you. So what are the gaps in support we should consider before the data? So, again, the gaps that
>> support should be considered before >> So, I'm not sure I follow, but I'll try to answer as much as I understand. The gap we are currently seeing is about best practices, how people actually use this. Unfortunately, it's very hard to educate those specific IT teams; we know that they are super busy, they have thousands of tickets on their desks, and this is why, unfortunately, this was always going to happen. So just being proactive... sorry, we can take it >> I'm sorry. >> You're shutting me down >> after we wrap up. >> Yeah. All right. Thanks. >> Thank you.