
BSidesNYC 0x05 - Exploit Intelligence with Agentic AI: Patch What Matters (Dmitrijs Trizna)

BSides NYC · 2025 · 25:49 · 134 views · Published 2025-12
About this talk
Dmitrijs Trizna explores how AI agents can improve exploit intelligence triage and vulnerability patch prioritization in the face of rapid AI-driven threat-actor automation. The talk covers the asymmetry between offensive and defensive AI adoption, practical vulnerability discovery using LLM-native approaches, and how agent-based workflows with proper context engineering can help organizations scale patch decisions beyond human bottlenecks.
Transcript [en]

Let's start. Welcome, everyone. Look at you: it's 5:30 your time and you still came here. Amazing. Last session, it is what it is, right? So consider this a private tutoring lesson. I actually think it will be great. I'll share the experience of what we do in our team: we've formed a group of really clever researchers, and we explore how we can employ agents to tackle the most critical cyber security problems. Exploit intelligence is one of them. Before I jump in, maybe a

brief bio, so you know you can trust me. I've done this for about 15 years in infosec, with an AI focus mostly in the last five, even before the LLM era. I've mostly done defense on the Microsoft side, defending Azure right now, and I've spoken at Black Hat and DEF CON. So, the agenda: I'll split the discussion in three. First, AI's impact on cyber security holistically; then we'll see how vulnerability triage actually happens; and lastly we'll dive deep specifically into exploit intel with agents. Before we talk about that more specific, narrow niche, let's zoom out

and let's think like how what what what we are observing right now and actually like I'm like our take what we see is that we have a problem right in cyber security right now there is a significant asymmetry and what we see is that thread actors are having a pretty good time like the the automation kicked in and they started pretty rapidly on boarding on it. We see huge amount of evidence that that actually state sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sp sponsored thread actors um ransomware groups commercially viable uh groups they're all employing LLM to to automate their daily day-to-day operations and uh while something like

Antropic and and OpenAI can and try to combat them limit their impact limit their ability to access their models something like deepseek is like fully available it's like free you download it you spin it up fine-tune it on your data I'm sure it's running 24/7 somewhere deep in a in a North Korea for sure and uh yeah the another piece of evidence is um Google's deep mind report and these guys know what they they talk about they have a mandant and virus talle under their belt right so they as well see a different set of cost reduction and that's actually important point uh across different thread actor activities so like they they see a significant decrease of or like increase

in a ll lamp uh usage for something like DOS or like fishings or malware creations there less of that in other domains because again uh AI right now is amazing in code it's amazing in language it's less so amazing in for example intra automation and uh that's that's what we kind of see across the board but like it's overwhelming offensive security has a significant rapid benefits from an AI adoption and um the the worst of that is number of CVS growth right and it will actually keep growing more. Uh why I'm I'm actually super confident on that. Um there there's as well like significant evidence that this is true. For instance, take a look on this stat. It's

It's actually not created by me; the sources are down there at the bottom. It's a pie chart of npm packages, and only 4% of them have ever had any vulnerability disclosed. 95% of them have no vulns reported at all. Does that mean they don't have any vulns? I don't think so. The truth is that some security expert spends their time looking at those 4%, which are probably the most common packages, and a huge amount of vulnerabilities still lives out there. The question is: what if vulnerability discovery is not limited by human attention anymore? Then that 95% actually becomes tractable. And again, these are not

empty words; that's what we see ourselves. In our group we have a research team focused on vulnerability discovery by means of AI-native approaches, and just in the last month since we started focusing on this: 100-plus vulnerabilities, 10-plus CVEs. And these are not low-profile projects with low-quality code; it's high-profile targets only. OpenSSL: three CVEs there in the last month. Signal, the Linux kernel. This, for example, is the exchange with the Linux folks about a bug we disclosed, and they fixed it.

So yeah, the whole premise of offensive use of LLMs: it's early days, and adoption is growing. And the worst part is that when you couple code automation and language automation together, it leads to an interesting after-effect: supply chain risk. Because now you can mimic being an actual developer, and you can contribute more code with less work. That's what we saw last year with XZ, if you remember the case: it was backdoored by an active contributor who spent multiple months doing legitimate work on XZ, but eventually it was, again, a North Korean operator who planted the backdoor after three months or

so. And what they can do now, LLMs are an enabler of this, because you can do it across many projects at once. Last month we had a worm that hit, I don't know, a hundred packages. Then we had the debug and chalk case, if you know it, where just two packages affected something like 50,000 customers. And what we did with this one, basically a few weeks ago: we took the chalk library, scanned it, and saw right away that the backdoor was there. It cost us only $2 and six minutes. And that hints at the whole avalanche of

problems that is coming with this offensive misuse. So there is potentially a solution; we can see it. Back to the asymmetry: threat actors are having a good time, but maybe we, as defensive-minded people, can benefit from AI automation too. Let's take a look at how it happens right now, at what the vulnerability management programs out there are doing. Let's focus on code first. This definitely scales to cloud and infra too, but speaking about appsec, the atomic unit is the code repository. It depends on dependencies which come from outside your

decision boundary, your security boundary. They are out there, public; the bug might live there, and the bug might actually be exploited out there. So what you do right now is you take a human security analyst who consumes both sources, the public news and the known-exploited-vulnerability databases, looks at your code, correlates the two, and sees: yes, there actually is a vulnerability I have that is exploited out there, and we have to fix it. Before anything gets fixed, many of the vulnerabilities that land on this analyst's table are nonsensical; they do not affect you. So you have this triage process where you

discard them, right? Those that are not discarded go to the actual dev or service owner, who spends their time actually fixing it. And now you repeat this many, many times; you have thousands and thousands. I've actually spoken with CISOs of public companies that have millions of CVEs in their networks. It's just not tractable. So these conventional methods do not scale: there is minimal automation out there, and the backlog of CVEs keeps growing. What we explore ourselves, as part of this work, is that we can employ AI-native methods to spot those

inconsistencies across the code. So what if we plug a bot in here? We actually employ multiple agents together that do this, supervised by a human. Again, remember the notion of cost reduction: it still has to be guided, it's just that now, within a day, you can work through not 10 vulnerabilities but 100, or maybe 1,000 in the perfect case. Now this bot can filter out those vulnerabilities with minimal involvement from a person, and when one is not filtered out, it can go and propose remediation fixes. We can integrate with the actual developer services that you as devs already use: they run the unit tests, they run the validations, so devs have to do really minimal work,

just checking that it actually passes. This paradigm will reduce, or at least let us combat, this growing backlog of CVEs. But to push it to the limit, you can bring in the loop idea and perform a continuous scan. And thinking about that, pushing even further, creates the paradigm shift that we believe makes it possible to combat this ever-growing threat: self-defending software, which is of course the holy grail. It's not there yet, but why not? Potentially you can remove any bugs that appear in your code, by mistake or by ill intent, in this loop with minimal human involvement.
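
A minimal sketch of that triage loop, assuming a hypothetical llm() helper in place of any concrete model API, illustrative prompts, and the project's own test suite (pytest here) as the validation step; the patch-application step is elided:

    # Sketch of the agent-assisted triage loop described above; llm(), the
    # prompts, and the pytest command are placeholders, not a specific stack.
    import json
    import subprocess

    def llm(prompt: str) -> str:
        """Placeholder for a call to whichever model API you use."""
        raise NotImplementedError

    def triage(finding: dict, repo_context: str) -> dict:
        """Ask the model whether a reported CVE actually affects this repository."""
        return json.loads(llm(
            "You are a vulnerability triage assistant. Given the finding and the\n"
            "repository context, answer as JSON {\"relevant\": bool, \"reason\": str}.\n"
            f"Finding: {json.dumps(finding)}\nRepository context:\n{repo_context}"
        ))

    def propose_fix(finding: dict, repo_context: str) -> str:
        """Ask the model for a candidate patch (unified diff) for a relevant finding."""
        return llm(
            "Propose a minimal unified diff that remediates this vulnerability.\n"
            f"Finding: {json.dumps(finding)}\nRepository context:\n{repo_context}"
        )

    def tests_pass(repo_path: str) -> bool:
        """Reuse the project's own validation, e.g. its unit tests."""
        return subprocess.run(["pytest", "-q"], cwd=repo_path).returncode == 0

    def triage_backlog(findings: list[dict], repo_path: str, repo_context: str) -> list[dict]:
        for_human_review = []          # the human stays in the loop, just later and less often
        for finding in findings:
            verdict = triage(finding, repo_context)
            if not verdict["relevant"]:
                continue               # filtered out with minimal human involvement
            patch = propose_fix(finding, repo_context)
            # Apply the patch on a branch (elided), run the existing tests,
            # and only then hand it to the dev or service owner.
            if tests_pass(repo_path):
                for_human_review.append({"finding": finding, "patch": patch, **verdict})
        return for_human_review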

So yeah, that is the holistic picture of the whole vulnerability triage: where we are, and how we combat this ever-growing threat from state-sponsored threat actors. Exploit intel is part of it; it's one piece in a much bigger house of cards, and we explored how AI agents can help here, how they can help the analysts out there do their job more efficiently. Because how do you actually deal with something like this, when you open your feed on a weekend and everything is on fire? Last week the F5 source code was released. What do we do?

Right now you definitely don't have any actionable items, but you pay closer attention to, for example, the F5 instances in your network, if you have any. This story basically repeats from time to time; a month ago it was SharePoint. So let's decouple exploit intel: how it happens, and what it fundamentally consists of. There are sources that act like guided missiles for your decision-making about the vulnerabilities in your environment. You might already know those vulnerabilities are there, or you might discover them in the process, but eventually you have to figure out whether you have to act on them or not.

So this eventually is basically a score; it's nothing too sophisticated. Either you see something super relevant to you, say a score of 10.0, or you can ignore it, a zero. In an ideal world it's a zero and you take it completely off the table. There can be other gradations in between. What you usually do, end to end, is have a person parse the exploit sources or news sources and apply them to this non-trivial lineage of all the bugs in your environment. But the more mature organizations we spoke with take a slightly more graduated approach: they do a collection

of these sources into a so-called cache, a more local, more accessible, much faster store of that vulnerability and exploit information, and then they consume that local store much more rapidly. For each CVE in your environment, you then have to do this application.
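
A minimal sketch of that two-stage idea, with sqlite3 standing in for the local cache and illustrative source names and fields; collection and the per-CVE application are deliberately separate steps:

    # Collect exploit-intel records into a fast local store, then apply that
    # store to every CVE in your environment. Names and fields are illustrative.
    import json
    import sqlite3

    def open_cache(path: str = "exploit_intel.db") -> sqlite3.Connection:
        db = sqlite3.connect(path)
        db.execute("CREATE TABLE IF NOT EXISTS intel (cve TEXT, source TEXT, raw TEXT)")
        return db

    def ingest(db: sqlite3.Connection, cve: str, source: str, raw: str) -> None:
        """Collection stage: runs on a schedule, independent of any scoring request."""
        db.execute("INSERT INTO intel VALUES (?, ?, ?)", (cve, source, raw))
        db.commit()

    def intel_for(db: sqlite3.Connection, cve: str) -> list[dict]:
        rows = db.execute("SELECT source, raw FROM intel WHERE cve = ?", (cve,)).fetchall()
        return [{"source": s, "raw": r} for s, r in rows]

    def apply_to_environment(db, environment_cves: list[str]) -> dict[str, list[dict]]:
        """Rapid consumption stage: per-CVE lookup against the local cache only."""
        return {cve: intel_for(db, cve) for cve in environment_cves}

    # Example: ingest one KEV-style record, then check the backlog against the cache.
    db = open_cache()
    ingest(db, "CVE-2024-0001", "kev", json.dumps({"knownRansomwareCampaignUse": "Known"}))
    print(apply_to_environment(db, ["CVE-2024-0001", "CVE-2024-9999"]))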

So we tried to plug bots in there. We gave them enough tools, and we explored this thing in many varieties, for example with curl: you can give them the ability to browse web pages, Twitter, basically all those sources. They don't behave well. Why not? Let's step back a bit, because we haven't discussed where agents are good and where they're bad. To talk about this, I'd offer you a look at the fundamentals from a slightly different perspective. We all know what an LLM is, right? You provide some input, it spits out some output. But with all the experience we have gathered over the years in our team, you get some hunches about where they work and where they don't. In theory, LLMs take an input, "the capital of France is", and spit out the output, "Paris". So in theory you can get anything from an LLM; it just depends on what you put in.

You can get the perfect score for your vulnerability if you prepare the input in the ideal manner for the LLM to make a decision. That's where agents come in. Speaking about agents: an agent is an extension of the conventional LLM. Anthropic offers a paradigm for this, but with all the work we've done, we offer a simpler one. Don't think about all the complexity around it: anything an LLM can do with extra context via a tool is basically already an agent. Memory is just a tool, to be honest; a query to the RAG can be formulated as a tool. So the LLM has some levers it

can press, and it uses these levers, eventually, to control what information it receives. And that information is basically the context. When you work with LLMs and want to improve them, it is super crucial to understand that context engineering is basically everything. That is the most essential thing you need to master to get what you want out of the LLM.
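
A minimal sketch of that "tools as levers" framing, assuming a hypothetical llm() call that returns either a tool request or a final answer as JSON; memory is exposed as just another tool:

    # The model repeatedly chooses a lever to pull; each pull adds to its own
    # context, and the loop ends with a final answer. llm() is a placeholder.
    import json

    def llm(messages: list[dict]) -> str:
        """Return either {"tool": name, "arg": str} or {"answer": str} as JSON text."""
        raise NotImplementedError

    def search_memory(query: str) -> str:
        # Memory (or a RAG query) is just another tool from the model's point of view.
        notes = {"openssl": "Three CVEs reported by the team last month."}
        return notes.get(query.lower(), "no notes")

    TOOLS = {"search_memory": search_memory}

    def run_agent(task: str, max_steps: int = 5) -> str:
        messages = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            decision = json.loads(llm(messages))
            if "answer" in decision:
                return decision["answer"]
            # The tool result is appended to the context: the lever controlled
            # what information the model will see on the next step.
            result = TOOLS[decision["tool"]](decision["arg"])
            messages.append({"role": "tool", "content": result})
        return "gave up"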

So if we take a look at agent functionality: you have an input, for example you ask ChatGPT in research mode to look something up for you, say apartments in your local area. It goes off, does a lot of tool calls, gathers information, and finds the stuff. But all of that is just context gathering for the last, final answer. So basically you can think of it as building the context in a proper way to get that final chunk of information that all the preceding information was collected for. And there is significant evidence that a careless approach to context engineering hurts really significantly. Too much information: performance drops. Irrelevant information: performance drops. Again, this is evidence-based; a lot of academic research shows this.

And it's super crucial to the topic we just discussed. Say you plug the bots into this scraping stage. You give one a curl tool, or you even give it a more targeted tool; for example, you write a whole harness around GitHub and give it "search GitHub for this CVE, and I'll provide you all the information from GitHub about it". And it doesn't work, because, as Chloe said two talks before this one, LLMs are not for everything. LLMs are not for scraping. Don't use them for this: they're slow, and they're expensive for this. And actually, private feeds still beat LLMs here. There are a whole lot of TI-focused companies that do it in a deterministic manner, or you can do it yourself in a deterministic manner, and it will be much more efficient and more complete than LLMs.
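
As one example of the deterministic alternative, a plain GitHub search keyed on a CVE ID, wrapped as a function you could later hand to the model as a tool; this is a sketch against GitHub's public search API, without auth or rate-limit handling:

    # The LLM never touches the network here; it only sees the filtered result.
    import json
    import urllib.parse
    import urllib.request

    def search_github_for_cve(cve_id: str, limit: int = 5) -> list[dict]:
        """Return public repositories that mention the CVE (often PoCs or exploits)."""
        query = urllib.parse.urlencode({"q": cve_id, "per_page": limit})
        req = urllib.request.Request(
            f"https://api.github.com/search/repositories?{query}",
            headers={"Accept": "application/vnd.github+json"},
        )
        with urllib.request.urlopen(req) as resp:
            items = json.load(resp).get("items", [])
        return [{"repo": i["full_name"], "url": i["html_url"]} for i in items]

    if __name__ == "__main__":
        print(search_github_for_cve("CVE-2024-3094"))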

But let's consider the second stage, aggregation and heuristics, and that's where we plug in the bots. They're amazing there, actually. We trust the technology, we all see it, we all use it; they perform super well on some tasks, and it's amazing: intelligence is a utility right now. That's where it shines. Why? Because an LLM can holistically assess the context from different environments. You can feed it the application context, with reachability and exploitability. Then you can feed it the context from the organization: where your app lives, what access it has, what your devs say about it in Slack.

And then you give it the exploit intel, and it will calibrate all those previous contexts in the necessary way. For example: this vulnerability lives deep inside your network, but it is exploited out there, so it's still important. Or: a vulnerability is customer-facing but not exploited; that's still kind of bad, because it might be exploited at some point. All of this can be holistically assessed by the models here. And the most amazing thing is that LLMs don't care about the format. One exploit source might give you markdown with links and a short description; another, for example CISA KEV, gives you structured JSON.

In the earlier days, you would build parsers to handle each of those sources; integrating a new source took, I don't know, days of building those parser integrations. Right now the LLM basically consumes this data as is, without any predefined schema. And the last thing: it's simple, with no tools. What we see is that in many cases it's not agents that actually work, not everywhere anyway, and in many cases the simple solutions beat the complex ones. For instance, going back to Anthropic's piece on building effective agents, they define the so-called workflow paradigm, where instead of calling agents you make LLM

calls with targeted contexts and glue them together with deterministic code, good old vanilla code. And in many cases, maybe 70 to 80% of the use cases we've seen, this beats all those complex agentic workflows where one agent calls another with sub-agents. The simple stuff, where you collect the context in a proper manner with proper context engineering, is what works, and you don't need anything else.
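
A minimal sketch of that workflow style, with hypothetical llm() calls and prompts; the orchestration is plain code, and only the last call produces the score:

    # No agent loop: targeted LLM calls glued together by ordinary deterministic code.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # placeholder for the model call of choice

    def summarize_app_context(repo_facts: str) -> str:
        return llm("Summarize only reachability and exploitability facts:\n" + repo_facts)

    def summarize_exploit_intel(raw_sources: list[str]) -> str:
        return llm("Summarize exploitation status, citing sources:\n" + "\n---\n".join(raw_sources))

    def final_score(cve_id: str, app_summary: str, intel_summary: str) -> str:
        # The final call sees only the distilled context, nothing unnecessary.
        return llm(
            f"Score {cve_id} from 0 to 10 for patch priority.\n"
            f"Application: {app_summary}\nExploit intel: {intel_summary}"
        )

    def score_cve(cve_id: str, repo_facts: str, raw_sources: list[str]) -> str:
        # Call order and data flow live in plain code, not in an agent deciding
        # which sub-agent to invoke next.
        return final_score(cve_id,
                           summarize_app_context(repo_facts),
                           summarize_exploit_intel(raw_sources))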

So that, for example, is how we build the high-quality context for this final score, including the exploit intel: we collect the data from different sources, and each of them is informed by its own set of agents or workflows, but eventually it all leads to one more LLM call with a much better informed context, one that doesn't contain anything unnecessary and carries only the core information it needs.

So yeah, that's basically it, a recap from me. There is definitely an asymmetry. I'd say the status quo looks poor if we don't do anything; if we don't integrate LLMs properly into our workflows, I don't know where we will end up, because threat actors definitely employ this more and more, and conventional methods really don't scale here. So we have to build really good scaffolding around agents; they don't work out of the box, and we

have to build them in with a huge amount of domain expertise. About agents, here are three takeaways I'll share that are not conventionally accepted, but that we see through a huge amount of experience. Context engineering is everything; it beats any complex agentic workflow. Don't use an LLM unless it's needed; it's the old story with machine learning, where everyone wanted to use machine learning but logistic regression worked, or even a simple if/else worked just as well. You have to ask yourself: do you really want an LLM here?

As we see it, in many places an LLM actually does make sense. We build our approaches in an LLM-native way and solve maybe 80% of the cases that way, but you still have those 20% where the conventional methods win. And simple is better than complex; over-engineering really doesn't help. And then there's the paradigm shift: I hope that at some point in the future we'll achieve the holy grail of self-defending software, and, I don't know, guys, if you have a backlog of vulnerabilities, maybe it will disappear and go to zero, right? So that's it. If you want to connect, we share our work actively, so I'll definitely include that in the feeds.

I hope we still have time for a few questions, right? Amazing. So if you have any, I'm happy to answer. Yeah, elaborate a little bit.

>> Yeah, we actually don't. The point was that you don't need a parser, right? It eats information in any format. But you can think of what gets passed to the model, unparsed, as a selected piece of information. You don't need the whole of Twitter to consume, right? The problem is when you just say "okay, this is my feed, take it and read it". It can find the necessary information in there, but there are better methods: you collect only the pieces you actually need from GitHub, so you can do the pre-filtering yourself, and then only the information that is relevant is fed into the LLM. If that makes sense. Yeah.

>> Agent to agent? >> For sure. You mean protocol-wise? Yeah, that's Google's creation, the one you mentioned. >> Yeah. >> Yeah. What we definitely see is that a huge amount of functionality is achieved when you allow LLMs to talk to each other and complement each other, for example to criticize. We employed a lot of critics: you have one LLM that does something, and then its output is passed to a critic which, without the preconditions that were given to the first model, assesses it and fixes the flaws of the former.
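
A minimal sketch of that generator-plus-critic pattern, again with a hypothetical llm() helper and illustrative prompts; the critic only ever sees the artifact, not the generator's instructions:

    # One model drafts a fix, a second call reviews it without the original
    # preconditions, and the draft is revised until the critic is satisfied.
    def llm(prompt: str) -> str:
        raise NotImplementedError  # placeholder for the model call of choice

    def generate_fix(finding: str) -> str:
        return llm("Propose a patch for this vulnerability finding:\n" + finding)

    def criticize(patch: str) -> str:
        # Deliberately given only the artifact, so it judges the patch on its own merits.
        return llm("Review this patch for flaws; reply 'OK' if it is sound:\n" + patch)

    def generate_with_critic(finding: str, rounds: int = 3) -> str:
        patch = generate_fix(finding)
        for _ in range(rounds):
            review = criticize(patch)
            if review.strip() == "OK":
                break
            patch = llm(f"Revise the patch to address this review:\n{review}\n\nPatch:\n{patch}")
        return patch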

And in that case it's definitely the future: machines talking to each other, bots proposing PR fixes and then fixing them themselves. That's how we see it. As for the protocol itself, I'm not sure; we actually don't use it. We see that, again, the simpler solutions work better: pure integration in code rather than a more complex detour into some third-party protocol. And as we've seen with MCP, for example, which is the similar thing for tools, it has its own flaws. It's slow, and it's sometimes better to code the tool natively, like the GitHub search I showed you: you can write it in code and give it as a tool to the API rather than building an MCP for GitHub.

So that's what we see with this agent-to-agent protocol too: we're not really sold on converting to it. We use our own internal methods, and we don't see the need, since it's fairly trivial at this point to connect those models ourselves. Yeah. >> Thank you.

>> Um, do I stop recording? >> Yeah. >> If you have any questions, you can ask them outside this room. And next is the closing, so see you there if you're going. >> I think it stopped. Yeah.