
AI Agents: Augmenting Vulnerability Analysis and Remediation

BSides SATX · 2025 · 44:44 · 178 views · Published 2025-09 · Watch on YouTube ↗
Style: Talk
About this talk
BSides San Antonio 2025, June 21, at St. Mary's University
Transcript [en]

So far, so good. Oh, we have some people that came in midday. All right, good. You came in for your first session at a really good one, because we're going to be talking about AI agents. Has anyone heard anything about AI lately? No? Yeah, okay, maybe just a bit. First of all, I want to thank our diamond sponsor USAA and St. Mary's for a great facility and for really helping us out in making this a great event. It's been a few years now that we've been holding this at St. Mary's. We're also very grateful for the support of SpecterOps. And as we go, if you haven't yet, the restrooms are

out here and to the right, down the hall past the little couches. So, should you need that, that's where you can go. Was that on purpose? >> No. >> Okay. So, we will get those images back very soon. My name is Jeff Rich. I'm one of the volunteers here. This session is going to go until 2 p.m. Payton has asked that you save your questions, so make note of them; we'll have time for questions at the end, and I'll walk around with this so you can ask your question and be heard. And I'm going to stall another 30 seconds until the projectors come back online. Oh, the hoodie came off. Now

we're getting serious.

So, I'm just going to ask this. Yeah, I'm stalling for time, but I am interested anyway: how many people, is this your first BSides? So, what do you think so far? Thumbs up, thumbs down. Oh, good. All right. You can't beat it for the value, the quality of the speakers, some of the swag you can get, the networking. And it's local; I'm assuming many if not most people are local, so it's a good opportunity for that. It looks like we have things going again, if maybe a bit precariously, but we do have it. So, I think we can get ready to get this underway. And

you're going to hear about AI agents augmenting vulnerability analysis and remediation from Payton Smith. Please welcome Payton. Okay, you guys hear me? Okay, I'm sort of not close to the mic, but yes, in the back. All right, sounds good. All right, great. So we're going to be talking about AI agents today, and specifically augmenting vulnerability analysis and remediation. Just to start with a bit about myself: I started my career as a software engineer, and I've spent about 10 years in cyber. I started in threat intel, tracking threat actor activity by building malware analysis pipelines. Then I moved to CrowdStrike services for 5 years, where I split time between their cyber intrusion team and their red team.

I've seen a lot of nation-state cyber intrusions, and I've also hacked into a good chunk of the Fortune 1000. I left in 2024 to start a business. We're backed by Google, and we're trying to do exactly this. I promise this is not a vendor talk; we'll have maybe one slide at the end in terms of what we're focusing on. But with how new large language models are, I think there's a lot of evangelizing to do, especially in this community, the community that we're all part of, because I think we naturally do not trust technology. So,

that's why I'm here. So, in terms of why you should care: I truly believe that large language models will revolutionize most industries, and I think security is no exception. I do believe that, and this is also part of what we'll be talking about today, that there are a lot of trust problems in cyber, and so I think cybersecurity will probably be one of the last verticals to fully integrate language models, mostly due to the low risk tolerance, especially on the SOC and IR side. If you make a mistake and classify a true positive as a false positive, that

could lead to a massive problem. What that means, though, is that human-in-the-loop workflows will dominate for a long time, where we're using these language models to augment our workflows, and I think all of our jobs are safe for the foreseeable future. All right. So, just to start with a few goals for this talk. The basic idea is that everyone in this room will hopefully be able to understand AI agent terminology and basic agentic workflow patterns. You'll understand the base building blocks that a lot of AI agents are built upon, and then you'll

also understand how to mitigate some of the inherent risks of language models. Okay. I understand there are a lot of different skill sets in the room today, so I want to spend three or four slides on basics, and I promise it'll get a little more interesting. So, what is a large language model, oversimplified? If there are some actual AI folks in here, please don't throw things at me. But really, it's just an advanced autocomplete system, right? It's predicting the next word or token or number in a sequence based on its huge training set. And its training set is massive. It's pretty

much the entire human knowledge database. New York Times, I just put that there because the New York Times sued OpenAI for using their data, but it could also be YouTube data. It's not necessarily security data, though. There's probably some security and product documentation built in, but in terms of being trained on gigabytes or terabytes of logs, not really. And each time one of these models is trained, it costs a ton of money, right? GPT-4 cost about $100 million to train. Then GPT-4.5, which was released earlier this year, cost about $500 million. And so, because of the cost to train these systems, it's done very

rarely, right? It'll be done once every six months. And so a theme throughout this talk is that we need to find a way for the models to always have up-to-date information, because what they were trained on may no longer be accurate, especially in cybersecurity when you're dealing with an active new threat. Maybe it's a new threat actor that's doing something brand new; they're exploiting some CVE, or whatever it might be. The language model is not going to have that information. So how do we go out and get that information and provide it to the language model? Okay. Just a funny note here

is that I think you'll hear different opinions on whether LLMs are the future, about generative AI and artificial intelligence generally. I think it breaks down like this: the leaders in the space, aka OpenAI, Anthropic, and Google, say yes. But it was funny, I don't know if you saw that earlier this week Apple put out a paper which basically just said LLMs are overrated. It was actually released this week, and it basically said that they're advanced autocomplete systems: they can't reason, and therefore they're not great, right? And so

where is the truth, actually? It's probably somewhere in the middle, but I think it's still to be determined. So, in terms of open-source versus proprietary models: open source, I'm sure everybody is familiar with what that means, but there are some models there on the left-hand side that are all open source, and there are some benefits to that, which we'll talk through. And then there are the proprietary models, which I'm sure most of you maybe use today for your personal needs. There are different benefits and drawbacks to the different types of models. Open source is obviously going to be cheaper; it's typically going to

lag behind a little bit in terms of performance. If you use an OpenAI model versus maybe a Llama or something else, generally the open-source models are going to be lagging a little bit behind, which, by the way, isn't necessarily a problem; it just depends on your workload, which we'll talk about later. In terms of customization, open source, as we all know, is high, versus proprietary, where you're going to have to go through some roadblocks. Scalability: if you're going to pay a lot of money to an OpenAI, I'm hoping you can scale your large language model calls. And then privacy and control. This has actually changed a little bit

here, because most of the proprietary vendors now allow you to host a version of their model in your private cloud. If you work for an enterprise, you can host it in Azure, you can host it in AWS. So this is shifting a little bit, but I think open-source LLMs, if you really care about privacy, are probably still going to be your best bet. So, in terms of the release rate: OpenAI releases a ton of different models. This is roughly their release cadence over the last year; they release a new model about every 3 months. It's interesting, though, because not every single one of these models is

actually a brand-new trained model. Sometimes they're just iterations, and we'll talk a little bit about that as well. One or two more slides on basics. All of these language models are benchmarked today, and they all score differently at different tasks. This is a screenshot taken on March 21st, and you'll see that Gemini 2.5 had actually just come out, and so it was performing the best: it's pretty much at the top of most of these benchmarks in terms of scoring. But I took another screenshot on June 19th, and now you'll see that

OpenAI's o3 model is now the best, and then it's Claude, and then it's Gemini. The point is that these things change so fast right now, and there are different models performing better at coding, at reasoning, and so on. It matters, because if I were to take the screenshot a month from now, I can almost guarantee it would be different. And so it's important, especially as we're building with these models; I'm going to make an argument that you should really build with the ability to plug and play with different models.

It's sort of a key takeaway, meaning that we don't know which model is going to win in the long term. I think over time, and this is more of a broader market opinion, I don't know if it's going to be a Google-wins-all, sort of monopoly market. What I think will likely happen is that some of these language models will start to specialize over time. If we go back to this photo: Mistral, as an example, came out a few months ago and decided to specialize in Arabic, right? So there are all these niche areas where these model providers can

go. I think OpenAI is debatably going more consumer-focused, and Anthropic is saying that they're going to focus on developers. So there are all these different directions, and we're way too early to call it. And so, going back to this point, the bottom point here is that we need to build with the ability to easily plug and play, and we'll talk more about this later. In terms of open source versus proprietary, it really depends on your type of workflow. You don't always need the Ferrari, right? You can get by with, I don't know, pick your old car, depending on the type of workflow. And

so we'll talk through a few examples there as well. Okay, that was actually a little more basic than I thought, but let's move on to how we can apply these models to cybersecurity workflows. Specifically, when I built this presentation, I figured CVE analysis is probably something that everybody in this room has done or has heard of, so we're going to use it as one example. There are a lot of really interesting use cases for language models, and a lot of companies going after the SOC space right now, like augmenting

alert triage, things like that, but we're going to focus on a different area here. Okay. So, here's the example that we're going to walk through today: CVE-2024-4577. This was a mass-exploited PHP vulnerability disclosed in July 2024. I'll reiterate: July 2024, because we're going to come back to that later. Usually, if you're an analyst, you have a few questions when you're looking at a CVE. First, what is this thing? A CVE is just a moniker that doesn't really mean anything if I just read it. Second, is there threat actor activity affiliated with the CVE? So,

is there someone in the wild that is actively exploiting this thing? Third, is there public proof-of-concept code? If yes, I would like to see it; maybe I want to test whether my system is actually vulnerable. And then, most importantly, debatably: how do I fix this thing? How can I tell my engineering team or IT team to actually fix this problem? Usually a human has to do a bunch of reading, maybe across various data sources and threat intel, to answer all these questions. We're going to try to have a language model automate all

this for us. Okay. So, as an example, this is a little bit dated now, but I wanted to poll the audience here for a second. We're going to use an older model, OpenAI's o1. As a reminder, this vulnerability was disclosed in July, and this model was released in December 2024. We're going to use o1 to analyze this problem, and when I say that, we're just going to go into ChatGPT and ask it about it. So I wanted to ask everybody, maybe just with a show of hands: who thinks that the language model is going

to be able to answer all these questions about this CVE? Got one. Okay. And then partially? All right. And then it will fail? All right. Nice. Okay. So, this photo was taken in March 2025, and again, we're using this o1 model that was released in December, about six months after the CVE was disclosed. And it says, "I'm not aware of this vulnerability at all." I think this pretty much just means that the model was trained before the CVE was released, even though the model itself shipped six months after the CVE was actually disclosed. And I think this gets to a point

that I was trying to make earlier: we really need to provide these language models the ability to go out and get context that is relevant today, not just from when they were trained. Cybersecurity isn't the only industry to have this problem, and so there's been this idea called RAG, retrieval-augmented generation. It's just a fancy moniker for having the language model go out and retrieve information to answer your question. There's one important caveat here: LLMs have a context window. This is basically just the amount of information that they can

analyze at one time. That's a vast oversimplification, but please bear with me. We'll run into problems because we can't just grab a ton of information and throw it at the LLM; it won't be able to analyze it all. So we need to be careful about the type of information that we're sending it. Okay. And here's a diagram of a RAG app, just so you all can see this. Modern proprietary models all have this implemented today.

It's not just a core large language model. What happens is that you ask the model a question, say through the ChatGPT app. The app is going to go out to the internet; Google is probably just the standard information source today. It's going to Google for relevant information and throw it in that context window, and the LLM, in addition to its training set, is going to look at that context window and try to answer your question. So again, it's augmenting the trained LLM with additional information from after it was trained.
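That retrieve-then-ask flow can be sketched in a few lines of Python. This is purely illustrative: `search_web` and `call_llm` are hypothetical stubs standing in for a real retriever (Google, NVD, vendor advisories) and a real model call, and the character budget is a crude stand-in for real token counting against the context window.

```python
# Minimal RAG sketch: retrieve fresh context, trim it to fit the context
# window, and prepend it to the user's question before calling the model.
# All names here are hypothetical stand-ins, not a real library's API.

def search_web(query: str) -> list[str]:
    """Stub retriever; a real app would query Google, NVD, advisories, etc."""
    return [
        "NVD: CVE-2024-4577 is an argument-injection flaw in PHP-CGI...",
        "Advisory: affected versions and patched releases...",
    ]

def build_prompt(question: str, docs: list[str], budget_chars: int = 2000) -> str:
    """Pack retrieved snippets into the prompt without blowing the
    context window (real systems count tokens, not characters)."""
    context, used = [], 0
    for doc in docs:
        if used + len(doc) > budget_chars:
            break  # drop whatever no longer fits
        context.append(doc)
        used += len(doc)
    return "Context:\n" + "\n---\n".join(context) + "\n\nQuestion: " + question

def call_llm(prompt: str) -> str:
    """Stub model call; swap in a real provider client here."""
    return "(model answer grounded in the supplied context)"

answer = call_llm(build_prompt("Describe CVE-2024-4577.",
                               search_web("CVE-2024-4577")))
```

The point is only the shape: retrieval happens before the model call, and the retrieved text rides along inside the prompt so the model isn't limited to its training cutoff.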

Okay. Hopefully this is going to work. Let's see. Is this good? All right. So, this is a newer LLM, and I'm going to hopefully pause it here. I'm just asking it to describe the CVE. I don't know if you can see this, but it says right here, "searching the web." This is now the standard: if you go to any proprietary model, this is what they're doing, because they had this huge problem where the LLM would just hallucinate nonsense if you asked it anything new. So they basically built these RAG apps that recognize when the LLM doesn't have the

right data, and they try to go out and find the data. I'll let this run here for a second. It was actually kind of interesting, too. I know it's really small, but you'll see that they'll actually provide little icons there. I'm trying to pause it, but it's not working. There it says GitHub, NVD, and other sources, and they'll actually cite the sources for you. What they'll do is let you hover over articles on the right that it used to actually augment

its training set, so it has the right information. Okay, let me move on. So, the second takeaway here is that, in my opinion, especially for cybersecurity workflows, this RAG portion of the app, where you're going out and getting relevant context, is incredibly important. Without it, your language models are just going to be sending you nonsense and wasting your time. And I think we can go a lot further here, which we'll talk about later: searching Google is just public data. Especially if you work at an enterprise or an organization, we really want to search your internal data

too, right? Searching Google is nice, but what about your proprietary threat intel? What about your CMDB, or other internal knowledge repositories like SharePoint, to allow you to augment a lot of your analysis? Okay, this is just reiterating what I was talking about in terms of Google RAG (that's not a technical term, by the way, so please don't use it). Just to show what Google RAG was able to do versus the initial response, where the language model said, "hey, I have no idea": it's able to provide a

coherent answer about what the CVE is: a good description, including the versions that are actually affected. It can provide threat actor activity; it's pulling from public threat intel, which isn't great for a lot of reasons, but it's at least a step in the right direction. Is there public proof-of-concept code? It does say yes, but it doesn't generate it for us, so that's not ideal, and we'll touch on that later. And then it's able to provide some decent instructions for how to fix this problem. But again, it's very generic. It's not tailored to your

enterprise. It's not tailored to the affected system that you're actually trying to remediate, and we'll touch on some strategies there later as well. Okay. So, some drawbacks of the example that I just showed. I don't want to have to go into ChatGPT every time and manually type in whatever I'm looking for; we want to automate this. And as we talked about, OpenAI uses public data, which is okay but not great. I pretty much said the rest of these already. So the idea we're going to try to solve for here is: how do we improve upon this and automate this into our

workflows today, so you don't have to have your four tabs open where you're querying your ServiceNow and copying and pasting things into ChatGPT? Now we're going to move on to trying to automate this, and further, we're also going to try to integrate internal data. That's more of an enterprise topic, but at least for part one, I think we can all build scripts to automate our own workflows here. Okay. To actually automate this, we're going to talk about agentic systems. Every large language model provider, well, every proprietary one, has their

own API, right? This could be a traditional library: if you've all written Python, you're importing something and all of a sudden you have all these pre-built function calls for communicating with the language model. But to go back to takeaway one, the problem is that every language model provider's API is drastically different. So if you were to try to build a script or a framework where you want to easily plug and play with different language model providers, you would pretty much have to rewrite the same small script three times.
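That three-rewrites problem is what a thin abstraction layer solves: hide each provider's API behind one interface, and switching models becomes a one-line change. A hypothetical sketch (the class and function names here are made up; frameworks like LangChain ship a real version of this idea):

```python
# Sketch of a provider-agnostic chat interface. The provider classes are
# fake stand-ins; real ones would wrap each vendor's SDK.

from abc import ABC, abstractmethod

class ChatModel(ABC):
    @abstractmethod
    def invoke(self, prompt: str) -> str:
        """Send one prompt, get one completion."""

class OpenAIChat(ChatModel):
    def invoke(self, prompt: str) -> str:
        return f"[openai] {prompt}"  # real code would call OpenAI's API

class ClaudeChat(ChatModel):
    def invoke(self, prompt: str) -> str:
        return f"[claude] {prompt}"  # real code would call Anthropic's API

def init_chat_model(provider: str) -> ChatModel:
    """The only line the rest of the script ever needs to change."""
    registry = {"openai": OpenAIChat, "claude": ClaudeChat}
    return registry[provider]()

llm = init_chat_model("claude")  # swap to "openai" and nothing else changes
summary = llm.invoke("Summarize CVE-2024-4577.")
```

Everything downstream codes against `ChatModel`, so no per-provider rewrites are needed.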

And we don't want to have to do that. If we're down the road building on Gemini and Gemini all of a sudden doesn't support our use case, that would be terrible. So again, people have started to solve this problem by building what are called agentic frameworks. They provide an abstraction on top of language models so that you can easily plug and play with different models. They also come pre-built with a few popular agentic patterns, which we'll talk about later, and they allow your LLMs or agents

to work with various tools. The most popular ones, maybe six months ago, were LangChain and LangGraph. There are a few other ones as well: CrewAI is a startup, Agno I think is also a startup, and AutoGen is built by Microsoft. There are drawbacks and positives to each of these frameworks. We'll mostly use LangChain and LangGraph as examples today, so you can see how they work. But I'll say that the downside is that, just because of how new this sector is, even these can be unstable. When we started building at Specular, we were

using LangChain heavily, and LangChain has had some problems. I still think it's a great framework, but we've had to rewrite some of that as well. So, you know, choose at your own peril. Okay, so to show an example of using LangChain here: essentially, this is the same script that we were just looking at, but what they allow you to do is initialize a language model in one line. In this same script, it's just one line of code

difference to use either ChatGPT or Claude. You can pretty much build an entire script and then have just one line that chooses which language model you're actually using for that workload. Again, it allows us to easily change: if Claude starts performing terribly for our use case, we can always swap in something else. Okay. Introducing one more piece of terminology here: Anthropic released a blog in 2024 which talked about some terms that are now standards. There are workflows, which are predetermined code paths that a large language model can take. I know

this is a little bit vague, and we'll walk through an example. And then there are agents. Agents are where you're providing a large language model the ability to choose its own direction. This is a bit of a new concept, because if anybody here has written code before, your code is going to execute directly down the lines that you told it to, and that's similar to a workflow. But with agents, you're letting the language model choose the route, the road it takes to complete a task. So all of a sudden, you're sort of

releasing control. And that's where weird sci-fi things can happen. But I think it also provides additional capabilities that maybe weren't present in legacy, traditional code, and we'll talk through some examples here. Okay. We're going to start with the simpler one, which again is a workflow: using a large language model in a predefined code path. We're going to try to use it to answer two of our CVE questions. First, what is the CVE? Really, it's a text summarization problem, where we're going to read in a bunch of text and try to

summarize it. That's a perfect use case for a language model and specifically a workflow; it's not very complicated. You're just going to read text and then summarize it. And then: is there threat actor activity affiliated with the CVE? Also mostly just text summary. Okay. So what a human would do, what I would do, is look this up in the National Vulnerability Database and read through all these sources, and probably some relevant threat intel. The problem with this, as we talked about before, is that all this information will not fit in the LLM's

context window, that finite amount of space that a large language model can actually analyze. So we need to build workflows that help the language model summarize this information, and there are actually several options here. All right, this is a very complicated graph. If you remember, earlier I mentioned that some of these agentic frameworks, like LangChain, come pre-built with some popular workflows. This is a fancy text summarization workflow that uses map-reduce. I learned about MapReduce in college; I don't want to learn about it again, but I'm glad it's already implemented here. And so we can

use this pre-built workflow, pipe in all this text, or text from all these links on the right-hand side, and have the language model summarize an answer for us. So that's one example of a workflow. A second, because there are different types of workflows, is called a refine workflow, where essentially, for each of those links, you have the language model analyze one, and then prompt it to update its answer based on the next piece of information. So we're going to iterate through all those 14 links and have the language model update its summary, its answer, every

time. It's a different strategy from map-reduce, which tries to distill information into one summary; this is more of an iterative process. There are different types of workflows for different use cases, and you have to mess around with what actually makes sense for yours. As an example, this refine workflow might be better for aggregating all threat actor activity, because it's basically going to look at its answer, look at a new article, see if there's new information about new activity that isn't in its current answer, and just append it. So it's just a slightly different type of

workflow. All right, building agents. From workflows we're going to get into agents, which are a little more complex and nondeterministic. Sorry, I'll come back here. This is an example from LangGraph. As their marketing suggests, LangChain is a predetermined chain of actions, while with LangGraph, all of a sudden you have this graph where the agent can choose which direction it takes through the graph to get to an answer; they're stateful graphs. To briefly walk through this example, I think the biggest note here is

that, if you go to the right-hand side, let's say you have an agent execute a task. If it fails, you can have it go back to an earlier point in the graph state to edit its strategy. This is a very common agentic pattern, where the agent iteratively tries to find the answer: you have a node at the end that says, "hey, did you succeed? If not, go back," and it replans and tries again to get to the answer. There's a slightly better example of this later as well,

which I'll show. Okay. Generally, agentic patterns are better for more complex tasks, like anytime you're writing code. Does anybody use Cursor in here? Yeah. Most modern IDEs today, if they have a language model embedded, are using complex agentic patterns. And then there's our fourth question, how do I fix this problem? This can be a little complex, too, especially if you're actually trying to generate a remediation playbook, a PowerShell script, whatever you're going to actually fix the problem with. It's no longer just a summarization problem. And so

you often need an agent to help you out here. Okay. This is an example of a simpler agent pattern for cybersecurity. To continue with our CVE example, let's say that we wanted it to help us with remediation. You have your task, and you say, "hey, help me remediate CVE-2024-4577." You have an orchestrator agent, which basically gets to determine what it needs to do: it can go out and query the CMDB to look for the asset owner, changes made to this asset in the past, or inventory on the system, or it can also

query product documentation. Maybe you have indexed all of your PHP documentation and you want the agent to go out and find interesting information. Then you'll want to combine that information in a synthesizer, which is basically going to do a summarization, but intelligently, and you're going to generate a sort of how-to-fix-it playbook. Okay, give me one sec. Okay, so going back to the slide: these integrations, to your CMDB or to your product documentation, which is traditionally stored in a vector DB, are actually pretty tough to build today. If you were to ask an LLM to make an API call to

ServiceNow, or to pick your favorite product, language models are generally terrible at it. And so, again, the community here has recognized that something needs to be done, and they've built something called the Model Context Protocol. This was released by Anthropic, and it was adopted by Gemini and OpenAI, so it's now sort of the new standard, and I think you'll basically start to see MCP, the Model Context Protocol, a lot more often here. What it allows you to do is let a language model

start communicating and specifically in these agent patterns which are a little bit more complex with these third parties data sources to get information it needs to make a answer Right. Uh so this is a bit of a architectural um diagram here. Um really what's what's happening and and again I actually think this is what OpenAI and a lot of these proprietary vendors are actually doing under the hood is that they've basically implemented a version of of um this this this protocol uh where they'll be able to go out and get information from from Google or other data sources to actually answer your your question. Uh this is a little bit of a architectural diagram. I

think this one's a little bit more abstract and I think it might make more sense hopefully to everybody. So if if you have your laptop here and uh you have USB ports, right? If you plug in a USB or to connect to your TV, that's that's it's sort of like an adapter to project your system onto your TV. So it's a very similar workflow where you're sort of plugging in these modules and all of a sudden the language model can communicate not just with Service Now your SIM Splunk but you know your Google calendar right it could automatically read your your Gmail and just start creating Google calendar invites right or it can it can start
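To make that USB-port analogy concrete, here's a toy sketch of the idea in Python. To be clear, this is not the real MCP SDK or wire protocol, just the shape of it: a registry of named tools that a model could discover and invoke through one adapter, with stubbed ServiceNow and Slack "plugins":

```python
# Illustrative only: mimics the shape of the MCP idea (named tools a model
# can discover and call), not the actual protocol or the official SDK.

class ToolServer:
    """A 'USB port' for a language model: plug in named tools it can call."""

    def __init__(self):
        self.tools = {}

    def tool(self, name):
        # Decorator that registers a function as a plugin the model can use.
        def register(fn):
            self.tools[name] = fn
            return fn
        return register

    def list_tools(self):
        # The model first asks: what is plugged in?
        return sorted(self.tools)

    def call(self, name, **kwargs):
        # The model then invokes a tool by name with structured arguments.
        return self.tools[name](**kwargs)

server = ToolServer()

@server.tool("servicenow_asset")
def servicenow_asset(asset_id):
    return {"id": asset_id, "owner": "ops-team"}   # stubbed CMDB record

@server.tool("send_slack")
def send_slack(channel, text):
    return f"posted to {channel}: {text}"          # stubbed Slack client

print(server.list_tools())
print(server.call("servicenow_asset", asset_id="web-01"))
```

Swapping a stub for a real integration doesn't change the model-facing interface, which is the whole appeal of the adapter design.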

So there are all these plugins people are building, and I want to emphasize "people," because what's happening now is there are public GitHub repositories that aggregate a lot of the MCP servers, or plugins, that people are writing, and they're not really officially supported yet. This is still very new, and we'll probably see where it goes in the next 6 to 12 months. But in my opinion, this is probably the future: you'll have companies maybe even sponsor their own MCP plugins, and you'll see the community start to use them. Just briefly, one benefit of MCP versus traditional LLM calls is that MCP has memory. If you've chatted with ChatGPT long enough, you've noticed it starts to remember things you've told it before. It's very similar here: maybe the MCP server is going to know that you like your calendar invites formatted in a specific manner. I'm going to skip some of this for time, because I know we're running out. So the idea, again, is that during our agentic patterns we're going to have an MCP server that's going to go out and query

information from ServiceNow about the asset and pull it in, so it can build a customized remediation plan for you. Maybe it's also going to query Microsoft's documentation to look at how to update a web server. So all of a sudden you have this dynamic agent that can go out and query information from all these different plugins to help you with your job. Okay, so that was it; sort of closing here, and I've probably got a few more minutes. A pain point for a lot of people in cyber is: how do we trust the output of language models? As soon as it starts hallucinating, it almost adds more work, because you have to go back and check the model's answer. So it's a huge problem, and there are a few strategies here. By the way, if you've watched a lot of the proprietary vendors over the last six months, they've gotten a lot better here, predominantly through RAG, which is what we showed earlier. That's the default workflow: the system basically asks, "Hey, am I confident about this answer? If not, go and look for extra information." There's also large language model grounding, a technique released by Google, which forces the language model to have extra, current information before it provides an answer. So there are different strategies emerging, and I think over time they'll continually get better. You can also double-check the LLM's answer. If you've seen chain of thought, you can prompt the language model to provide a detailed analysis of how it actually got its answer, and you can read through that. And if you ask the LLM to cite its sources, that alone has been shown to improve the answer.
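That "am I confident? If not, go retrieve" gate can be sketched in a few lines. The model call, confidence score, and retrieved advisory text below are all stand-ins (there's no real LLM or vector DB here); the point is the control flow:

```python
# Sketch of a confidence-gated RAG check: answer first, and only go fetch
# extra context when the self-reported confidence falls below a threshold.

def answer_with_confidence(question):
    # Stand-in for an LLM call that also reports a self-assessed confidence.
    return "CVE-2024-4577 affects PHP in CGI mode.", 0.4

def retrieve(question):
    # Stand-in for a vector-DB lookup over current, indexed documentation.
    return "vendor advisory: upgrade to a patched PHP release"

def grounded_answer(question, threshold=0.7):
    answer, confidence = answer_with_confidence(question)
    if confidence < threshold:
        # Not confident enough: go look for extra information and ground
        # the answer in it before responding.
        context = retrieve(question)
        answer = f"{answer} (grounded with: {context})"
    return answer

print(grounded_answer("how do I fix CVE-2024-4577?"))
```

The threshold is the knob: set it high and you retrieve almost every time, set it low and you trust the bare model more.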

I'm going to skip through some of this for time. One important type of workflow I also wanted to show, and we were talking about this earlier: you can have an agentic pattern where one language model outputs some data, and then a second language model, as a separate entity, checks the result and basically says, "Hey, does this result look good? If not, send it back." So you're having an LLM evaluate the results of a previous model, and you can build these with the frameworks we talked about earlier. This is a very common pattern in IDE coding workflows, where there are a ton of checks. It's typically structured as a graph, but I'm showing a more concise view here: the LLM has a secondary resource that checks its answer, and if the answer isn't right, it gets sent back. Traditionally this loop could go on forever, so at some point it has to stop: the LLM either needs to fail outright or, possibly, make up data.
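Here's a minimal sketch of that generate-and-evaluate loop, with stand-in functions in place of the two models and a hard cap so the loop can't run forever:

```python
# Illustrative sketch of the generate/evaluate loop: one stand-in "generator"
# model produces a result, a second stand-in "evaluator" judges it as a
# separate entity, and rejected results are sent back with feedback.

def generate(task, feedback=None):
    # Stand-in for the generator LLM; with feedback it produces a revision.
    return "fixed" if feedback else "draft"

def evaluate(result):
    # Stand-in for the evaluator LLM acting as an independent judge.
    ok = result == "fixed"
    return ok, ("ok" if ok else "needs work")

def generate_with_review(task, max_rounds=3):
    feedback = None
    for _ in range(max_rounds):
        result = generate(task, feedback)
        ok, feedback = evaluate(result)
        if ok:
            return result
    # At some point the loop has to end: fail loudly instead of guessing.
    raise RuntimeError("evaluator never approved the result")

print(generate_with_review("patch the web server"))
```

The `max_rounds` cap is the design choice that matters: without it, a generator and evaluator that never agree will spin indefinitely.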

But over time, I think you'll see these workflows, where you have an independent language model evaluator, become very common. Okay, last two slides, I think, and I'll skip most of this. I'll skip that one; let's skip that one. So this is more about what we're building, just one quick slide: essentially, we're taking that deep-research capability, the ability to cite internal data sources, and applying it to your specific workflow, whatever you're actually reviewing. So whether it's a CVE, whether

it's an alert, we're building RAG data pipelines to go out and grab the information that's relevant to your workflow. And the idea is that we'd integrate not only with ServiceNow but also with Azure, AWS, CrowdStrike, Jira, pick your tool, because we don't want you to have four different tabs open. We just want a single view, with the language model assisting you in how you look at the data. Okay, last slide. Takeaways: heavily leverage RAG workflows. I'll also say that this space is changing incredibly fast; I pretty much have to update this deck every few weeks. I really believe AI agents and language models will help augment a lot of your workflows; you kind of just have to dive in. And I hope this inspired you all to start building. That's it. Thank you very much.

We're going to let Payton catch his breath and have a swig of water. We have some time for questions. So if you have a question, raise your hand; I'll come to you with the mic so you can ask your question and be heard. Please mention your first name when you ask your question.
>> Okay. So, I'm Sean, and my question is: when it comes to this research, the pulling in of information, how well does it check that the information is actually accurate? Because anyone can publish on Medium, anyone can publish on GitHub. How does it determine, I guess, a truthiness rating, if you want to go by Colbert?
>> Yeah, sure. It doesn't. I think what you'll start to see, and what we've started to build, is data pipelines where you look at the reputation of the source and analyze the integrity of the data before you feed it into the context, because the language model has no idea; it has no internal reasoning capability there, so you sort of have to do that for it. Which, by the way, is why I think Scale AI, which was just bought by Meta for $14 billion, got acquired: that's their entire

business model: they're tagging data so that it's reliable for these models. So yeah, it's still a challenge today.
>> Another question here coming over. I'm coming this way.

>> My name is Brian. You skipped, well, you ran out of time, but I was just curious whether you were going to say anything about A2A, Google's new agent-to-agent protocol alongside MCP.
>> I actually have not seen that yet. I'll have to look at it. Yeah.
>> Thanks.
>> Okay, I had another question up front, then I'm coming back to you.
>> Uh, Karina. And my question is: what current LLM projects are you working on?
>> Yeah, so I'm the founder of a company where this is sort of all we do. We're trying to do a lot of what we just talked through today: using language models to augment CVE analysis and to help security analysts fix security problems.
>> Hello, I'm Robert, and I was wondering: with all this research you've done, going forward, what do you think this will do to the career field for people in cyber? Because, so, I'm a student learning about cybersecurity, and a lot of what my professors have told me is, one, not to use AI, and two, that it's not going to be important, because you still have to know what you're talking about regardless of AI. But now that you've brought up all this information, it seems a little unclear.

[Applause] Good question. Let's see: "not to use AI." I don't agree with that. There's a quote that I'm going to butcher, but it's basically: you will not be replaced by AI, you will be replaced by people using AI. I think it's similar to a huge technology shift like when everybody started using email; it's the same thing, it makes you more effective. I do think it's important to understand how the LLM is getting its answers, and you still need to understand the basics of cybersecurity. Maybe one thing we talked about today is that you can't treat it like gospel. You still have to understand the material, and especially for the foreseeable future, it can augment your work, but you also need to be able to tell when it's probably hallucinating. So there's a balance there. So: agree that you still need to actually understand cybersecurity; disagree that you shouldn't use it; you should definitely be using it as much as you can.
>> Time for one more quick question if someone has one. Is there one up front? Oh, there is one up front. Okay.

Thank you for helping me get in my steps. Your first name, then a question.
>> Hi, my name is Ramon. So, I started looking into, I don't know how to say the name, n8n, I think it is. It does AI agents, and they do what you were saying. So how important is it to understand how to create these agents to better your workflow?
>> Yeah. Well, AI engineer is definitely the most sought-after career choice right now. I think it just became public that Meta offered some OpenAI engineers million-dollar signing bonuses and some other nonsense. If it suits your personality to code and to build these agents, I think it's a great career choice; you definitely won't be getting replaced. Or you could just use the tools that are built on them. I think there are lots of great career options; you just have to pick what interests you.
>> All right. Well, thank you very much, Payton. Please join me in giving Payton a hand, and please accept this small thank-you for the presentation.

Our next session starts in about 6 minutes in this room.