
Building a Practical AI Assistant for Security Operations

BSides Lisbon · 2025 · 23:18 · 155 views · Published 2026-01 · Watch on YouTube ↗
Category: Technical
Style: Talk
About this talk
Security teams often struggle with alert fatigue, analyst skill gaps, and the complexity of SIEM query languages such as Lucene or SQL. As a result, incident response slows and results vary with analyst expertise. This talk presents a practical approach to bridging the gap with an AI-powered security operations assistant. The agent translates natural language into precise SIEM queries and integrates safely with detection and response workflows. Built in Go with Anthropic's Claude, the talk demonstrates how large language models can operate in security workflows with strong guardrails. Attendees will learn how to build an agent that:
- Converts natural language requests (e.g., "investigate failed logins for these IPs in GCP", "for alert X, did this user log in using MFA from a trusted device?") into valid queries
- Correlates data from sources such as Okta, GCP Audit Logs, Cloudflare, and Google Workspace, and writes technical reports on findings
- Safely manages alerts with human-in-the-loop guardrails
The session includes technical architecture, real-world use cases from IR and threat hunting, and code examples to kick-start your own AI-augmented SOC tools. Attendees will walk away with implementation strategies and lessons learned from using AI agents in live security environments.
About the Speaker: Security @ Sourcegraph & AmpCode. Attempted follower of my own thoughts. Addicted to coffee and code.
Transcript [en]

So, let's welcome Vincent back to BSides. The stage is yours, and afterwards we'll go have drinks. Just a reminder: if you won't come back to the venue and you don't want the badge, do not throw it away. Please leave it at the desk. Thank you. >> All right. Hello everyone, I'm Vincent. I work on the security team at Sourcegraph. You might know us from code search or from our agentic coding tool, Amp. Today I'll be talking about building agents. I think there are many people who like to work with agents, but I don't know how many people actually wrote their own

agent. And that's what I want to talk about today, because I think it's very easy to write your own agent, especially with all the coding tools available. You don't need to buy an expensive vendor product that makes all sorts of promises when you don't know how they handle your data. If you write one from scratch, or with some SDKs, you can build a very good agent. My colleague Thorsten wrote a blog post in April of this year where he explained how to make an agent from the ground up using the Anthropic SDK. It's a really good blog post. I'll be

running through parts of his blog post during the talk, as he explains really well how to build an agent. From that basic agent we'll add some tools, and then I'll show you with some demos what your agent can look like, what it can do, and what you might not want it to do. So yeah, great post; it's definitely worth your time. First, a small primer on what LLMs are. Many explanations are out there already, but just in case you don't know: it's basically a large machine learning model that was given lots of data, like books, blog

posts, YouTube videos, whatever they could transcribe, dumped into a model, and essentially you have something that can speak text and language and understands us. And obviously they got all that data through legal means, as good citizens should. Commonly, I guess, most people use chat-based AI, or just chat: you type into a prompt, you press enter, you get your output, you download the image or whatever was generated, you paste it into a file. That was the first iteration of LLM usage. But we'll be writing an agent. So if you want a basic chat-based agent

you can use the Anthropic SDK. You basically define a little buffer to read input from. You add the input function to your constructor, and you add your prompt over here. Then you wait for the user to submit input, you add that to the Anthropic SDK's user-role JSON structure, you run inference, and then, boom, you go over the output and you have a chat-based agent. It's maybe 40, 50 lines of code and you have your own agent. But of course someone thought, hey, what if we

give this thing tools? What can it achieve? Maybe it can do work for us so we can automate ourselves out of our jobs. An LLM is of course text-based, and LLM tools are also defined as text. As you can see on the left part (I hope you can read it), you can easily define a tool function in JSON: you get the tool name, the description, the function, and the parameters. Then you call inference with your tool inside the prompt, because we have to add that to the prompt. The agent can then see: oh, the user asks about the

weather, so it infers: maybe I should give it an accurate weather reading, I have a tool for that. It can then call the tool, your code will execute it and return the weather, and the agent comes back with a nice little message to tell you what the weather is. In code it would look roughly like this; this is from Thorsten's blog post. You have your tool definitions at the very top, and then of course you have a very safe read_file implementation to actually execute the tool. You have some schema shenanigans, because we need to send all that JSON and all the

information to the LLM in the prompt. We run inference, and of course we add the tools to the prompt, so when we run inference we can actually execute the tools and get the nice AI response back. So for security, why would I want to make an agent? Well, the input processing an LLM can do is really fast. It can infer knowledge or meaning from logs or a piece of text really fast, much faster than I can, and probably faster than you can. And most of the query languages and alerts that we have are all text-based, right? You have training data to train your new

employees on how to work with these tools. You probably have playbooks on how to investigate certain alerts. And the queries are probably text-based as well. For me personally, if I have to look up an issue like, oh, someone logged in from a new location, and I have to grab data from seven data sources, it gets a bit boring after a while. So, like I said before, you probably have all your playbooks and rules in text. This is an example of a playbook we have at Sourcegraph: when someone logs in from an unknown location with a service account, this rule fires. We already had some human

instructions that humans could follow to find out who used the service account, and I just added some agent instructions there as well. So if my agent sees this alert, it knows it can run this query, and then it gets the impersonation event for that user from the GKE audit logs, and it can cross-reference that with some other data sources and then tell us who did this and why. That's actually pretty cool. So if you have any cheat sheets or any kind of information on how to use the indexes or databases at your work, you can compile all of that into the agent so it

knows what to do. Some tools we can give our agent include, of course, log searching. I wanted it to be able to run aggregations, because you don't want your agent to pull back 200 or 2,000 lines of logs: that blows up your context window, and you lose lots of tokens, and lots of money, real fast. I wanted it to be able to find open alerts. I also added functionality so the agent can close alerts on my behalf. And I have some additional context from other data sources, but you can add anything you want. You can use the OTX database to make it query

certain indicators of compromise. You can make it do whois lookups for you, or query metadata from your Google Cloud or AWS environment: all sorts of things that you might not put into your SIEM but that a human would use. But with tools come prompts, and with prompts the context window grows. What researchers have found is that the more tools and context and prompts you add, the bigger the context gets, and if the context size becomes too big, the performance of the LLM declines. So if you're using MCP servers, for instance, they come with lots of tools, and they also have lots of

documentation for those tools. Even if you only use one function from your MCP server, you still get maybe 10k or 20k tokens, sometimes even 200k tokens, inside the context window, and that kind of ruins the LLM. So you have to think about context management, or context engineering as they like to call it, when you work with language models. Also, I kind of detest using web UIs. Back when I started as a pentester, because I'm old now, everything was terminal-based: we used SSH and netcat and whatever. Nowadays I do my job in a browser, and I find that very unfortunate. I find the

Elastic UI difficult to use. It has lots of information, and it's not as customizable as I wanted. So I decided to write a CLI, and since we're using Go, we have the amazing Bubble Tea framework to build on, plus a Go SDK and the Elasticsearch SDK as well. This is an example of the agent tool that Elastic provides. I think it looks absolutely horrible. It's too much information. I don't even know how to use it; somewhere I can type, I guess, and then press enter and it's going to do some stuff for me. It also comes with research summarized

for you, if you want that. I didn't even know they had a blog, but apparently they do. So that is not what I want from an agent. I want something minimal that follows my instructions: I type some text, it runs a query, it summarizes the output for me. That's what I want. Incredibly lazy. So how would you implement some security tools? Well, you start with the metadata: you have to tell the AI, all right, you want to run an Elasticsearch query, you give it an index, a query, default fields. We define all of that just like we

saw in the example from Thorsten. And of course we have the actual implementation: this is what the model will call, its input ends up there, we do some input validation, and then essentially we just call the implementation of our interface. In my case it looks like this, because it's Elasticsearch, so here it can just do a simple query_string query. It knows Lucene. I'm using the Anthropic SDK, and the models know Lucene; I don't know how, but they were probably trained on a ton of data, so they can write queries fairly easily. We call Elasticsearch over there, and then we just return the JSON to the agent.

Because I don't want my agent to start searching my SIEM for open alerts on its own (again, the context window explodes), we have a predefined tool for getting open alerts, and we can have more predefined tools that always give back the open findings. You don't want the agent to improvise that query, because agents get creative: they might not always do the same thing, and you want everything to be reproducible. So the input in this case is not used. This is how it looks: a simple call to Elastic. Of course, there's a transport that adds some tokens and some rate

limiting, but in the end it's a couple of lines of code. And now we put it all together: when we want to use our agent, we have to give it the tools. This builds everything up. The SDK we're using has some odd things I had to figure out to get memory working: when I first used it without setting this up, it was always only one message each; it didn't keep track of the previous messages. Then we just add that to the agent, and that's it. This SDK we're using also comes

with plan approvals. Our agent is relatively simple and we use it in supervised mode, so we don't do approvals; we just let it execute tools when I ask it to. To put the two together, we define a basic logger, we have an agent constructor where we give it the logger, and then we have a submit function much like Thorsten's, only it looks a little fancier than the example one. And when events come back from the TUI and from Anthropic, we need to put the information back into the user interface. And here we actually create the CLI. So, what does it

actually look like when we're using this, and is it any good? Nice, this actually works. Cool. Because, yeah, I'm old, doing live demos gets my heart rate up, and I'm using Linux, so I don't know if my display is going to connect or not. So I recorded a demo. I set up an Elasticsearch cluster for this demonstration, and in that cluster I've done some nefarious things: I created a service account, an editor in one of the projects, downloaded the key to my machine, took some actions, and triggered some alerts. And the agent is now going to investigate that for us. First

I'll show you the open alerts. I also made a demo for that, sorry; it's just to show you what the UI looks like. It's a bit tedious. You have an investigation guide, which is rather large but comprehensive, and when you want to look at the data you have to scroll down a bit. Of course, the alert you get in Slack has some more information, but here it's just a bit too slow for me to find out what's going on. The overview doesn't really show much, and the table is a bit annoying to use. But it gives you all the information that you want

essentially. Now I need to go back and click the actual agent demo. So I'll run the agent here and type a question like: what are the open alerts? It starts thinking, of course, because it has to think. It gets the open alerts: in this case there are two pod exec alerts and one account creation alert. And then it needs to think again, and, like one of the audience members said today, it likes to think alerts are more critical than they are. The Kubernetes execs and the key creation are low severity, it's defined as low, but it does say like

yeah, this is high risk. You can tweak that in the prompt so it's less creative. But now we see it actually go through the logs: it found my IP, it's looking for the right time stamps and ranges, it then found my user, and now it's going to see what this user did. It already surfaced the key creation, and it seems to be thinking very hard, which is what we wanted: if you tell Anthropic's models to think hard, they will actually think hard. I don't know what that means exactly, but that's what it does. I think it's now summarizing the findings.

Of course, I'm connecting to Anthropic in the US from Portugal, so it's a little bit slower. But as you can see, it gave a pretty good overview. It does classify it as critical: it says, oh, someone did a privilege escalation, a user downloaded the service account key, put it on his machine, and ran some commands. I used the Kali Linux one, which of course for the LLM is a little bit alarming. But overall the assessment is solid; those are indeed the actions I took, and you can look at the logs. It confirmed it as malicious. Yeah, I don't know how it confirmed that; it didn't ask me if I did this maliciously, right? So

yeah, you have to tweak that a bit in the system prompt. And this is of course from my demo tool, the one I built and open-sourced for this presentation. But overall it does a really good job of explaining what happened based on the logs. Now I want to close the alerts, because I was just pentesting in my own project. So I tell it to close the alerts, and it does. And that's pretty much it. I'm just curious, how many folks use an agent like this at work? A security agent? I don't see many hands, but I see a few.
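Letting the agent close alerts on request pairs well with a small server-side guardrail: only IDs that are actually open can be closed, so a hallucinated alert ID fails loudly instead of "succeeding" silently. A minimal sketch, with a made-up AlertStore type standing in for the SIEM:

```go
package main

import "fmt"

// AlertStore stands in for the SIEM: alert ID -> is it open?
type AlertStore struct{ open map[string]bool }

// CloseAlert only closes IDs that are actually open, rejecting
// anything the model invented or already closed.
func (s *AlertStore) CloseAlert(id string) error {
	if !s.open[id] {
		return fmt.Errorf("alert %q is not open", id)
	}
	s.open[id] = false
	return nil
}

// OpenIDs is the predefined "get open alerts" tool: no model input,
// always the same query, so the result is reproducible.
func (s *AlertStore) OpenIDs() []string {
	var ids []string
	for id, isOpen := range s.open {
		if isOpen {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	store := &AlertStore{open: map[string]bool{"pod-exec-1": true, "key-created-2": true}}
	_ = store.CloseAlert("pod-exec-1")
	fmt.Println(store.OpenIDs()) // the remaining open alert(s)
}
```

The agent can then call OpenIDs again after closing, which matches the "verify everything is closed" step in the demo.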

All right. Now it will call the tool again to verify everything's closed. But yeah, that's pretty much it. And that is not what I wanted to do. Okay, cool. So that's the demo. At my work, of course, there are more data sources in our Elasticsearch cluster. This is an overview of what a real example of the open alerts looks like: we have some MFA findings and a suspicious Google login. I asked the agent to investigate the MFA findings. We recently moved to different YubiKeys, so lots of employees are changing their MFA, and it does a really good job of verifying whether it was legit or

not. Basically, what I tell it to do in the prompt is to verify whether those MFA activations were actually authenticated from a managed device before or after they happened. Did they use their managed device? Did they use Okta, or Okta Verify, or a YubiKey before? Did they then also use YubiKeys after, and was it from a consistent location? And it actually answers that really quickly. This is another comprehensive example that uses, I think, four data sources. We use Falco for some of our cloud runtime environments, and here I was doing some maintenance in one of our projects, so it triggered an

event like: oh, someone exec'd into a pod, which Falco flagged. Here it combined data from the GKE audit logs and from the Falco logs. And because we use just-in-time permissions at Sourcegraph, which I'll talk about a bit later, it can also go to that data set and ask: at the time of this event, what permissions did Vincent have? It found three blocks of permissions, but essentially they're the permissions I required for tweaking the configuration of a customer instance, and someone from my team had to approve that. So once it knows all that, it's very easy for it to

make a good judgment about whether it was malicious or not. And yeah: Vincent had all the valid entitlements, used regular IP addresses, looks legit. Which I think is pretty useful. So, as I mentioned before, our agent can access information about our just-in-time permissions, because we use Entitle, which is a permission system. Basically, say I want to access a cloud customer instance: I select the permissions I need, enter why I want access and for how long, and the moment someone approves it, it registers in our system, and we query that as part of our agentic loop.

Of course we also have Okta logs and device management logs, which are always good to look at, and 1Password audit logs; audit logs, you name it, all those data sets are very useful. We also gave it some tools to talk to GCP, because most of our projects have random names, so we don't know which customer a project belongs to, or what a service account is used for, without reading the description, and the agent can call some APIs to get that information. People like talking about guardrails, and I think it's good to have some guardrails for your agent. The SDK we're using has content filters, PII filters,

tool execution blocking, lots of things. Personally, I think filtering is not a very good option, because if the LLM, or someone, wants to submit PII, they can just slightly change its format and your filters won't catch it. It's a cat-and-mouse game. What I know from a colleague who added secret redaction is that the better your secret redaction is, the more false-positive prone it will be, which can also just ruin the experience. So if you want to use LLMs, make sure you are authorized to use them on the data, instead of building other

guardrails into the tool. I'm also not a big fan of using agents with privileged access in the background. People like asynchronous agents, but when it comes to security work, I just like to have a chat buddy: I chat, I say, "Hey, look at these alerts, this looks suspicious to me," and I keep it under close supervision all the time. Our agent can also write Falco rules for us, because Falco rules are simple text. Sometimes something changes in one of our base image templates, and I know it's a false positive because I wrote the rules. I can tell the agent,

look at this alert, update the Falco rules; then I check the diff, I submit, and everything's fixed. It's just very tedious to do this by hand, and with an agent it's much easier to stay on top of your rule sets. I'm already over time; I thought I was talking really fast, but I had lots of slides too. If you want to give the agent file system access, either use a sandbox or use path-traversal-resistant file APIs, because agents are creative, and if there's even a little bit of prompt injection risk, better safe than sorry. My agent cannot create files; it

can edit files and read files, but it's fairly limited. If you prompt your agent to close alerts and then have another approval box, that is very tedious, so I took that out: the agent can close alerts when I ask it to, or when someone prompt-injects me via a User-Agent header, I guess. I tried using this with Ollama and Qwen; the quality of the agent is not as good, also because the tooling is not as mature. I'll keep experimenting with it, and hopefully I can get it into a better state. If you can't code, just download my example and let an AI implement

the interfaces for you. All the data is out there for most SIEMs. I quickly let it implement something for Panther: Panther has a view-as-markdown option for its API docs, you click that, you give the URL to your agent along with a prompt, it looks at the examples of the code that's already there, and then you grab a coffee while the agent does its job. When you come back, it wrote the code for you. If you don't know how to code, you should do that. If you do know how to code, be a bit more specific and give it some good examples.

And it's 30 cents; it's not super expensive to do so. So yeah, that's it. What I want you to take away from this talk is: build your own agent. See what it's like; it's remarkably simple. Of course, before you use it in a professional environment, make sure your legal department agrees with you shipping these things to an LLM provider. Trust, but verify: allow the agent to do things, but please verify, because maybe it's yelling about an incident when it's just me pentesting. And code is cheap; you don't need to be a super software engineer to start writing code. You need a basic understanding of what code

should look like. And of course, as a pentester or security person, you should test your changes. That's it. That's the talk. And

any questions?

No questions. Oh no.

Cool. Time for beer, I guess. >> Thank you, Vincent. Thank you, everyone. So, and have