
Agentic AI for Cyber Operations

BSides Göteborg · 2026 · 36:36 · Published 2026-03
Tags: Technical
About this talk
Agentic AI for Cyber Operations: In this session we cover a variety of use cases where AI implementations are helping SOC, CSIRT, and other cyber defender groups. From MCP to multiple agentic approaches, we cover the technical aspects and benefits for both practitioners and leaders.
Learning objectives:
- Discover multiple AI options for reducing manual effort in cyber operations
- Learn how to build AI tools for alerts and cyber operations teams
- Understand how AI is changing the cyber operations landscape for faster detection
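As a rough illustration of the "AI tools for alerts" objective, the triage-with-human-oversight pattern discussed in the talk can be sketched in Python. All names and fields here are hypothetical assumptions, not taken from any specific product:

```python
from dataclasses import dataclass, field

@dataclass
class Alert:
    source: str          # e.g. "EDR", "SIEM"
    severity: str        # "low" | "medium" | "high" | "critical"
    summary: str
    enrichment: dict = field(default_factory=dict)

def agent_triage(alert: Alert) -> dict:
    """Agent step: enrich and recommend a verdict; the human still decides."""
    alert.enrichment["related_intel"] = []   # placeholder for threat-intel lookups
    recommended = "escalate" if alert.severity in ("high", "critical") else "review"
    return {"alert": alert, "recommendation": recommended, "needs_human": True}

result = agent_triage(Alert("EDR", "high", "Suspicious PowerShell spawn"))
print(result["recommendation"])  # escalate
```

The point of the sketch is the shape of the loop, not the classification logic: the agent does enrichment and recommendation, and `needs_human` keeps the final determination with an analyst.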
Transcript (en)

All right, let's see. We don't have any attendees yet. Yes, we have one. So, let's give it a minute more before we start. >> Yeah, the last session was a little bit long, 90 minutes, so people need some time to take a rest.

Are you able to hear us, Nikki? >> Yes. Can you hear me? Okay. >> Yes, loud. >> Okay, perfect. >> Okay, our time has been reached, so we welcome Nikki now. It is a very interesting topic, agentic AI for cyber operations. We welcome Nikki and look forward to your session. Welcome. Thank you. >> Great. Thank you so much. All right, we're going to go ahead and get going. I'm going to share my presentation here. Hopefully you all can see it. Okay. So today the topic we're going to cover is, um, can you still hear me? Okay. Sorry, I had a little delay there. >> No, it's back. It's good.

Perfect. All right. So, we're going to hop into, I think, a very pertinent topic right now: agentic AI for cyber security operations. We see agentic AI all over the place. Am I cutting in and out, or are you able to hear me? Okay. >> It has happened once more now as well. >> Okay. Yeah, my apologies. This keeps saying Zoom quit unexpectedly, so I apologize for that. Hopefully, it sticks with us. Okay. So, agentic AI has just really blown up, and it's become so important for us as cyber operators to investigate and see how it's helpful for us. You know, there's obviously pros

and cons, so we're going to dive into those. I just wanted to give you a brief introduction to myself. I am an STSM and senior manager for AI and platform development. My background is in IT operations and cyber security. I feel like I've made the grand loop from IT operations to cyber security to development. I'm also an adjunct professor and I hold two PhDs, one in cyber security and one in human factors. We're going to touch a little on human factors today because it's so essential when we're talking about automation and agentic AI. And then I've written a couple of books, some patents, some

certifications. Let's go. Okay, great. Just quickly, this is kind of my basic agenda. I want to cover some very technical aspects but also give you a road map for how you would actually build this, why you would build this, and what you would use this for. Agentic AI is done in a few different ways, but when we're talking about agentic AI, it's really enabling and allowing an AI agent to perform a task. That might be summarization, that might be analysis, maybe it contains a computer, maybe it resets a password, maybe it communicates with a user. You're giving some autonomy to what we usually call a digital worker. But

you're allowing an agent to perform an operation. So obviously there are operational challenges there, and some things to consider when you're actually building it. MCP architecture, I've been saying, is kind of the new hotness. That's really since the last year; I think it was maybe November 2024, but really February 2025 when we started seeing a lot of people build with it. It's really become the industry standard. So we're going to cover that a little bit, what that looks like and why you would use MCP. Some operational uses, governance and risk, of course, when we're talking about AI, and then some takeaways. Okay. My slides are busy. I apologize for that. I'm not going to

read all of this to you, but I wanted to provide the information for you all, because this is a good framework for understanding why we want to use agentic AI, but also where. As we got into the end of 2025 and early 2026, I think the AI pendulum has swung all the way that it's going to swing, from we want to use it for everything to people starting to come back and ask, where do we actually need AI? It's not going to solve every problem. Or maybe it's too much for the problems that we're looking at. Maybe we need some basic automation instead. Maybe we need some Terraform or

some simple scripting with Python, versus I want to build an entire agentic AI approach to solve this small problem. We're seeing a lot more AI tools; an example would be n8n. They do workflows, which can be very helpful for lots of different teams, both from a business process perspective and a security operations perspective. But the biggest thing we're focusing on, if you decide you want to use agents, is what that decision loop looks like. Who makes the decisions, and where? If I'm running a security operations team or a security operations implementation, I want to understand who's making the decisions where, because those decisions are going to impact the security of my

organization. So, a very basic example is: do I determine if something is a true positive or a false positive, and do I allow that agent the autonomy to make that decision without human interaction? That's going to be different case to case, how you want to handle those things. But just like we've seen in vulnerability management in the past, and we're still seeing today, for SOC and cyber security operations groups, security engineering teams, people who manage vulnerabilities, we're seeing a lot of fatigue: alert fatigue, vulnerability fatigue, detection fatigue, communications fatigue. There are a ton of them. There's a great paper by Dr. Calvin Nobles called

security fatigue, and there's a subtitle there. I highly recommend it if you're interested in learning about the different types of fatigue and what they mean. But this is why we would want to implement AI in our environments or in our operations processes: to help us make better decisions, but also make better decisions faster. There's so much data coming in. I mentioned up here SIEM, EDR, cloud, identity, threat intel. There are so many different sources of data coming in that it is impossible to look at them all through the lens of individual tools. Maybe you have a data lake, maybe you have a SIEM that pulls all this together, maybe you have

a SOAR that does some of the automation; it runs playbooks and actually enacts some sort of action. But you're looking to improve maybe the speed, the iterative approach to doing that, building better detections, things like that, or making decisions faster. That's where the reasoning comes in. So understanding risk and what it means to us, context, is very helpful when it comes to building agents. You can build context into what agents are looking for and how they act based on the prompt that you build for them, or if you're giving them a profile to say, you know, I want my agent to act like a senior SOC analyst, and what does that mean to

you? You can also build in business context. If you're in the healthcare sector and you're concerned about HIPAA data, you can add all of those things into the prompt. That's not going to say it's going to be 100% right all the time, but it will give you a certain level of confidence in the decisions that it's making. And then action. Where do you want the action to take place? How much action are you comfortable with? There's still some debate, I think, in the industry, especially on security alerting and determination of true positive versus false positive, and of course action too, right? You don't want to contain maybe 2,000 systems at once. That

could maybe have some serious business impact. But there's the ability to allow it to make decisions to a certain point. My recommendation, and this is some real-world experience: just like when we're doing vulnerability management, a lot of times the low and informational alerts get ignored simply because there are far too many criticals and highs, and then even mediums, to really want to look at the lows. That doesn't mean the lows aren't important. Even when it comes to alerts, it doesn't mean they're not important, especially if they're used in combination with other types of alerts. So maybe a user is interacting with multiple different systems, or

they're connecting from a VPN, but then maybe they're performing some other activity, and that generates two different low tickets, but combined that's not really a low. Maybe I want to look deeper into those things, but because they come across as low, I kind of auto-close them and I don't look at them. Using something like agentic AI could help to parse those alerts in combination, which gives you an opportunity instead of having to look individually through every single thing. It looks for patterns. It does pattern recognition to help make those better decisions. So even if it's coming to the human at the end of the day to make that decision, they're at least

going to have a lot more robust information to help them make that decision. Okay. So, MCP architecture and design. The reason I wanted to talk about MCP today: if you look at industry patterns, a lot of the big companies, GitHub, AWS, I think, have their own MCP reference architecture. There are a lot of organizations that are building MCP into their tooling or making it available to their customers to build agents. And those agents might be agent-to-agent or agent-to-tool, meaning that agent to agent, they're communicating together, sharing information together to bring to an orchestrator. That's your MCP server. Or you have agents that interact with specific tools. From

an MCP architecture and design perspective, the biggest thing I want to call out here, if you're considering leveraging MCP, or even if you're already using MCP, maybe you're familiar with this: there are a few identity and access management things that you want to be aware of when you're doing MCP. From an access control and privilege perspective, you want to have a really robust RBAC model, and you want to look into encryption. It's going to be different depending on your organization and your structure. If you're handling federal data, or if you have GDPR or some other kind of law or mechanism that you need to follow to protect data, you want to look into

that, not just from a legal and governance perspective, but also from a data protection and processing perspective. Agents work best with minimal data sets. I can tell you this from experience. If you try to build one big agent to do a very massive task, or multiple tasks, to make a decision, you'll have varying degrees of success. But if you go with an MCP design and you build several agents that perform individual tasks, they're far more successful when they have a smaller data set that they need to look at, understand, and reason over. That's not to say that the agents won't share information with each other to help make better

decisions, but giving each one a very specific purpose helps the confidence level grow when you're building agents. And then, of course, from a deployment strategy perspective, obviously the best way to do this is dev, staging, prod. Really test things in dev, and not just from an infrastructure perspective, like does our architecture work, is this scalable, is this reliable, but also transparency, and you'll hear these a lot with AI, right? Transparency and the way that we monitor, from an observability perspective, whether the responses to the prompts are occurring in the way that we think they should. We're going to be looking for drift detection. We want to be aware of those things. So,

it's best to test as many different scenarios as you can, as many different types of prompts as you can, in dev. Give it a really good test before you ever deploy anything in prod, because the idea is that by the time it gets to production, it's going to be super stable. You're going to get the same types of responses every time, and then you'll have that drift detection in place in case the responses change, or if you don't get a response for some reason. Sometimes, you know, agents and AI solutions hallucinate. Having that observability and auditability in place is going to help reduce the amount that that will

happen, and then when it does happen, you'll be able to catch it quickly. Okay. I went with a picture this time, but I wanted to give you an example of what an MCP server workflow might look like in a cyber operations environment. This is just a total hypothetical example, but something that you could consider if you wanted to go with an MCP design. You could use MCP just for your SOC tools. So if you have SIEM, EDR, SOAR, you could say, hey, my server is the connection point between these three tools, and then I have my agents, API wrappers, tools, things like that, that I want to connect with, to be

able to make those decisions and interact with them. You know, maybe I want to make those decisions in those tools, or maybe outside of them; those are other architectural designs. The other option would be giving an MCP server basically a huge knowledge base. This is the conversation, if anyone's familiar or has been looking into it, of LLMs (large language models) versus SLMs (small language models). Using an MCP server for a very specific purpose, like giving it all of the information and context for your business, can be really helpful to tune what it's looking for. So for example, if you have a security policy in your organization, something that really

speaks to your business, maybe your vulnerability response timelines, remediation guidelines, any laws and regulations that you follow, all those types of things you can build into a knowledge base, including threat intel, anything that's important or pertinent to your organization, to help it make better decisions. And then an example of a third one would be if you're using an ITSM, an IT service management system, or workflows, either for ticketing or inventory, maybe both, maybe as a CMDB, or you want to use it as a communication mechanism, maybe for opening vulnerability tickets to system owners or providing alerts to the SOC, any way that you might want to

communicate with someone. So all the processing and reasoning is happening on a different MCP server, but the communication workflow is happening at a different level. The idea behind keeping these separate, again, is to help not overwhelm your infrastructure, or to avoid making one MCP server do way too much. There's still a lot of trial and error, from my perspective, in the industry on what works best. Again, it depends on how large your environment is and how much data you have. If you're talking petabytes of data, you might want to have a few MCP servers. So it just depends. Okay. But what does that mean

operationally? I've got my MCP servers up. I've got my agents running. I've determined where I want them to make decisions; I've made that decision for myself, where I want the determination to come from and then who acts. There's an opportunity here for a lot of pros and cons, right? Pros being response time: the dwell time, basically the amount of time that a ticket is sitting open that maybe nobody's looking at, goes down significantly. If you have an agent that, the moment a detection comes in, the moment an alert comes in, is already doing triage and analysis, it can raise a flag if it sees something

anomalous, or something where, you know, hey, maybe we need to take a deeper look, versus every single ticket that comes through going immediately to the SOC. Even with tuning and tailoring over time, adding an agent can help reduce the amount of time they're spending looking at so many tickets all the time. The human factors point I want to bring out here: yes, fatigue, but also mental workload. You know, we as humans can only take in and process so much information at one time. We also have only so much time that we can pay attention to any one screen. I think the latest number I saw was 12 to 25 seconds in some of the

latest research. That's the amount of time, and it's gone down since 2024; I believe that number was a little bit higher in 2024. So the amount of time that we can pay attention to an individual screen before context switching and moving on to something else is far lower than I think even people realize. But we do so much context switching, and considering a SOC scenario, you might have 15 different dashboards, different tools, alerts coming in from different systems, plus all the other day-to-day things. Maybe you have a flood of alerts in a day, right? There are all kinds of things that could happen. Or you need tuning for a

specific type of malware, anything like that. So using an agentic approach can help compress the amount of time that someone is spending on a ticket, which is huge, because then you start to think about scale. My analysts are no longer triaging and spending all that time just doing the same thing over and over again. I'm now giving them time and space to be able to do more. Maybe they have time for engineering tasks. Maybe they have time for project management or program management. You know, you're really elevating your SOC, or your security operations team at large. You're giving them more time back to innovate, to create,

to automate other things, versus constantly spending this time firefighting. And then, finally, investigation depth and consistency. You know, one of the things, not just from a SOC alerting perspective, pick any of your IT ticketing or any of your workflows: without automation, we're human, we write differently. Without a template or a mechanism, we are looking for different things. You know, I don't expect myself as a developer and my friend, a developer sitting next to me, to have the same experiences, to write the same way, to comment code the same way. It's the same in security operations, right? We all look at and investigate things in different ways based on our experiences. Some of that

comes down to perception. Some of that comes down to experience and skill. Some of that just comes down to the amount of time we have to spend on any given task. But leveraging an agentic approach to give you that initial summarization, or that initial reasoning and logic behind a ticket, helps provide consistency from an analyst perspective. They know they're going to see the same information every time. And when something looks anomalous, just like with anything else, that raises a flag and they want to investigate further. So it helps eliminate missing information, or running through the information in different ways, or looking for different things.
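The consistency idea above can be sketched as a fixed triage-summary schema that any agent output must satisfy. The field names below are illustrative assumptions, not a standard; the point is that analysts always see the same structure:

```python
# Required fields every agent-produced triage note must contain.
REQUIRED_FIELDS = ("who", "what", "where", "when", "initial_verdict", "evidence")

def build_summary(**fields) -> dict:
    """Reject incomplete summaries so every analyst sees the same fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise ValueError(f"incomplete triage summary, missing: {missing}")
    return {f: fields[f] for f in REQUIRED_FIELDS}

note = build_summary(
    who="jdoe", what="VPN login plus unusual file access",
    where="host-1042", when="2026-03-02T09:14Z",
    initial_verdict="suspicious",
    evidence=["low alert #1", "low alert #2"],
)
```

Whether the filler is an agent or a human, enforcing the schema at the boundary is what delivers the "same information every time" property described in the talk.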

You know, I think you could talk to any security operations person, and they probably all use different tool sets, unless they're standardized by, you know, maybe their organization. But I think it's worth looking into. Right. Okay. Very quickly, because I want to make sure I get to my last slides: model reliability and hallucination. This is for any type of AI approach, especially when you're using LLMs. There does seem to be, I would say, higher confidence with SLMs. And by SLM I mean you have a language model that is trained specifically for a task, a group, or an individual space. So for security operations, that SLM would be

trained only on security data. It's not like an LLM, which has access to everything on the face of the planet; anything that's internet-facing, they're going to have access to. They may have their own integration and reasoning and all of that sitting in the LLM, but they're going to have access to so much more data, which is why prompt engineering becomes such an important thing when you're using LLMs. But if you use an SLM, that could help reduce some of the risk that you might see. And of course, having audit logging and explainability. Explainability is helping your user understand why it made the decision it made. Not just the

decision it made, but giving your user, your analyst, your engineer the opportunity to understand the why. Because if there's a problem with how it came to a decision, there should be a mechanism to help it make better decisions. In your favorite open-source LLM chat interface, most of them, if not all, have an option for a user to say, thumbs up, yep, that's a good response, I was expecting that; or thumbs down, that is not what I was expecting, that's not a good answer, we need to refine and fix this. All of those things are super important, allowing your users to help it understand when it's

performing well and when it's not, so that it can adjust. Okay. Again, I know it's a lot of text, I apologize, but a lot of these things are super important to understand: where the human stays in the loop, and how we collaborate and coordinate with AI. I think there's still this fear that agents are going to replace humans. My personal opinion is that we are going to collaborate in different ways. Agents are very helpful for certain actions, very helpful for certain roles. They help elevate us to be able to focus on more complex problems. Take out the mundane, take out the boring tasks, the

same repetitive things that we have to do every day. Help us to make those low-level decisions that I don't need to spend my time on, so I can spend time on other things. So just quickly, the human-AI collaboration model. Role definition is super important. I'm going to touch on this in a second, but basically, it's like having a RACI matrix, right? Understanding where humans interact, and which humans, which operators, and which agents perform what task. That's super important. It's just like you would do if you're working on any team, right? You want to understand where cross-team collaboration is going to happen.
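A RACI-style decision authority model like the one described can be sketched as a small policy table mapping each action to who may execute it. The actions and authority tiers below are hypothetical examples, not a standard:

```python
# Hypothetical authority tiers: "agent" acts alone, "human_approval" means
# the agent proposes and a human approves, "human_only" is never delegated.
AUTHORITY = {
    "summarize_alert": "agent",
    "open_ticket":     "agent",
    "reset_password":  "human_approval",
    "contain_host":    "human_approval",
    "contain_fleet":   "human_only",
}

def may_execute(action: str, actor: str, approved: bool = False) -> bool:
    tier = AUTHORITY.get(action, "human_only")  # default to most restrictive
    if tier == "agent":
        return True
    if tier == "human_approval":
        return actor == "human" or approved
    return actor == "human"

print(may_execute("contain_host", "agent"))        # False until approved
print(may_execute("contain_host", "agent", True))  # True once a human approves
```

Starting with most actions in the restrictive tiers and promoting them as trust grows matches the maturity-model progression the talk recommends.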

You want to understand who's responsible for risk, patching, all of those things, right? That helps us to understand, when we're building an agent or an orchestrator-agent approach, how we want to do that and who's responsible for what. Decision authority model: again, that kind of comes in with a RACI, but it's about who makes the decisions and where, and this may change over time. The idea being that, at the beginning of your agentic implementation journey, you're going to want to start with low-risk actions, right? Versus, I'm going to contain all the systems the moment that I see it, I'm going to allow my agents to do that. So

you kind of build up from there. You give it an opportunity; maybe you're training and tuning a model, or maybe you're just saying, I don't feel quite comfortable enough, I don't feel confident enough in its logic and reasoning, to allow it to make decisions. Then you go into implementation, or excuse me, augmentation. That's where we say, I want to help my users, I want to help my analysts, I want to help my engineers. That is the entire motivation for building, using, and leveraging agents. I want to help my engineers not burn out. I want to help them not feel stressed and frustrated, dealing with the same repetitive tasks over and over

again, and offer them an opportunity to upskill, an opportunity to work on other things, too. All it really does is upskill everybody who's involved, so it can be very helpful. I like to mention that because it's about augmentation, not replacement. We're not talking about replacing people. We're talking about augmenting tasks to help our users not burn out and feel frustrated. Again, another little human factors piece. And then, of course, trust and transparency. You want to be able to see what your agent is doing and why. You want to have great logging in place so that you can trust it. You want to get

to a place where you can trust it. That trust is earned and takes time. But it also should, I think, be given an opportunity to grow, like we do maturity models with most things. An agentic approach is going to be a maturity model. Start with one type of alert. Start with one type of detection. You know, it doesn't have to be build agents for everything. I think that approach can be kind of a tough sell, right? But if you start small, build an agent for one task, one alert, then build it for another, then another, then start to really grow and build your agentic approach however you

want. And that's when you start to build your decision authority model. When do I want it to take action? How do I want it to take action? And then allow those things to grow naturally over time, instead of AI for everything. Okay. Strategic takeaways, since I am, I think, almost done here. We're really talking about, you know, I feel like we've been talking about reactive to proactive for a long time in the security world, right? We don't want to just be reactive. I think there will always be a reactive component to cyber, because cyber attacks are constantly shifting and changing, and APT groups are always evolving and changing their

methods. So there will always be some reactive approach, but how do we get to an adaptive approach? It's not going to be one-size-fits-all. An agent is not going to solve all of our problems, but it does help give us an advantage over the way that we're doing things today, right? Manual processing, making decisions on all of the alerts or all of the tickets. Taking some of that burden off of the way that we have to interact with all this data helps us to look at the data better, make better decisions, and ultimately reduce risk in the environment. That's really what we're after, right? We want to reduce risk. We

want to be notified if there's something risky that I should be paying attention to. But if I spend my day looking at a hundred tickets and only four or five of them are really serious or ones I want to look at, then one, there's obviously some tuning that needs to be done, but two, can we use an agentic approach to help us grow and build, so that I'm not constantly looking at 100 tickets? Maybe I can focus on four or five, and do an even deeper dive on some of the other ones that might have gotten missed before. So that's about it for me. I know I've got a couple of minutes here

for questions. I just wanted to share my books. My human factors and cyber book is coming out, I believe, April 6, where I discuss some of these other things. It's not AI-specific, but more about how we interact with data and how we can make tools better and easier for us to use. >> That's it for me. >> Yep. Thanks a lot, Nikki. It was a very deep dive in a very short time, so I really appreciate the effort to put a lot into a little and go very fast to make it valuable. So thanks a lot. We have one question from Peter. Do you want me to read this one, or

>> How do you recommend, because >> Oh, let me see. Yeah. Okay, I see it now. MCP/LLMs dealing with SIEM threat intel are dealing with potentially malicious data. How does one ensure captured malicious data is not able to become a second-order prompt injection? Yes. So this is exactly where the data protection and RBAC models are so important. You absolutely want to keep your data in protected spaces. And that's why leveraging one MCP server for everything, and doing all your integrations in one MCP server, is probably not the right way to go, just because if you do that, you expose it to everything. And we talk a lot

about obfuscation, right? The ability to not have everything all in one place. Even if they can get a little bit of data, we don't want them to get it all. So to help limit the blast radius, keep some of that data separate, use agents for different tasks, and use very robust RBAC mechanisms to protect those agents too. >> Okay. So I hope, Peter, you feel satisfied with the answer; you can give a thumbs up if you want. Perfect. Thank you. I was thinking of a question as well. Before that, I can give it to anybody else. Does anybody else have a question, or anything they want to

say? All right. So I felt that you were going towards drift detection. What is that as compared to hallucination? >> So, drift detection. My favorite example for hallucination is, if you've ever seen it when you're using an LLM interface, you ask it questions and you get sort of the same answer over and over again, even if they're different questions. It's hallucinating what it thinks you're asking it. Drift detection is more like the confidence that you have in the answers changing over time. So instead of consistently getting a false positive on this type of alert over and

over again, all of a sudden it starts to drift and say, actually, this is a true positive, true positive, true positive, and that may or may not be right based on the data. So, it's helping to determine those patterns to make sure that the LLM, or the reasoning portion of your system, isn't drifting away from what the intention was, which was to determine false positives, versus everything's a true positive. >> All right. Because the context for me was that you might have new data as compared to what was defined in the past, and that can cause more conflict on this definition of understanding a

problem. So you might be legitimately going towards a drift, but maybe the system will stop you from going on that path and think you're hallucinating. >> Yeah. The other interesting component of AI that we don't talk a lot about is the way that we use language in AI. When we talk about drift detection, or observability: when we used to talk about observability in IT, it was very much networking. Is everything up and stable, is it scalable, reliable, all those things. Drift detection can mean a couple of things, and the way that I meant it here was more that the responses I'm getting are drifting over time into decisions that I was not

expecting it to make. >> All right. And then we are very focused on the SOC perspective, if I understand correctly, because then the subject matter experts are sitting to observe that drift. >> Actually, that should be done by automation. Drift detection should really be automated, but it gives the signal to the engineers, whoever actually architected and built the solution, to say, hey, I'm starting to see these answers are getting a little wonky, you might want to investigate. Just like you would if you started to see network signals of a DoS attack or something like that. It just gives us a signal that maybe the

responses we're getting are not quite what we expected. >> Right. Nice. I had one small question or observation I wanted to share with you: you mentioned that it's much better to have smaller agents. I support that, but it felt like a lot of shadow IT could develop as well. If we don't maintain them or manage them, they can become dormant in the future, if we don't nurture them in the long run. >> Yes. Having a very robust agent management system is so important, because, as I was saying, you don't just want to build agents to build agents. They should have a purpose,

but it's just like shadow IT or anything else. You have to have a really good inventory and catalog of what agents exist, and why you're using them, because it's just like any other technology, right? Over time, we want to evaluate if it's even still working. Is this something we still need, or do we need a different solution? So I think it's one of those things where not only should you have a great inventory or catalog of your agents, but they should be evaluated over time to see if they're effective, if they're required, if they're needed, or if they need tuning and tailoring, just like anything else. >> Just one last thing on this one: who owns

that inventory and manages it? Is it the SOC, or is it some IT infrastructure team? >> I think it's IT infrastructure, or, if you have a security engineering team, it would be for the security engineering team to manage, because I do not expect SOC analysts to be able to build and maintain an MCP architecture along with their day job. But having an IT partner or a security engineering partner that can help build this and monitor it, that would definitely be the best way to implement. >> All right, perfect. I think that was very nice of you to give more details on this. All right, if there are no other questions, then we close this session

and wait for the last session. Thanks a lot, Nikki, again, for a very good presentation. >> Thank you. Bye, everyone. >> Thank you. >> Thank you. Bye.