
BSidesSF 2025 - AI's Bitter Lesson for SOCs: Let Machines Be Machines (Jackie Bow, Peter Sanford)

BSidesSF · 2025 · 44:53 · 3.4K views · Published 2025-06 · Watch on YouTube ↗
About this talk
AI's Bitter Lesson for SOCs: Let Machines Be Machines (Jackie Bow, Peter Sanford). We've been forcing AI to imitate human analyst workflows, but what if that's holding both machines and humans back? Through real-world experiments at Anthropic, we'll show how letting AI tackle security problems its own way can allow humans to focus on the nuanced work machines can't do (yet). https://bsidessf2025.sched.com/event/28905be132f57d745e618a218928fe75

I'm going to hand it over to Jackie and Peter, who are going to share with us the bitter lessons of AI as they apply to the SOC. Can't wait to hear what they have to say. Thank you so much, Jackie and Peter. Take it away, please. Thank you. Hello, BSides day two, post lunch. I hope you all are nice and full and you're going to bring that post-lunch comfy movie theater energy for our talk. We're really excited to talk to you about what we've been learning building AI-assisted detection and response investigation and triage systems, both powered by and designed with AI. And as introductions, I gotta

remember not to step back. I'm Jackie Bow. I'm Peter Sanford. And we're also presenting on behalf of our colleague Jack Adamson. We are the detection platform engineering team at Anthropic. Oh, and I'd be remiss not to mention Claude. Claude has really become one of our collaborators, one of our co-workers on this project, so we do need to give him his fair shake. So what is the current state for any of us who have been working in threat detection and response? It's pretty similar to how it's been for the past 15 or 20 years. We have too many alerts, too many signals, too many logs, and we're either inundated with false positives or we're overtuned and we're

missing things, and there are just not enough human eyes to actually look at all of them. And as human analysts, we are tirable and fallible, and we don't really get offered a lot of great tooling options. There's a lot of great tooling out there, but for things that actually fit our use cases perfectly, we find that it's still lacking. I would guess that some people in this room are feeling pretty skeptical about AI. And that's great. I personally have an aversion to security products touting machine learning features. It sounds crazy that that's the case, but why do I feel that way? It's because I've been burned by black-box ML, where the

models work in the vendor's test environments and make great demos, but when I actually apply them in my environment they fall over. And the reason they fall over is that my data doesn't match what the vendor built their model with. When I ask the vendor about this, they tell me, "Well, we don't know, but if you give us a bunch of samples, we'll try to incorporate them into our next training run and maybe it'll get better over time." It's so incredibly frustrating, and we want to build something that is not that. It's easy to swing the other direction, though, towards a SOAR mindset, where we think that maybe we can just

codify every possible investigation and response action, and that should be good enough to help us with the burden of more and more alerts. And it is true that SOAR and automating playbooks can save you time and have been helping analysts out. But when we get into this headspace of "we just need to codify exactly what a human would do in every situation," we fall prey to oversimplifying the process and missing a lot of what the greatness of human investigation brings, which is creativity. It's trying things that haven't been tried before. And in this world of security orchestration, automation, and response, we aren't really given this freedom of

creativity or this way to try a lot of different paths. Exactly. I think SOAR is great when you have already seen a thing in your environment, but it doesn't help when it's a completely new type of attack that you've never seen before. This reminds us of an AI concept called the bitter lesson. The bitter lesson is the idea that trying to encode what we think of as human-specific thought patterns into models ends up being worse than training models generally. The big takeaway from the bitter lesson is that general methods and throwing compute at the problem win out against clever algorithms and encoding human reasoning into models. Some examples of this are training

AI models to play games. Early on, game engines would try to encode human strategy into how the engines played, and that worked okay. But those engines could not beat later models designed with no strategy encoded into the engines at all. The models were built by just training them to play against themselves, with different variations, and to learn for themselves which strategies worked well and which didn't. And those models came up with new strategies that no humans had ever tried, and they were significantly better than anything that had come before them. So Peter, are you saying that if we let the models cook, we get better results?

That is what I'm saying, Jackie. Let them cook. I actually stole this directly from Anthropic's documentation when we released Claude 3.7 with extended thinking. The quote goes, "The model's creativity in approaching problems may exceed a human's ability to prescribe the optimal thinking process." And I remember coming across this as we were having these discussions about the bitter lesson and realizing that we are in a world where we're seeing the direct implications of the bitter lesson on a much larger scale. We want to take a little step back and say that we are practitioners, not full-time AI researchers. So we've really been coming at this idea from the perspective of how we can start using

the learnings of the bitter lesson in our day-to-day work while continuing to do our full-time jobs of keeping our companies safe. The good news is that it's easier than ever to start trying these techniques out. We're going to walk you through the tools that we built to improve our detection investigation process as we continued to do operational work, using just a very small team of engineers and really leveraging Claude to help us with that process. Yeah. And so we're going to go over some of the foundational building blocks that went into the systems we're going to talk about. We're going to go through these relatively quickly

because we want to leave time to actually show you some demos of what we've built and, of course, take your questions. It's worth noting that we worked on this system, alongside our core detection and response work, for just over three months, and we have built the prototypes that you're going to see today. So first: previously, we were kind of blocked on needing a purpose-built model to do detection and response. If you just tried to use a foundation model to triage alerts, you might get some results, but they wouldn't really be that great. But what we have found, especially within the past

six months, because things in AI move really quickly, is that the foundation models themselves are pretty good at this work completely out of the box. So if you were thinking that you were blocked on having to do RL, like having a huge corpus of labeled data samples and then running an RL environment over it, we're here to tell you that you don't need to do that. Yes, you could probably see optimization and performance gains if you had that time and expertise, but you can really just use a model right out of the box right now. Also, we have found that one of the biggest things that sets you up for

success in this paradigm is having a technical stack that supports integration with AI tooling. The beautiful thing that we all wish for is that we could be dropped into a company with a completely empty slate and unlimited resources and build from scratch the detection system that we would like. But we know that's not usually the case. With the tooling that you have, and as you're looking to integrate new tooling, things to look for are tools that use common languages and frameworks. So, languages for writing detections for your infrastructure: Python is great, and we know that the models, especially Claude, are pretty good at writing

Python. SQL as a query language is very well known, as is Sigma for detection rules. There are so many Sigma rules out there, and the more material there is available on the internet about these systems, the better the model will be at using them. The other components that are really helpful are robust documentation, always, as well as APIs that make speaking to and working with a system very easy. Things to steer away from, and I'm looking at a lot of the legacy SIEM providers here, are tools that treat how they work as a proprietary secret. If you are working with tooling that does not expose external APIs and forces you

to work only in a UI, you're going to have a much harder time, and you'll probably be boxed into using whatever model they're packaging and selling with their product, which again goes back to that black-box ML problem we were talking about, which we just really don't like. We relied heavily on models to be our co-engineers throughout this process. They helped us brainstorm, prototype, and ship our systems. So, starting with how we did that: we started our coding process by leveraging a coding assistant. There are a lot on the market; it seems like new ones are coming out every week. We use Claude Code for our

development, but we expect that you would have similar results with many of the tools available. I have been personally excited about how people outside of Anthropic have been using Claude Code. I am a big Steve Yegge fan, and my heart just grew when I saw this post from Steve Yegge talking about the relentless nature of Claude Code working to solve any task that you give it. It just doesn't rest until it's solved the problem you've handed it. So, we're going to look at a quick demo of using Claude Code to prototype a UI feature for our investigation tool. On the left-hand side you'll see the

UI running locally on my laptop, and on the right-hand side is Claude Code. I've opened Claude Code in the project. I haven't given it any information, and I give it a really bad instruction of just: in this investigation UI, I would like the ability to see raw transcripts, without any additional context. It doesn't know anything about this codebase, and it does a pretty good job. So we'll take a look at that. On the right-hand side you can see Claude Code and that request to add a button to the investigation UI to show the raw transcripts. Claude Code first starts

by searching through the codebase to understand what the code actually does. It tries to find the investigation UI components. It then comes up with an action plan for how it's going to implement this feature, which it tracks with those little checkboxes there. It starts editing the code, and on the first edit it asks for permission to make the change, which I approve, saying go ahead and make all future changes. As it makes the changes, the UI reloads automatically to reflect them. In a moment here we're going to see where it actually adds the button to show these raw transcripts. The idea with

this feature is that sometimes it's useful to see all the specific details and not the summaries displayed in the UI. So that's the thing that we want to see. It's added the button up at the top, and it's now implementing the actions for what actually happens when you click the button. And in a moment here... okay, it's done. We'll go ahead and try clicking that button, and we see the raw transcripts in JSON. It really does speed up development for us. I consider myself a pretty good engineer, maybe not a great UI engineer. I think, depending on my context and how much I've

been working in these frameworks, I could maybe implement this in 30 minutes or an hour. And that was real time: one minute and 30 seconds for Claude Code to implement these features. So it really speeds up our ability to try things out, and this was a really important part of us building this in such a short period of time. We've also been using tools and MCP servers a lot in our day-to-day work. MCP is the Model Context Protocol, and we're really excited about these as well. Tools are just a great way to break the model out of its constrained context

window and give it the ability to interact with the outside world, and also to do things that it may not be great at. Models, we know, will produce nondeterministic output, and if there are specific things where you need determinism, giving it a tool that performs those actions can help get consistent results. And MCP servers are very exciting; they're so hot right now. We're seeing more open-source MCP servers every day. We were personally excited about Slack, GitHub, Semgrep, and the unofficial Ghidra MCP server just released a few weeks ago. I don't know, Jackie, do you have opinions about that Ghidra server? I

contain two wolves, and one of those wolves is super excited that you can have Claude do reverse engineering for you. But as someone who started my career reverse engineering and had to walk uphill both ways for my IDA Pro license, I'm a little salty. I'm a little salty. The final component that we've been leveraging, which maybe is talked about a little less, is external memory. This is the idea of giving the model a place to write down information and then making that information available to the model later. This is surprisingly effective as another data source that our investigation tool can query in

the future. You can basically think of it as a meta-signal that the model can use in investigations. We're going to talk about this more in a few slides. Yeah. So, putting all of these foundational concepts together, we're going to talk about how we used them at Anthropic to build these systems. Right, so while we mentioned that this first prototype of the system we're using internally right now took us just over three months, the idea of building something like this has been percolating for a while. And so when we started talking about this, we found ourselves in the state that I kind of

hinted at at the beginning of this presentation, where we start out with a lot of firing detections, and we're adding more every day. Then we have what we call non-firing detections, or low-confidence signals, that are still important and many times can actually lead, in aggregate, to things like incidents or significant misconfigurations that we want to know about. But we don't have enough time to look at all of these, especially since non-firing detections can be around a hundred times greater in volume than our alerting detections. We also knew that Claude doesn't really get tired. He doesn't really cut corners. We can spin up as many Claudes as we want.

But we don't want to hand the entire process over to Claude at this moment, because again, we're trying to avoid this black box where we throw data in, data comes out, and we don't know how we got it. So we wanted to create a system where we are collaborators and we work together on triaging detections. And while Claude does not cut corners, I will say we have found he can sometimes be pretty spicy in his deliberations. We'll talk about that a little more too. And so, what did we do? Kind of what we do a lot when we have thoughts at Anthropic: we talked to Claude.

And so Jack on my team had this conversation with Claude where he said, "Okay, here are all these concepts that we've been talking about. Write a white paper as if, in a year, we have completely changed the landscape of detection and response, and we've done it by using LLMs, humans in the loop, a collaborative triage platform, and external knowledge bases." So this was the prompt that we fed to Claude, and Claude ever so helpfully spit out a 48-page white paper. So helpful. And this white paper actually has some pretty good affirmations of some of the topics we were thinking about,

and also some fluff. I mean, it is a 48-page white paper. So what we did from here is we distilled out the pieces that we wanted to include in our first batch of implementation. We distilled those into a design document, again using Claude, saved it in Markdown format, and then dropped it into the repository where we're actually building these systems. And with the design doc, we collaboratively worked with Claude Code: Claude Code would implement a feature, we'd look it over, it would commit it, and then it would diligently go back and check off what it had done. And so we had a clear trail.
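A design document in that style might look something like the fragment below. This is an invented illustration of the checklist-driven workflow described in the talk, not the team's actual document; the headings and items are hypothetical:

```markdown
# AI-assisted detection triage: design doc

## Goals
Collaborative triage: humans and LLMs investigate detections together,
with every step reviewable.

## First batch of implementation
- [x] Prompt template for investigating a single detection
- [x] Vetted, human-reviewed Python tools the model can call
- [ ] External knowledge base for past investigation reports
- [ ] Human-in-the-loop review of every triage verdict
```

Because the doc lives in the repository as Markdown, a coding assistant can read it, implement an item, and check it off, leaving the clear trail mentioned above.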

So it's vibe coding, but just in the sense that it's enjoyable, not in the sense that we're YOLOing it. So let's dive into our process for building an AI-assisted investigation platform. We built this tool called Clue. Clue is a backronym for "Claude links useful evidence." I think Claude helped us come up with that backronym. The initial idea for Clue was very simple: let's take some detections and hand them to Claude and ask it, how would you investigate this detection? We would also ask it what tools it would like to use to investigate this detection. We would then allow Claude to hallucinate the tools it wants, and then

we would take that list of tools and ask Claude, please implement these in Python, giving it some information about our environment and what APIs are available. Claude would go and implement that code in Python. We could look at it and verify that it's doing what we actually expect. And then we would plug that code back into the tool to complete the investigation process, where Claude would then actually call the functions that it wished existed and generate a final triage report for us. So you might be saying, aren't hallucinations bad? Obviously we don't want Claude to hallucinate events that didn't happen, but we do want

Claude to be creative, to let Claude cook, as Jackie said earlier. So using some structured hallucination is actually a strategy to get Claude to creatively solve the problems we give it. For phase two, we built on this early idea and added a bunch more features to automate things that we were doing manually. We gave it more tools for searching, both our SIEM and other external systems. We built a batch detection processing pipeline. We built an open-ended investigation UI where we can collaborate with Claude and use natural language to do investigations. And we built a storage engine for memory and for storing artifacts and reports.
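The second half of that loop, where the model drives an investigation by calling tools that humans have reviewed, can be sketched roughly as follows. This is a hypothetical illustration, not Anthropic's actual Clue code: `run_investigation`, `tool_specs`, and `handlers` are invented names, and `client` is assumed to be any object exposing a Messages-style `create()` call that returns a response with a `stop_reason` and content blocks.

```python
# Hypothetical sketch of an agentic triage loop in the spirit of Clue.
# The model asks to call tools; we execute our own reviewed Python
# implementations and feed the results back until it writes a report.

def run_investigation(client, tool_specs, handlers, detection, max_turns=10):
    """Let the model drive the investigation by calling vetted tools."""
    messages = [{
        "role": "user",
        "content": ("Investigate this detection and produce a triage "
                    f"report.\nDetection: {detection}"),
    }]
    for _ in range(max_turns):
        response = client.create(tools=tool_specs, messages=messages)
        if response.stop_reason != "tool_use":
            # Model is done: its text blocks form the triage report.
            return "".join(b.text for b in response.content
                           if getattr(b, "type", "") == "text")
        # Execute each requested tool call with human-reviewed code.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = handlers[block.name](**block.input)
                results.append({"type": "tool_result",
                                "tool_use_id": block.id,
                                "content": str(output)})
        messages.append({"role": "user", "content": results})
    return "Investigation hit the turn limit without a final report."
```

The important property, echoing the talk, is that every tool in `handlers` is code an engineer has read and approved, so the model's creativity is channeled through functions whose behavior is known.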

I'm going to show you a very simple architecture diagram; Claude helped us generate it. The point of this is that there's not that much to our system. We have a storage layer. We have an event bus and a queue for processing detections. We have a way for users to interact with the system and a way to pipe events into the system. But it's not very complicated. It is something that you can build very easily with off-the-shelf services from your cloud service provider. Don't be intimidated by a little bit of infrastructure. We're now going to look a little at the prompts that we're using. The main point is

that there's nothing special, no secret sauce, in our prompts. We're giving it very general information. We've tuned these prompts a little, mostly when things go wrong: when Claude makes a mistake, we ask the model, what could I have asked differently? How could I have phrased this question to help you answer correctly? And that's helped us tune our prompts a little. But they're pretty straightforward, pretty simple. We're going to look at the different components listed on this slide. So this is our alert investigation prompt, or the initial component of it. We give it a

little bit of instruction at the top, just some context for what we're trying to do. We give it the specific event that we're investigating. We give it the rule that triggered the event. We give it a set of SQL tables that Clue can query, especially when it's looking for additional data sources and needs to expand out and do broader investigations. And then we give it some basic guidance on how to investigate: things to try and things to avoid. The main point here is that we've played around with what works and what doesn't, and we want to give it some constraints

for how far it's going to go, but we don't want to be overly prescriptive about what it should do. Yeah. And digging a little more into the tools that we built: these are all tools that we built internally, and they're pretty basic as well. The most important one, I'd say, is query. Query is just a tool that Claude can use to query our data lake. This tool has advice on how to write optimal search queries and how to interact with our data lake as well. As Peter mentioned, models are nondeterministic, and so sometimes they need a little help if you want

repeatable results. So we have tools for creating reports, because we want our reports to be structured and to contain the same information. We also have tools for interacting with thinking, that is, extended thinking. One of the interesting things that we do is create this conversational model where, if you just one-shot prompt Claude, you'll get a response, but if you ask it to think and come up with a multi-step answer, you can get better results, and it can also use different tools in each stage of this conversation. And here's the output of the investigation process. Clue generates a triage report. It has

its initial findings and then just enough information for a human looking at it to understand the key components of the investigation, and if they want to dig in further, that information is readily available. Another really important piece of this tool is that every investigation has a full investigation transcript. This means that we can verify everything that happened. Everything that Clue did is in this transcript. We can see its thinking process, what data sources it accessed, and what queries it ran. That's really helpful if there's any question about accuracy, and it's also helpful as we tune the tool and try to make it better. This is a

lot of where we think we have something different from the black-box ML we talked about at the beginning: we can have actual confidence in the process and in understanding what the tool is doing. Yeah. And one of the things that we also find really turbocharges the model is having context, and that's organizational context about things it's done in the past. So we store every investigation report and detection report in a repo. And we mentioned earlier that these knowledge bases don't need to be overly complex. You don't need to use RAG or a semantic graph to start out with. Obviously those are great, but you can

just use a bucket. We just use a bucket with a file structure, and Claude has a tool to understand where things are stored and where the metadata is, and it can interact with these reports quite easily. And so this is where we get into the stuff that I'm very excited about: batch processing. Our idea was that it's great to generate singular reports, but we would love to see patterns over time. Going back to the two different kinds of detections: you have your alerting detections in red, and these are detections where you know humans should look at them. They're really high

confidence. Each of those gets its own report generated. But then we have, on the order of 100x that number, signals, which are lower-confidence detections, and those all get batched together. The report and the summary functions are workflows that each have their own tools and their own schedule. The summary report runs over a batch of these lower-confidence signals, and from that we do what we call meta-analysis, which is another workflow, with its own prompts and tools, that gives you a summary of your environment. And the thing that we're really excited about here is

that not only are we in control of every component of this, we can spin up tens of different experiments, trying different ways to write reports, to write summaries, to do meta-analysis. We can do best-of-n, running the same detection through the model and getting n different results and comparing them. This is very exciting, because something that I have been sold for over a decade now is UEBA, or user and entity behavior analytics. It's this idea that if you can fully understand what's going on with your users and your systems, you can quite easily see anomalies. And what we're building here, and what we are

excited to build in the future, is something where you can do this entity extraction and get a really close contextual idea of what's going on in your environment, and you don't need external tools for it. Ah, and yes, this is another demo. As we mentioned, we built an investigative UI. We didn't set out to build this, but we found that once we gave Claude and Clue the tool use and the other features, suddenly it was very, very good at being a collaborator in doing natural language investigations. That's me. That was interesting. It was inception. Okay. So in this demo we're going to go through an

investigation where we have a situation with a set of contractors, and we have some concern that they may have had access to systems and documents that they shouldn't have. This is a classic problem in investigation or incident response: can someone go and look through the logs?

I know the demo gods say no even to a video. Even to a video. I know.

Okay. I don't know. Do we... Ah, okay. Sweet. Yay. Huzzah. Do you want to try to do that? Yes, please. Cool. So we're actually going to slow this down, because the investigation engine is too fast to actually explain what's going on here. We've asked our natural language question: here are the names of three contractors, can you get me the logs to show if they had access to anything that they shouldn't have? We have a concept of FTE clearance as well as contractor clearance. You can see the conversation here. The only thing on the analyst side was the first query that kicked it off.
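The self-correcting query behavior seen a moment later in the demo, where a model-written SQL query fails and the model fixes it from the error message, can be sketched like this. This is a hypothetical illustration, not the actual Clue implementation: `ask_model` and `run_sql` stand in for the real model call and data-lake client.

```python
# Hypothetical sketch of the "make a SQL error, then correct itself"
# loop from the demo: failed queries are retried with the error text
# fed back to the model so it can repair its own SQL.

def query_with_retries(ask_model, run_sql, question, max_attempts=3):
    """Ask the model for SQL, run it, and loop error text back on failure."""
    prompt = f"Write one SQL query to answer: {question}"
    for _ in range(max_attempts):
        sql = ask_model(prompt)
        try:
            return run_sql(sql)
        except Exception as err:
            # Same trick as the demo: show the model its own error.
            prompt = (f"This query failed:\n{sql}\n"
                      f"Error: {err}\nPlease return a corrected query.")
    raise RuntimeError("No working query after retries")
```

Because each failure and retry lands in the conversation, the analyst can later scroll back and see exactly which queries were attempted and why.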

And now you see this conversational flow of Claude going through and making queries into different tables in our data lake, gathering the results. Sometimes it makes a SQL error, like we all do, and then it corrects itself. The nice thing about this is that you can issue an investigation query, continue on with your day, and come back. This took just about two and a half minutes to run. And at the end, we get a full report about what Claude has found. And the nicest thing is that you can go back up and see what queries were run. So if you have any question about how

we got to this verdict, or where this information is coming from, we have the full logs. Yep.
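The full-logs property described above, where every step of an investigation is preserved for later verification, could be implemented with something as simple as an append-only transcript. A minimal sketch, with invented names (`InvestigationTranscript`, the event kinds), not the actual Clue storage engine:

```python
# Hypothetical sketch of the audit trail behind "we have the full logs":
# every model step (query, tool call, verdict) is appended to a
# per-investigation transcript a human can replay.

import json

class InvestigationTranscript:
    """Append-only record of everything an investigation did."""

    def __init__(self, investigation_id):
        self.investigation_id = investigation_id
        self.events = []

    def record(self, kind, **details):
        # Each event captures one verifiable step of the investigation.
        self.events.append({"kind": kind, **details})

    def to_jsonl(self):
        # JSON Lines: one event per line, easy to store in a bucket.
        return "\n".join(json.dumps(e) for e in self.events)
```

Storing these transcripts alongside the triage reports also feeds the external-memory idea from earlier: past investigations become another data source the model can query.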

And I talked a little bit about the things we're excited to build onto this. Like I said, we're still very early in this prototyping stage. We talked about how to apply this work to response; we talked about investigations and alert triage; but especially if you are using CI/CD for your detection rules, you can actually have Claude make suggestions to tune your rules and to write new rules. Also, I talked a little bit about the white whale of UEBA and having a clear idea of what is going on in your environment that is

generated for you. And again, one of the beautiful goals of a functioning detection program is that it should feed back into preventative controls. Every time you have an incident or something happens, you should be feeding that back into maturing your preventative controls. And we do believe that in the future, a system like this could assist in both suggesting and potentially creating patches for preventative controls. And finally, we just want to really encourage you to start experimenting with the same techniques that we used. As we said earlier, we are a very small team. We did this kind of on the side from our normal detection and response work. We could not have

done this in the time that we did, or at our team size, if we didn't have tools like Claude Code available to us. The key thing for us is that those tools gave us the ability to prototype, to try things, and to get that rapid development cycle going. And the more we did it, the more excited we got about it, and the more fun it was to build more things into this platform. So you should just start: start experimenting, start prototyping, see what you can build. Don't try to add everything at once. Just start small and build from there. All right, we have questions.

[Applause]

Well, thank you very much. Again, that was a great presentation, a lot of insights, and we have questions pouring in. Great, love that. Thank you. There are a few minutes left, and I will try my best to squeeze in as many questions as possible. For the rest, you'll have to catch these fantastic presenters upstairs; if you buy them a drink, they'll probably answer your question. All right, let's get going. This is an interesting one: how much would your project have cost if you were a customer? Yeah, that's a great question. One of the fun things about Claude Code is that we can benchmark how much queries cost. And

so we have some rough ideas per alert. It can range anywhere from 20 cents to about a dollar per alert, which obviously is something that we are working to pare down. And then with the summary analysis, where you have hundreds of alerts going into one summary, again it's about 50 cents to a dollar, and that doesn't count the cost of things like the data lake. And I think the thing that we're seeing is just the economics of LLMs: they continue to get smarter, and smaller models continue to get more affordable while being comparable to larger models from just six months ago. So we expect

the cost of this to go down significantly in a fairly short period of time.

Thank you very much. Another one here: what do you mean by structured hallucinations? What we mean by structured hallucination: we talked about tool use, and tool use is a great way to coax a model down a path. But when we talk about hallucination, we don't want the model to be stuck in a rigid way of thinking or on a prescribed path. We want the model to be creative and to come up with ideas that we couldn't have come up with. So that's where we elicit hallucination in a structured format. It's not fabricating network flow logs; it's saying, "you know, it would be really great if I had a tool that did this." Yeah, constrained hallucination is really how we think about it.

Okay, thanks. Are you not concerned about auth issues with MCP? I think a lot of people are concerned about that. There's significant work being done, and I think we'll see improvement relatively soon. But also, nobody talks about auth issues with LSP servers and how they integrate with development tools. You do have to understand what your MCP

servers are doing. But I think we'll see good solutions for that relatively soon.

Thanks. Do you see Anthropic open-sourcing these tools and workflows, and if so, on what time horizon might that happen? Yeah, that's a great question. We encourage you to try to build these things yourself, because with things like Claude Code it's really easy to prototype. So, coming back to the process we talked about where you have a design document: you can drop that into whatever you're building. But we are thinking about open sourcing, because what we've built can be CSP-agnostic. You can use whichever CSP you want, using foundational building blocks that are available across CSPs, whether you use GCP, AWS, or, I'm sorry, Azure. Sorry, I told you I wasn't going to be spicy. It's also SIEM-agnostic, so however you are querying and wherever your data is, this tooling can be agnostic from that. Yeah, there's a large chunk that's just glue code specific to our different databases and our environment, and that's why we don't feel like we're ready to release something right now. But there's a lot that we potentially could

share.

Okay, I'll try to manage this microphone. How do you think the next version of Claude will further amplify this workflow? Yeah, we were just talking about this, weren't we? Our forecast is that a lot of what we're doing with tool use is building scaffolding around the model, but we believe that with time the model itself is going to get better and better at doing these things on its own. Right now there is the model, surrounded by an ecosystem of tools and integrations, but in my mind we will continue to see most of those capabilities become part of the core model itself. Yeah, and there's something that we're not doing yet but hope to do: use different models for different parts of the process. Smaller models that are less expensive are probably still very good at writing SQL queries, and we want to leverage what a lot of the AI coding assistants are already doing, where you have the most intelligent model thinking at a high level about the plan for an investigation and then delegating to less expensive, faster models for actually executing on that plan.

Great. The next one I'm reading out: how are you auditing and monitoring Claude's activities, similar to any

enterprise applications? I guess is that uh how we're monitoring people using clue? Uh I think uh that is uh the way we monitor any of our hosted applications is we have uh audit logs and access logs and and that's how we monitor that. I think if we're talking about monitoring like the actions that Clue does or like verifying that uh Clue isn't making mistakes, um a lot of that is comes down to looking at the audit logs if something doesn't feel right, if it feels like something was missed. Um and then just periodic um as we we're building the the tool uh we're reviewing the audit logs quite a bit uh in understanding uh like where things are

working and where things aren't. And so we're doing a lot of auditing there as well. Um, do you retain clawed with new investigations or just rely on fresh context? Yeah, it's a great question. Uh, the context window is clear every time we do an investigation and the way that we access context is through tool use. So when we talk about our knowledge base, claude has tools that it can reach and pull things into its context. But we can't start um every conversation with uh context and actually we believe that this is a benefit because if you have already preeded the model with how it's been doing investigations in the past, the likelihood it's going to not um try

novel ways of investigating is higher. So having a clean context window is actually advantageous.

Here is an interesting one. I think in the Claude security audit of three users, will Claude respond nicely if you ask it to tweak specific investigation sections in plain language?

I'm not exactly sure what that's asking; we can take it offline and pass on it for now. Yeah. Great.

You had a great access-review demo. Were you able to automate this kind of activity completely? Again, I'm not quite sure what the question is, but something I'm currently working on is the capacity to do scheduled queries. Say you have a user who has come to your attention for doing some things that may not be by the book, and you want to run periodic activity reports: you can create scheduled queries that will run these kinds of natural-language queries over time, and then you can also apply things like meta-analysis and summaries over that.

Well, I think that's most of the questions. There are a few still left, but I'll let people approach you so you can get some drinks out of them. Once again, folks, a thunderous round of applause for our speakers, Jackie and Peter. Thank you so very much. Really appreciate it.
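The tiered-model delegation the speakers describe in the Q&A (a capable planner model handing mechanical steps to cheaper executor models) can be sketched as a simple router. This is a minimal illustration only: the model names, the step taxonomy, and the routing rule are all hypothetical, not Anthropic's actual configuration.

```python
# Hypothetical sketch of tiered-model delegation: a capable "planner" model
# reasons about the investigation plan, while cheaper, faster models execute
# mechanical steps such as writing SQL. Names below are placeholders.

PLANNER_MODEL = "large-planner-model"    # assumed: most capable, most expensive
EXECUTOR_MODEL = "small-executor-model"  # assumed: cheaper, fine for rote steps

# Steps that need little open-ended reasoning go to the cheap tier.
MECHANICAL_STEPS = {"write_sql", "fetch_logs", "summarize_rows"}

def pick_model(step: str) -> str:
    """Route an investigation step to a model tier by how much reasoning it needs."""
    return EXECUTOR_MODEL if step in MECHANICAL_STEPS else PLANNER_MODEL

# A toy investigation plan: the planner handles hypothesis and judgment,
# the executor handles query-writing and data pulls.
plan = ["form_hypothesis", "write_sql", "fetch_logs", "assess_findings"]
for step in plan:
    print(f"{step} -> {pick_model(step)}")
```

In a real system the router's output would feed the `model` parameter of each API call; the point is only that the expensive model is reserved for the steps that need it.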