BSidesSF 2026 - AI for security - friend or foe? (Panel)

Name: BSidesSF 2026 - AI for security - friend or foe? (Panel)
Uploaded: 2026-05-12
Duration: 42 min 26 s
Mentioned in this talk

Tools used

Cursor

Service

Claude Perplexity AI

Vendors

OpenAI

About this talk

AI for security - friend or foe?
Tom Alcock, Jackie Bow, Travis McPeak, Drew Hintz, Kyle Polley

AI as a Defender’s Force Multiplier
AI as an Adversary’s Weapon
When the Security Tools Themselves Become Attack Surfaces
Building Secure AI Pipelines
The Human Factor — Trust, Oversight, and Skill Gaps
The Verdict — Friend, Foe, or Something in Between?

https://bsidessf2026.sched.com/event/a243dd2546720dd171d696ac8299b176

So, we have a very hot topic today for our panel. Uh AI, right? My goodness. Friend or foe? We'll see. Okay, so we have um Tom Alcock. He's partner and founder, Code Red Partners. Raise your hand and wave. >> [gasps and snorts] >> He's also our moderator, so he's going to be moderating and asking the questions. Speaking of questions, before I introduce the panel, um who here has heard of Slido? Okay. About a quarter of the room. So, if you log into slido.com, slido.com, and enter BSidesSF, SF for San Francisco, 2026, and we're in theater 12. If you'd like to ask a personal question to our panel, the moderator may have the opportunity to read your question live on camera,

okay? slido.com, BSidesSF 2026, theater 12. Okay. Uh By the way, I'm Nicolina. I'm your host. Um let's see. Aw, you're so nice. She's so sweet. Okay, and pretty. And talented. Okay, we've got Jackie Bow. She's technical staff at Anthropic. Okay, where's and obviously the only lady. Glad to see you here. Uh we've got Travis. Where's Travis? Where's Travis? Hi, Travis. >> Hello. Hello. Travis McPeak. He's security at Cursor. Um and then we have Let's see, then we have Kyle. Okay, I have to say this, I'm a fan girl. Who in here has heard of Perplexity? Oh my god, not enough. So, it got me through cybersecurity school, a 2-year program in 6 months.

That AI got me through school. Love it. If you haven't tried it, try it out. Um so, we've got Kyle Polley, who's head of security at Perplexity AI, which is my favorite. Okay, then we've got Drew. Drew Hintz, right? I [snorts] know, he's married. I'm not hitting on him. I just love his company. I'm a fan girl, for sure. Um we've got Drew Hintz. He is the lead of product security at OpenAI. Now, with that, I ask you do not heckle or security will be called, okay? Be nice, play nice. All right, and I'm going to have you take it away. I mean, just like a little heckling is okay. A little heckling is okay.

>> You can heckle Tom, it's cool. >> [laughter] >> I didn't know how to follow that really. That was a great intro. I will kick you out. >> [laughter] >> Uh who has not heard of Slido before? Yeah, I hadn't either. You can't You can't tell anyway. Yeah. There's a very bright light coming towards us. Don't go towards it. But uh thanks Thank you all for giving some time up this afternoon. Um I'm really happy to bring this panel together. Um a lot of experience, a lot of wisdom, a lot of war stories. And so, we're going to try and keep this as conversational as possible. The reason why Slido was brought in is we want to hear some questions from the

audience. Uh we want people like participation. We're going to make this as conversational as possible. So, uh strap in. We've got about 45 minutes. Um and we will also at the end of this conversation, we'll try and find some time at the top of the uh escalators to meet with some of the speakers. Unfortunately, for fire marshal reasons, we can't do it outside of this this room. Yeah, for fire marshal reasons, when um we conclude, we're going to have to exit the theater. We cannot congregate in the hallway um for fire marshal purposes. But, what we can do is everybody on this panel has been willing to spend extra time to answer questions um at in City View on

the third floor, okay? By the escalator area. And also, if you want to reach out to them personally, maybe you're a little shy, reach out to them on LinkedIn please. I'm assuming you have Kyle's LinkedIn. I do now. I did. We're connected. I got you. All right. Well, let's kick this off then. Um so, the uh the headline of this talk today is uh AI for security, friend or foe. We are in a moment of time where AI has simultaneously been pitched as the greatest security defense of all time. And it's also been pitched as the greatest offensive capability that an attacker has ever had. So, the question we want to ask ourselves this afternoon is AI

for security, is it a friend or is it a foe? Uh we'll explore this topic today with a number of uh security leaders um that are securing some of the most prominent names in artificial intelligence. I'm Tom Alcock, the founder of a search firm called Code Red, and we help build security teams and technical talent actually for some of the companies that are on this uh panel today. Quick quick pitch. Uh quick plug. >> [laughter] >> Uh but before we kind of jump into the debate today, I'd love for them to tell you what they're building, what they're up to. Um and so, some brief intros. And let's add one extra question at the end

of the intro: is AI currently a net positive or a net negative for cybersecurity? Travis, over to you. >> Hey. Uh good to meet everybody. Uh for those that haven't met me, I was running a company up until June 2025. Um I was acquired technically by Cursor, by which I really mean that they gave me a nice job and said, "Please do security stuff." Um my goal when I got there was to catch the company up, given that really like the Cursor that you all know today is not very old. And they had built a lot of stuff quickly to get to that point. And my job was to make it so that people would feel safe and secure

and and we would do a great job protecting the data that we have. And [snorts] I knew that I wasn't going to be able to scale myself if I didn't use every tool at my disposal. And obviously like AI would be a great tool for that. So, I've been the way I would describe it as running around like a dervish and just doing, you know, like a a couple of teams worth of security work myself just because I have the the right tools for it. So, it's been a lot of fun. Um overall, like friend or foe, I would say friend, but it's going to be a little bumpy to get there. Um happy to

talk about that later. Yeah, awesome. Hi, I'm Jackie. Um I've been doing security for around 15 years in like a questionably healthy relationship with detection and response. It's kind of like one of those relationships where people are like, "Is this really good for you?" But, I actually feel like now during kind of this proliferation of capabilities of AI, I'm actually at the point where I can build the tools that would have made this relationship much healthier. Um yeah. And so, I lead threat detection platform engineering at Anthropic, and we build a lot of the tools that we use to do detection and response work. Cool. >> [clears throat] >> I'm Kyle. I lead security at Perplexity

AI. I Yes, let's go. >> [laughter] >> Love the fans. Love the fans. I know, right? Yeah. >> [laughter] >> Um I joined a little over a year ago as their first security hire. Um and I think the Perplexity security team is pretty unique because we built our program uh with AI and agents at its foundation. Um and so, a lot of our security operations and toolings really leverage AI extremely heavily. Um so, that being said, like yes, a huge huge fan of AI and I think it's a massive uh friend and ally to security teams. Um I think historically uh security teams have really struggled, especially with all the like teams are just getting completely drowned in

alerts and uh bug bounty reports and um there's so much that needs to be done for just typical day-to-day operations. Uh and not enough time is actually spent on uh building the security foundations and tools necessary to prevent those alerts from happening in the first place. Um and so, I think AI could really unlock that for teams where like now at Perplexity uh 99% of our time is spent focused on preventing alerts in the first place and actually building strong guardrails and foundations. And so, I got super excited to talk about that um and where the industry is headed. Hey, everyone. I'm Drew. I lead product security at OpenAI. Um some of the things we're focused on are very

traditional SDLC type things of actually securing the products, um addressing vulnerabilities. But, probably the more, you know, fun and interesting part has been how we actually build security for agents. You know, we've had a lot of, you know, increase in agents writing code, but also being used for a variety of tasks. And we don't really have a lot of the the primitives built out of the box already. And so, we've been building these for enterprises, building them so that we can give agents access to more tools, but do it securely. And some of this is using existing things with like giving them the right authorization, the right egress controls, the right human in the loop authorization. And some of

it are some more novel defenses around prompt injection and um consequential access monitor and things like that. Um so, I think overall um agents have been a really big boon for security in that, you know, people are going to be using agents to write more and more code. And the only way we can keep up with reviewing all of that and reviewing all the new products is also by using agents and using models to go through and do that security. On the flip side of that, right? We really see this compression in timelines from attackers and from offensive security, where you can use these agents to really execute the attacks faster and at a broader scale um along the way. So, I

think it will be like someone mentioned, sort of a bumpy ride along the way. As sort of organizations that have really embraced AI will be able to scale up their defenses faster, but attackers as well that embrace AI will be able to execute more attacks, you know, probably traditional attacks that we've seen for this first part, but they'll be able to do them faster. Great. Thank you for those warm intros. Um we're going to cover a couple of different areas today. So, we'll start with uh I guess AI as as a defender or as a force multiplier. Touch on the other side of the fence, which is from a an adversary perspective using AI as an adversary weapon.

We'll touch on security tooling. Um especially from an attack surface perspective. And then we'll hopefully end on an area which I can add some value to, which is talent and the human in the loop, trust, oversight, and the skills gap. And actually you spoke a little bit about it last night, and I think that in previous conferences there's been a lot of, like, a pessimistic approach to hiring in general. I actually think that this conference already and certainly like conversations we had last night is becoming quite optimistic. Um so hopefully everybody leaves this room with a little bit of optimism in terms of that kind of talent piece. Um but I guess to jump in with one

question, so we'll start on the defender perspective. There's a little event starting on Monday called RSA. Um that some of you may have heard of. Uh and there's a lot of noise. There's a lot of fluff. Um there's a lot of people that are building um true companies in the kind of the in the defensive capability perspective versus the hype and the marketing that comes around it. And so I guess my first question is from a defense perspective, where is actually where is AI actually delivering real defense advantage today and not just marketing? Uh I can I can go first. >> Yeah, you want to go first? Go ahead. Uh >> [laughter] >> I can see you all so I don't want to

interrupt. Um Yeah, so I think that where I see the most value is, like, in security there's a whole bunch of things that, like, when I started it was just, you know, we can't do that, it doesn't scale. And I think AI has added a ton of ability to scale the things that, quote, don't scale. So, like, for example, when I was fresh out of school, I come in and I'm like, why don't we just patch everything? And then I learned like why we don't patch everything. It's because >> Summer child. >> Yeah, exactly exactly. So I like I learned you know, X percent of the time when we patch anything like it might break something and that's annoying and

frustrating and and engineers like generally would rather build stuff. Now we can actually do at scale that kind of analysis and you know, we can get whatever the the unlimited energy PhD to go and like check all of the call sites and like what are we actually patching and do that and we couldn't do it before. So I think that is the biggest thing is anybody that is approaching problems that used to be impossible and applying like strong AI solutions to them. Yeah, and I think going on what Drew said, I think when we think about like is AI you know, like a a positive or negative, it's actually a step change function. It's completely changing how

we're doing work. And so there's some places where it is positive and like you know, my day-to-day I can actually like push out tools with that compressed timeline that we're talking about. You kind of have an untiring team of interns or junior analysts who sometimes hallucinate. Um so like there is this step change in what you're able to do. Um but also we see kind of we're going to see a complete change in the market in like software as a service in careers um that I think you know, there's going to be a lot of different bumpiness that comes out of it. For us I will like plus one the now we have the ability to kind

of spin up n plus one agents to do a lot of this work. They're really tireless. They're really good at reading lots and lots of information. For me that's events and logs. Um and so you can get to you just it's speeding up the entire like flywheel of detection and response for us. Yeah. I I really want to um touch on the point that like kind of uh what I said in my intro where it's like okay um we're getting drowned in all these alerts and reports and operations. And like I really want to stress that I feel like security teams uh before AI like everyone's been struggling. Like it's like everyone has this massive backlog of vulnerabilities.
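Editor's sketch: the "n plus one agents" idea above, one investigation per alert with a cap on how many run at once, might look roughly like this in Python. This is an illustration under stated assumptions, not any panelist's implementation: `Alert`, `Disposition`, `investigate`, and `triage_all` are hypothetical names, and `investigate` is a keyword-matching stand-in for a real model call so the sketch runs offline.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Alert:
    alert_id: str
    summary: str


@dataclass
class Disposition:
    alert_id: str
    verdict: str  # "benign" or "escalate"
    notes: str


async def investigate(alert: Alert) -> Disposition:
    """Hypothetical stand-in for one agent investigation.

    A real version would call a model with access to security tooling;
    a keyword check keeps this sketch runnable without any network.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    suspicious = "root" in alert.summary.lower()
    verdict = "escalate" if suspicious else "benign"
    return Disposition(alert.alert_id, verdict, f"reviewed: {alert.summary}")


async def triage_all(alerts: list[Alert], max_concurrent: int = 10) -> list[Disposition]:
    """Investigate every alert, running at most max_concurrent at once."""
    gate = asyncio.Semaphore(max_concurrent)

    async def guarded(alert: Alert) -> Disposition:
        async with gate:
            return await investigate(alert)

    # gather preserves input order, so results line up with the alerts
    return await asyncio.gather(*(guarded(a) for a in alerts))


if __name__ == "__main__":
    queue = [
        Alert("a-1", "Root login from new ASN"),
        Alert("a-2", "Expected nightly backup job"),
    ]
    for disposition in asyncio.run(triage_all(queue)):
        print(disposition.alert_id, disposition.verdict)
```

The semaphore is the one design choice worth noting: every alert gets looked at, but the number of simultaneous investigations stays bounded so model rate limits and cost stay predictable.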

We're at the point where, like, there's certain detections that are really great, but D&R teams need to turn them off because they don't have enough people to look at all the false positives even though there might be one really really great true positive. Um I don't think in my career I've ever seen zero backlog in SAST alerts or cloud misconfiguration alerts. And so I think a lot of security teams up to this point have just been really struggling and it was fundamentally broken. And I think now kind of going into this discussion of like yeah, we could throw n plus one agents at it um is super super powerful and radically changes how we can do

security. And so like a really good and I like I really want to stress like I do feel like um it's really a change of perspective on how to approach these problems now. Um I think we've made a lot of assumptions because we've had this bottleneck of we only have X number of people who could review every single alert. Uh and so now like a really great example is in D&R. And so like there's an uh there's an entire job dedicated to detection engineering. And it's like okay why do you do detection engineering? And it's like okay, we do detection engineering because we need to limit the number of false positives. But the goal of D&R is not to limit the

number of false positives. The goal of D&R is to detect bad activity. And so my whole career I've been saying, okay, we need to limit the number of false positives. Like, that number needs to go down. Uh but now, since we have agents, very intelligent beings who do investigations better than I can, they're able to investigate every single alert. Like if there's a hundred alerts I can spin up a hundred agents. My perspective has changed. And now I'm saying I don't want lower false positives. I want more false positives. I want more investigations happening in my environment. I want the alerts channel to be lighting up. Even though there's a thousand false positives, I

want that visibility. And in fact as we look towards like six months to a year from now, I would love to get rid of the idea of alerts in general and just monitor everything. Like every time there's a CloudTrail session, I want AI looking to see if that's malicious and doing an investigation. And it's like I want to get to that point. Um and so kind of going back to the original question where it was like, okay, uh there's a lot of fluff right now. Um I think the reality, like the thing that isn't fluff, is the fact that AI is working. Like we're seeing it now across these

companies and people who are really investing in AI. Um like at Perplexity our backlog is zero. Our mean time to triage is a couple minutes. Uh and so um we see it now. I think a lot of the fluff is that you need to purchase a vendor in order to achieve that. Um I think it's a lot easier than a lot of people recognize. >> Kyle, on that, from a vendor perspective or build versus buy internally, are there any like concrete examples at Perplexity where, wearing your defense hat, you've either bought or built tools across, I don't know, code security, threat detection? Yeah. >> SOC automation is a big one.

Is there anything that you can give examples of recently that your security program has built or bought? >> Yeah, so um we built um an AI infrastructure, which is really just Claude Code wrapped with a webhook. So a threat detection happens, it triggers Claude, and Claude does an investigation. It has access to all of our security tooling via MCPs. That's the way you do it. Maybe not via That's the way you do it. No, I'm serious. And Opus 4.6 is fantastic at this. And it uses the Claude Agent SDK. We actually open sourced this. The project is called easy agents. Encourage you all to check it out and just give it a try. I've heard really

great things about it. People are actually using it for their production workflows. But I think that's a really great place to start, and it's really not that complicated beyond that. We are doing some things internally, like instead of Claude just investigating an alert, it'll actually create like a Jupyter notebook of its investigation, and that way the um investigation is like grounded in code and truth, which is what it's really good at doing. In addition, uh reviewing the report is really fantastic because it generates some beautiful notebooks with markdown, but you could also read the code and the SQL queries it ran to get to that conclusion.
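Editor's sketch: the webhook-triggered investigation described above could be wired up along these lines using only the Python standard library. This is a minimal illustration, not Perplexity's code or the open-sourced project. The agent call itself is deliberately left out: `run_agent` is a hypothetical callable that takes a prompt and returns a report, standing in for whatever agent SDK is actually used, and `build_prompt`, `handle_alert`, `make_handler`, and `serve` are names invented for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Hypothetical agent runner type: takes a prompt, returns a written report.
AgentRunner = Callable[[str], str]


def build_prompt(alert: dict) -> str:
    """Turn a raw alert payload into investigation instructions."""
    return (
        "Investigate this security alert. Record every query you run, "
        "then give a verdict of benign or escalate.\n"
        f"Alert: {json.dumps(alert, sort_keys=True)}"
    )


def handle_alert(body: bytes, run_agent: AgentRunner) -> dict:
    """Parse a webhook body, run one investigation, return the result."""
    alert = json.loads(body)
    report = run_agent(build_prompt(alert))
    return {"alert_id": alert.get("id"), "report": report}


def make_handler(run_agent: AgentRunner) -> type[BaseHTTPRequestHandler]:
    """Build a request handler class bound to a specific agent runner."""

    class AlertWebhook(BaseHTTPRequestHandler):
        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = handle_alert(self.rfile.read(length), run_agent)
                status = 200
            except ValueError:  # covers malformed JSON and bad encodings
                status, payload = 400, {"error": "body must be JSON"}
            data = json.dumps(payload).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return AlertWebhook


def serve(run_agent: AgentRunner, port: int = 8080) -> None:
    """Block forever, running one investigation per POSTed alert."""
    HTTPServer(("127.0.0.1", port), make_handler(run_agent)).serve_forever()
```

Calling `serve(my_runner)` would start the listener on localhost. A production version would also need to authenticate the sender (for example, by verifying a signature on the webhook) and run investigations in the background rather than inside the request, both of which this sketch omits.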

Um and there's a lot of things that we're doing, like we're building our own eval data set for detection and response. I'm hoping to open source that as well. But it's really as simple as, you know, Claude Code with a [laughter] webhook to trigger it. >> One thing I wanted to piggyback on, what you said the first time. Like, we can't talk about what's going to happen with vendors and the whole startup, you know, product ecosystem without talking about what's going to happen with talent. Um so vendors, you know, there's a few reasons they exist, but one of them is to fill a gap where you know, you have people

whose job is security, but those people are incapable of building systems or maintaining systems or doing anything with systems. I think that security has always been an engineering problem. Like that is a discipline within engineering. Um you know, with the exception of physical security. Other than that like all of security sits within engineering. And the problem is the number of security you know, people that want to do the job security and understand security and can also do engineering has been very small. And those people tend to make a lot of money and so you know, lots of companies need security and don't want to pay $500,000 a year for a security engineer. Um this is going to be a big boon for

that because anybody, you know, in security that wants to learn engineering, extend engineering skills, now has this agent that they can talk to and learn things very fast. Um the actual implementation of code is both faster and less important than it used to be. And so, now all of these vendors that were basically like a requirement in order to have a function within your org, um it's not a requirement anymore. It's not a requirement anymore. You can just get somebody and have them implement something. So, now it really comes down to do you want a vendor who has put a lot of time, thought, you know, into this problem and will basically like you pay them to maintain and fix it if it

breaks, um or is this actually like something that you should be doing yourself? Yeah, can we talk more about build versus buy? I feel like that's a really interesting Yeah, interesting thing, especially being a you know, I'm at a Frontier AI lab. We have, you know, kind of a lot of ability to build. My team is mostly software engineers. Um and we we constantly have the same conversation if we could build this right now. And yes, you could build all these tools and AI's incredibly good at building tools from scratch, but it's not really good at being an SRE. We're working on it, but I think a lot of times we forget what it costs to

productionize a system and to actually have it running and when it breaks and when you have like 16 bespoke programs that were set up, you know, by Claude Code in an afternoon, like you need to actually have consistency and resiliency in those systems, especially when they become really rapidly load-bearing. Uh for us last year we talked about Claude, which is our detection and response like, you know, written by Claude, powered by Claude. And what we realized like after that talk very quickly internally, it became like, oh, this is the thing we want to use and other teams wanted to use it too, especially because it was a natural language way to query our security data

lake and that was just invaluable for everyone across security. And then it was that oh moment of like, we actually have a production system that we have to keep up and running. And that is where you still need um you know, you still need like the the SRE skill sets. Um and so, for us when we talk about build versus buy, it's is this something that is solved by the market right now in a way that if we purchase this, we could focus on other high priority things that we could use to mature detection and response program, which like you so aptly said, detection and response is about detection and response. It's not about alert triage.

It's not about like, you know, the precision and recall of your alerts. It's about are you detecting and responding to threats in a way that's protecting your business. Um and so, for us the question is like if we want to build or buy, it's like buy something that exists, you know it works. It could be very painful to buy it. You could be like, this is so simple, we should just build it. But then the thing that usually gets engineers off of that is do you want to be the person that is paged when this breaks at 2:00 a.m.? Uh and so, we will buy things that will then allow us to build in the areas that

are greenfield, that there aren't solutions in that space yet, and that we believe we are uniquely positioned to actually have impact. And so, like threat hunting swarms, things like that. Um yeah. >> All right. Yeah, and I think the biggest benefit as well of being able to build more things in-house is that you get what's customized for what you actually want. Right? I think so many times you buy things, the vendor solutions, they don't quite work the way that you want them, so you do a lot of integration work to actually bring it over. And then you turn around and look at it and you're like, well, we've already spent all this time between legal and

contracting and integration work that it would have been nice if we just built it in-house. Yeah, we don't have the automated SREs yet, but I think they're coming. They absolutely are coming. Yeah. Yeah, cost to me is not a factor on the like, you know, we'll come to some agreement with a good vendor about like a fair price for the thing. Really how I think about it is in general, if I'm picking, you know, the next problem I'm going to work on, I'm looking for leverage, so like time that I'm going to put in, including, you know, I I have to get up in the middle of night and fix it and maintain it and like all of the

stuff, like full cost to how much security risk I think it's going to reduce. Um and so, like vendors can be a part of that ecosystem, but you have to consider, you know, the thing that I absolutely need here is on their road map, but other people aren't asking for it, so I've lost control over that thing. Or like the vendor's unreliable, like they're going through a hard time. I asked them for support and it takes 12 hours to get a hold of somebody. Like to the company, all of this is my product. So, if it's broken, like it's just, you know, Travis's fault. And I need to pick, you know, is it better to take a

dependency here from somebody external, uh where, you know, I can't ask them and they're going to help me, or is it better to do it myself, you know, knowing that [snorts] like all software has problems? And if it does break, then it's my responsibility to go fix it. I think um like what I recommend to folks when they're like looking to either build versus buy or or they're like searching for vendors and and what to purchase is to um I would say like for vendors buy a product that is truly foundational to what you want to build and make sure that it has either really fantastic API and API documentation or it has an MCP

or a CLI, and buy those foundational products, and then you can build your custom tooling around that super easily, assuming it has the APIs and the MCP and stuff to support that. >> Yeah, that's such a good point, that like when I talk to vendors now, I'm like, can my Claude, can Claude talk to your product? If there is not an exposed API that is robustly documented in a place that, I was about to say my Claudes, my Claudes can go, then like I don't want to purchase your product. It's not going to scale with us. Um so, yeah, that like robustness of like can can my agents talk to your

system is like just table stakes now. I think something which may be a little bit more unique for those of us who work in companies that we have models in-house is like I really don't want a black box you know, RL trained custom ML model that you're selling me as part of your product because likely we can do that in-house and the the way that capabilities are exploding right now, I don't need that. And I think most of us who have worked in this space for long enough have been burned by years and years of like, well, we use like our, you know, custom trained ML model to tell you what's anomalous or not. And

we're in a world where we should do that ourselves because the context that you have of working in your environment is like is going to like beat out all of these like, you know, trained on dummy data or data that's not your environment um vendor-trained models. >> Can y'all give me an example of a product that you would not build? >> Oh, SIEM. SIEM makes no sense to build. >> Why? >> It's just, it's at-scale data. There's a lot of like infrastructure components. You want it to be reliable. It powers mission-critical stuff. I don't want to wake up in the middle of night to fix it if it's down. Like I just want one that's reliable that can

talk to my agents and is like reasonably priced. >> Where's the line? >> Uh for which part? >> For build versus buy. Like, I understand the need for SIEM. I'm just curious, like, where do you draw the line? >> A good question. Yeah, so I think um I've seen enough of, you know, things that I thought would work well and then break, that now I think I have like a pretty good estimation of like total complexity of a system. So, usually uh I will build versus buy, you know, an individual component of what I need based on like what I think is going to cause me less headache, you know, over the next like 5 years basically. And SIEM

is really like seven tools in a trench coat because it's like, you know, ETL pipelines, log normalization, it's your detection engine, it's the alert management. Um many times they have case management. Please stop doing that. I don't want case management. Um which is also something I wouldn't build right now, ticketing or case management systems. So, the way that we think about it is like, you know, there's this whole basket of things that we don't have to worry about. And if the SIEM works well and if it has, you know, the ability to speak to it directly and like, you know, do detections as code, have agents that are able to work with the system, then

like putting that in place actually allows you to do the real work, the detection and response work. I think other things that you might not want to build yourself are things that, you know, you can buy these like really reliable components that you can depend on and plug those into your infrastructure. And the other thing are things that are built on, say, large sets of actually, you know, curated data. Right? Something that you can't just go build yourself of like, all right, we need to know exactly this particular set of data. We need to be able to rely that it's accurate. We're not going to be able to build that same set of data in-house. So, we're going to go

purchase some sort of service that gives us access to this set of data that someone's already curated. Kyle, if I said bad news, um you have to like go build your entire like security tool stack, like what is the one that you would push back on the most? >> Yeah, like, the first thing that came into my head was SIEM. Um I can't really put my finger on why though. And I think I mean, you did touch on it. It's like, okay, you just want something like reliable. There's multiple parts to it. It's not just one thing. It's like actually seven things bundled into one. Um but it's hard. I mean, uh

it's kind of something that I grapple with a lot, of like, where is that line? Why did I draw the line there for D&R? Uh It seems like some stuff you should rebuild there, right? Like we've seen like detections as code and things like that come up and you're like, well, that's perfect for these coding agents, but at the same time you don't want to rebuild the database part. >> Well, just to stay on that track, I guess like what defensive problems in security are unsolved today that we think that AI can solve? I mean, you talked about alert fatigue. But like the the curious part is uh there's like spiky scaling, right? Like

you like currently for us, we look at the flywheel of detection engineering, which you know, includes like writing detections, tuning detections, um and then, you know, responding to the alerts that come out. But it's like pretty spiky in terms of like, okay, now we have a bunch more detections. We're tuning them, but now we have 500 tuning PRs. Um and like so, we're still like, you know, looking to plug in um and so, I I still have hope that like uh you know, the the addition of agents will allow you to do um more of this at scale, but it is still kind of a like it's not you just plug them in and everything works fine. It's um yeah. One

management, supply chain security, um I had one more, it's gone. Anyway, you go. I think a lot of reactive security, where you get an alert, you get a report or something and you need something to triage and create a PR to fix it and stuff like that, I think we're possibly very, very close to solving it. Like, we're at a point now where agents could just go into a code base and find vulnerabilities. And I kind of really want to stress that this is not that hard. It's literally just prompting. It's not like there's this massive It's not that hard, but it is just sort of get it right. So

we built Codex Security and it does essentially that. But the really hard part is the validation. And that's where you probably Sure, everyone could throw a prompt and say, "Hey, find me all the whatever vulnerability X," and it goes and does it. But how do you validate that so you don't end up with a bunch of false positives or a bunch of wrong things you're patching? Yeah, but you could like That's really the hard part. Yeah, but you could give Claude access to a browser and say, "Please validate that." Oh yeah, you can, and you have to give it access to a VM and access to build the tooling and access to recognize when the

vulnerability is actually exploited. Yeah, yeah, but it's just more work. Yeah, and the thing too is evals, right? You need to know what good looks like, and I know we're talking a lot about detection and response, but in detection and response we know that you give the same alerts to, like, five different humans and you might get slightly different dispositions. And so we don't really have a really good source of truth, especially for atomic alerting, because, you know, we say, "Oh, a good alert is one that would lead to an incident or an actual security breach," but those of us who know and work in this space is like

it's breadcrumbs. It's like thousands of different signals that could be tied together to indicate badness. So one of the problems we're tackling right now is how do you evaluate how well a system is doing? Like how can you sit down and say, "Okay, the system that we built with AI is actually doing better than a human analyst"? You can use the very naive, "Well, you know, we found more actual breaches or we found more things that led to incidents," but it kind of comes down to having that curated data set, because I will say, yeah, we do a lot of tuning of our prompts, but Claude especially is super, you know,

it wants to please you, and so if you say, like, find bad actors in my environment, it's going to be like, "I found Cozy Bear. They're here." Like, you know, the Russians are in the mainframe, and it's like, "Okay, thank you, but no." um You're absolutely right. He's He's privileged is the other one. I, um, yeah, I will say, at least at Perplexity, we built our own system. It's relatively simple. It works super well. Maybe we'll open source it. Maybe that'd be a good idea. But anyway, the point I was trying to make was that I think reactive security has been

uh, is almost solved. I think we're really, really close. What I'm really excited about is proactive security, and what I mean by that is having security agents in every part of the developer life cycle. And so I would love to get to a point where an engineer is building something and a security agent hops in and says, "Hey, by the way, this is probably not the best way to design this system. Here's a few other solutions for you." And I think we're starting to see that now a little bit. If you prompt Claude or a coding agent with, uh, and you could just say like, "Hey, like

we want to design a system for authentication or something. Could you help me design it really well and securely?" And it will, and it'll bring really, really great ideas. And at Perplexity, I've had PRs come to me with these massive systems for sandboxing and authentication stuff, and I'm like, "All right, here we go. Let's dive in." And I look and I'm like, "This is rock solid. This is fantastic. How did you do it?" And he goes, "Well, I had a conversation with Claude, and we talked about it, and Claude said you should do X. And I

started" and then he was like, "Oh yeah, I started asking why, and I learned a ton along the way." And now we have this system, and I barely had to make any comments for a really massive system. And so I love the idea of, and this can go beyond just engineering, right? So we have Comet, our agentic browser. And I would love for Comet to, like, okay, a user gets a, uh, oh fine, that's okay. A user gets a phishing email on their personal email, so outside security can't see it, but Comet can. And Comet could comment and say like, "Hey,

uh this is not who they say they are. You should not click that link. Be careful." And I just think that's super powerful. Comet obviously can't do that today. It doesn't see any of your websites proactively like that. But I think that'd be a cool setting for folks to have in the future. Yeah, I think that goes with scaling it up. Same thing with getting every PR reviewed by a security expert, every design document reviewed by a security expert, every alert looked at. Yes. Yep, super excited for that. All right. Moving towards AI as an adversary's weapon. Let's flip it from defensive to offensive. Has AI, in your belief, meaningfully changed or lowered the barrier

to entry for attackers, or do you feel that's particularly overstated right now? I kind of want to tackle this question from an interesting angle, which is we always talk about people externally, you know, attacking, or prompt injection, or trying to get into your systems. But an interesting thing we're seeing with increasingly capable models is that they possess a lot of knowledge about how to meet their ends. So say you ask an agent, "Hey, I want you to add this, you know, AWS permission to this principal, to this role." And what the AI will do is be like, "Oh hey, I don't actually have that permission, but I can exploit this vulnerability

that's a zero day that's never been found to actually get the permissions, push something up to a pastebin with some malicious code, and then execute that from outside the cluster, and then I can add this permission. So here you go." And that's actually, for a shameless plug, we're going to talk about how we at Anthropic are trying to handle this: as agents are doing most of the work, how do you determine that an agent is doing the work in a way that isn't actually circumventing security controls? And it's much harder than you would think. And currently our answer is throw more AI at it, but, you know, yeah. It quickly gets into the whole alignment

discussion as well. We really have to align the models to what we're actually going for and attempting to do. Right. I think most people in security got their start as skiddies, as we call them, right? Like you just Well, I've never heard that term. What is Please explain. >> [laughter] >> Skiddie, is that what that is? Yeah. Yeah, sorry. I'm all slangy. Um >> [laughter] >> but we all started, you know, like, "Oh, look, I run the pointy clicky and it does stuff," and then we're like, "That's actually really cool. I want to learn how a pointy clicky works," and then you start going down that rabbit hole. Like we have

obviously very powerful tools now for people that are like, uh, I want to hack systems. So a lot of cycles that we had where it's like, "Eh, whatever, it's a medium. We don't have to patch it. We have a year, a year and a half, or whatever FedRAMP mandates." All of that's going to get compressed, and we're going to start to get weaponized exploits very quickly when somebody just says, "Hey, you know, can you make one of these for me?" and it says, "Sure." So the cycles are going to compress, which is going to mean we're going to have to do the fundamentals much faster

than we did before. I guess I have one more question cuz I think we're getting short on time. Just from the human perspective, I guess one question, and it wouldn't be in your companies obviously, but you all have great networks. So I guess what is the biggest mistake that security teams are making right now when it comes to AI adoption? I think historically security teams have been pretty isolated from the rest of the engineering team and the rest of the org, and they kind of operated as this isolated entity, and every time there's a vulnerability, they'll throw a ticket over the wall to engineering and ask them to fix it.

Um and I think now, especially cuz it's so easy now to contribute and build features and tools and stuff, I think the biggest mistake is not getting more involved and not getting your hands dirty and kind of jumping in. And so little things, like every time there's a bug bounty report with a vulnerability, like, "Hey, security should just go in and fix it." I think that'd be a huge unlock. The biggest thing I've seen is sort of this divide of some companies, you know, maybe some smaller startups, giving full access to the agents to do everything, you know, full tool calls, everything, and then wondering why it deleted their

database or corrupted something important. >> Yes. And on the other side we've seen giant corporations do the exact opposite: take these agents and take Codex and not give it access to any tools and then wonder, like, why is it not getting all this stuff done? And so I think we need to come to this happy medium where we do give Codex access to all the tools, but we have this governance policy and governance in place just like we do for humans. Yeah, data governance is a huge problem, because naturally the more information and access you give a model, the better it's going to perform. But then you have the issue of like data

leaking or operational security kind of breaking down. So yeah, that's a huge, huge problem. >> Well, I think we got in this weird spot where security business leaders wanted to adopt AI, whether they actually knew what AI was or not. They were just like, "We need to do this." And then security's like, "Whoa, whoa, whoa, slow down. This is crazy," and they haven't let go of that mindset enough. So people are not looking at it and asking, what can this do for us and help us scale? All right, 1 minute left. So, speed round, lightning round, as short and as quick as possible. Travis, first question to you, will AI make the internet more

secure or less secure by 2030? Uh less. Okay. Jackie. Yes. What will be the first major AI-driven cybersecurity disaster headline? Whoa. >> [laughter] >> Go. It's going to be It's going to be one of the frontier I shouldn't Oh my god, I shouldn't say that. Um it's going to be data leakage. It's going to be like chat things leaking and with prominent um profiles of Love it. Okay, Kyle, what AI capability should security professionals be worried about right now but aren't talking about? >> [sighs] >> I think, we talked about this a little bit, but agent identity and just giving access to all the tools. Okay, and Drew, wrap us up for today.

If you were advising a CISO today, what is the one AI risk they should prioritize in securing immediately? Make sure that the agents actually have enterprise controls applied to them, whether it's sandboxing or network egress or having their own identity and audit trails with them. And on that, thank you so much to our incredible panelists for joining us this afternoon. >> [applause] >> Some of the smartest, brightest people I know, and I get to call many of them my friends. Thank you for joining our conversation today. We will be, as mentioned earlier, having some kind of questions and get to hang out with some of the panelists after, but thank you so

much. Appreciate you all. All right, Tom, Travis, Jackie, Kyle, Drew, thank them so much. Bigger round of applause for them. Come on. How amazing was that? So, real quick before we dismiss them, let's see. I want to thank our sponsors Aikido, Arcjet, Clover, Datadog, Socket, and Sublime Security. We actually have gifts for them from them. So, Sunny, if you can hand them the gifts, that would be great. Couple quick announcements. When you registered, you got two drink tickets. It can be alcoholic or non-alcoholic, but we actually have a bar upstairs that you can go and get those and before. And if you want coffee, I guess the coffee bars are closing at about

4:00, just to give you a heads-up. And let's see. One thing that most people don't know is that we have a prayer and mothers' room, which, for those who need it, you can go to the info desk and they'll let you know. But thank you so much for attending. We look forward to seeing you hopefully next year. And again, big round of applause. Yeah. You guys were great, amazing.
