
BSidesSF 2026 - AI-Powered AppSec: 10x Your Security Team Without Scaling... (Anshuman Bhartiya)

BSidesSF · 44:57 · 297 views · Published 2026-05
About this talk
AI-Powered AppSec: 10x Your Security Team Without Scaling Headcount (Anshuman Bhartiya). Security teams are drowning in vulnerabilities, PR reviews, and manual triage. Learn how one security team built an AI-powered platform that automates PR security analysis, SAST triage, and design reviews, discovering critical production vulnerabilities while reducing review time from days to minutes. https://bsidessf2026.sched.com/event/6570b3936d993703bcaef6d3304c64f5

All right, everybody, welcome to the first session of the day. We have Anshuman Bhartiya, who will be speaking on AI-powered AppSec: boost your AppSec team without scaling headcount. We'll be doing Q&A at the end of the session. Please post your questions to Slido, which you can access via the QR code on the theater sign over there or by going to bsidessf.org/q&a. With that, over to you. >> Awesome. Hello, everybody, welcome to this presentation. Today I'm going to discuss the intersection of AI and AppSec. >> Oh, sorry, is this not working? Hello, hello. Okay, can you hear me now? >> More. Hello? >> Okay, there you go. Sorry. So I'll be presenting about the

intersection of AI and AppSec, and how AppSec and product security teams can use AI to scale themselves without necessarily adding headcount or new tools or platforms. So who am I? I am Anshuman. I'm currently the AppSec tech lead at Lyft. I help run the Boring AppSec community, and that's my website.ai. I've been playing and tinkering with AI for the past year or so, and I blog about everything I've learned over the years. So if you're interested in learning more about the experiments I've been doing, you can go to my website. As far as my career goes, I've been in this industry for about 15 or 16 years now. I have been on

both offensive and defensive sides. I've worked for multiple companies, from small-scale startups to enterprises, and I've worked across different domains: cloud security, incident response, application security. AppSec continues to be where my heart is. It's just something that I'm very passionate about, and I think the challenges that AppSec teams face are very difficult to solve because they don't just involve tools or platforms. They involve human relationships, understanding what the problems really are, and so on. And yeah, we've been building a platform called prefi internally at Lyft, which is what I'll be sharing about today: how we are able to use AI to scale ourselves

and also, this whole experience is pretty wild to me, being able to present in a theater, so please bear with me. Okay. So what are we going to speak about today? I'll keep it simple. First, we'll discuss what I call the security scaling crisis, just thinking about the problems that AppSec teams face on a day-to-day basis. After that, I'll cover three different use cases of how we've been able to use AI as a force multiplier: pull request analysis, triaging SAST vulnerabilities, and AppSec design review automation. I'll share some of the things we have learned over the past year or so, and then we'll have about five

minutes at the end for questions. Cool. So before I speak about the security scaling crisis, this is something I want everybody to think about. Especially with AI now, the amount of code that is getting generated has skyrocketed; all of us are experiencing that. And the scaling crisis existed even before AI was a thing. Security engineers were drowning in alerts, vulnerabilities, and so on. With AI, that problem has only grown. And with AI-generated code, not all the code that gets generated is secure. It can be secure if you provide the right context and the right guardrails, with all of that in place. But if you just ask

let's say Claude Code to generate code, it's not going to be fully secure. So we have more code, we have more issues and security vulnerabilities, and we have the same resources. At least from my observations across the industry, I don't see many teams hiring multiple engineers. Hiring has seen its ups and downs, and I feel like we're going through a transition where teams are themselves figuring out what to hire for and what roles they actually need to fill. So if you add these three things together, the problem is only going to get worse and worse. So really, this is the security scaling crisis I want all of us to think about

and how we are going to address these problems. Okay, so now speaking about the security scaling crisis. These are some things I'm sure all of us have seen. There are hundreds of pull requests that get merged every day, especially with AI now. Just a caveat: I built this presentation about 8 weeks ago, and so much has changed in the last 8 weeks that a few of these slides might already be outdated. I want us to think about whatever I'm presenting with AI added on top of it. These were the problems that we had faced even before AI was a thing. We were merging hundreds of PRs

on pretty much a daily basis. We were dealing with SAST findings. AppSec teams were struggling to do design reviews at scale, and engineers were just waiting for AppSec engineers to provide them feedback in a timely manner. With AI, all of this is just getting worse. We've tried to address some of these challenges in different ways. We have tried to hire more people. We have tried to get more tools and more platforms, and we have tried to prioritize. All of these things work up to a certain extent, and after that they don't. For instance, when you say prioritization, what are you really

prioritizing? What are you prioritizing against? Do you really know how prioritization should work? These are questions where security teams come up with their own way of doing things, which often doesn't align with what engineering teams face. So these words get thrown around quite a bit, but I feel like teams don't necessarily understand what they are working on or which problems are worth working on. And again, with AI and the whole shift happening right now, these problems get magnified more and more. Also, security teams generally expect engineering teams to shift left. There are certain ways where security teams try to train

engineers to do threat modeling, as an example. In my experience, what I see is you can't just train an engineering team to do a threat model. You have to find a few people who are really curious about security, who want to know how things can actually be broken. If you just expect an entire engineering team to do a threat model of every feature and every release, at least in my experience, I haven't seen that work at scale. So all of these things we have tried, and we have failed. So what is a solution to the security scaling crisis? I want us to think about AI as a force multiplier,

right? We have seen clickbait blogs, articles, what have you, content saying that AI is going to replace our jobs, that we won't have our jobs anymore. I think that's the wrong phrasing. I think we should think about AI as a force multiplier. If you think about AI that way, the opportunities are endless. The things that you were doing already, you can do those plus a lot more. You can spend your time on things that actually require your attention and your brain. So, just a different framing of how to use AI. Cool. So we've been working on this platform

called Priscia at Lyft internally, and I'll share three use cases here. On the left-hand side you can see an architecture diagram. It's actually very straightforward. We have a bunch of events that get sent to our SIEM platform. From that SIEM platform, we fire a bunch of webhooks that hit our PRISKI platform. Within the platform, as of today, we have about six agents. You're only seeing three here; again, this presentation is already out of date. Of the three agents that I'm going to present today, the first does pull request analysis. So imagine you have hundreds of services and every

service is getting pull requests, and you want to know about the high-impact findings. You might have your SAST tools and platforms, but we all know the state of SAST tools right now. The second use case is SAST triage. When your SAST tool throws a bunch of findings at you, how do you triage as a security engineering team? The SAST tools don't do a good enough job. The burden eventually falls on the security engineers to prove what is exploitable and what needs to be fixed. >> There you go. Thank you. And the third case is AppSec reviews. One of the core activities of any AppSec

team is to review specifications to proactively call out threats and risks that the organization needs to fix before things get deployed and shipped. The way we've built this platform is pretty straightforward. It's a Flask back end. We use DynamoDB for storing all the artifacts, and we use multiple LLMs, because, as you'll hear quite often, with LLMs and AI, nondeterminism is a big blocker in getting these workflows and solutions adopted. One way that I think works reasonably well, it's not foolproof, but it works reasonably well, is if you have the same workflow go through multiple LLMs and

then you build some kind of consensus mechanism to see which findings both of those workflows have in common. And the fourth thing is async processing. When we think about LLM agents, these agents have the power of running autonomously for long intervals of time. So it's really important to design something that works async. You can't expect to invoke an agent to do something and just wait for the response. So these are the four main principles that we built this platform on, and I'll go into each one of these use cases next. So the first one is automated PR

security analysis. Now, we've all worked in organizations that have CI systems deployed that scan every pull request, call out vulnerabilities, and leave comments on the PRs. I think it works really well for compliance purposes, and it also works in certain cases. But when it comes to finding high-impact vulnerabilities, think about authentication, authorization, and how your application actually processes sensitive customer data. These are things that are very difficult to call out at the PR level, because it requires you to understand how the application code is changing, how it is impacting the threat model of the application, and, you know, where is the

data flowing from source to sink, all of that stuff. Traditional SAST tools don't do a good enough job. They search for keywords and strings, and all of that is fine, but there's so much room for improvement here. So again, just highlighting the PR review problem: we've seen hundreds of PRs that get merged every day. Critical vulnerabilities, especially those that actually matter to an organization, often get missed or ignored. And what happens is, if you don't find these critical vulnerabilities and you just start finding issues and leaving comments on the PR, it starts to create this friction with

engineering organizations where the trust starts to erode over time. So it becomes really important to have a system that finds vulnerabilities and often proposes a solution that is able to fix them as well. So what is the solution here? This PR security analysis problem is one that we have faced and experienced over the years. So what is one way of solving this? In the privacy platform we have an agent, and that agent goes through multiple phases. The first phase is that it extracts all the routes. So imagine in a PR you see that there are multiple API endpoints getting added, changed, or removed.

So that's the first phase, because we initially wanted to focus on just authentication and authorization vulnerabilities; we wanted to see whether AI can really find these high-impact issues. And we wanted to keep the scope really small. This is one thing that I want to stress: with any agentic AI system, starting small with a really small scope allows you to understand how these systems work and to iterate. It's basically a process where you as an engineer have to learn how the system works, how it breaks, and how you can add guardrails to the system to get to the

outcome that you actually want. It's more like outcome engineering. You'll hear that term quite a bit, but I think it's so true with any AI-based system: you focus on the outcome, not on how you got there. I mean, yes, you do at times, but the eventual thing that people care about is: were you able to find this thing before the product got shipped? So again, taking a step back, we first extract all the routes, then we check for authentication-related signals. So now that we know there's a new endpoint that got added, is it affecting any authentication component of the system,

right? And similarly, are there any roles that are getting added, changed, and so on. Doing the PR analysis over these multiple phases allows us to get signals that are otherwise just not possible with any traditional tool. These phases are basically simple LLM calls where we identify candidates from the PR itself, send them to the LLM, and ask the LLM whether there is any authentication change happening or not. And then the fourth step is the multi-model consensus, which is something I mentioned earlier: we don't just do it once, we do it twice, and then we compare the outcome. This

allows us to bridge the gap of the variance factor, which you see in any AI-based system. And then in the end we generate the findings. So how does multi-model consensus voting work? Again, this is the early prototype version, and it is still working right now. I'm sure there are different ways, and I'm sure there's a lot of research going on about how to get two models to come to the same conclusion. But if you start small with a very basic, simple concept, you take two models. In our case we used Sonnet and GPT. I think right now we're using Opus and GPT, but the

idea is that you assign each model a certain weight. So in this case Sonnet has 60% and GPT has 40%, and we only generate a finding if both of them agree and the consensus is greater than 50%. I find that this approach has worked really well for us, because if I run the same workflow two or three times, the consensus system bridges the gap and we keep seeing the same findings again and again. So this is one approach I feel should be explored more in order to bring that determinism into any AI-based system. So apart from extracting the routes and finding authentication and authorization issues, this is something we've

kind of learned over time: PRs have so many files, and so many things are changing, tests, documentation. If you just send the entire PR to the LLM, you're burning so many tokens, and it's not going to be cheap by any means. So how can you be smart about it? You can further reduce the noise by doing smart route filtering. You find out, for example, whether there are any health endpoints being affected; you don't want to do a security review on health checks. Similarly, there might be some accepted risk in an organization which is okay, and we don't need to review it. So it's really

important for teams to know the threat model of what they're trying to address: what are the keys to the kingdom, what are the key components. Everything else you don't have to review. So again, it's about focusing on what really matters to an organization. It's not about holistic coverage. Yes, we will get to the coverage, but when you start building a system, the focus should be on what matters most. Cool. So, the first question you might have is: okay, you have this highly complex PR analysis system. Does it actually work? And I'm here to say that yes, it does. So far, and again, this

is from 8 weeks ago, but we found 30 true positives: four critical, 16 high, and 10 medium, all in first-party code. These are vulnerabilities that no scanner found for us, and all of them were highly impactful. All of them are authentication and authorization related issues: IDOR, forced browsing, horizontal privilege escalation. All of these are things that require a human engineer and an AI system to understand what's happening. I've pasted some screenshots, as you can clearly see. The way we implemented this was that we did not block any PRs, and we did not leave any comments

on the PRs. This was happening passively, and as soon as we found something I used to manually reach out to the engineer and have that conversation with them, saying, look, we found this, do you think this is a true positive? Because I myself wasn't sure whether what AI found was actually a true positive or not. That is something you'll hear quite a lot: how do you verify what AI finds? You have to start with a human in the loop to verify, and then over time, when you've built confidence in your system, you can take the human out of the loop. So first I kind of

reached out, and you can clearly see these findings are real findings that have real impact on the organization. All right, so that was the first use case. The second use case is SAST triaging. I'm calling it SAST; you can think of it as static analysis or code scanning. There you go. So the problem with SAST, again, we all know: you have your SAST tool deployed, and it will find hundreds and thousands of issues. It will have its own severity rating, critical, high, medium. But as a security engineer, when I go to that dashboard, I'm overwhelmed already. I'm like, what do I address? Where do I

start? There are 15 critical, there are 100 high. I obviously know I have to start at critical, but there are 15 of them. Where do I start? Instead of actually doing the work, I spend more time prioritizing. And again, when we say prioritizing, what are we prioritizing? Do we even know whether those 15 criticals from your SAST tool are actually critical or not? These are some genuine problems in the SAST industry, and that's why we're hearing about so many AI-native SAST companies now finding things that just haven't been found before. There's a reason why that's happening. The SLAs get

missed consistently. Again, with vulnerabilities, you find something, and mostly security engineers create a ticket in the engineering backlog and then assign SLAs to it. This is just how the vulnerability management life cycle generally works, and it is broken. It has been broken for years. The SAST problem has been there and will continue to be there. And again, we've tried to solve it in different ways. We've tried to automate the triaging aspect of it. We've tried to hire more people, and these solutions just don't work anymore. So how do you solve the SAST problem? The idea is simple again. You

use AI, and you use code context. When I say code context, what I mean is this: say you're triaging a finding from a SAST tool, and let's say it finds SQL injection. The SAST tool will show you that this endpoint or this particular piece of code is not sanitizing the input and might be vulnerable to SQL injection. That's all you see in the SAST finding. How do you know if it is actually exploitable? How do you know if it can be invoked by an external attacker? These are questions that cannot be answered by any SAST tool, because they don't have visibility into your organization, how the infrastructure works, how the existing

controls work, all of that stuff. That's why it's really important to bring in all of that code context, use AI, and have both of them reason about what is actually happening here. This is a real-world example that I'm showing here. Again, if you think about code context, the SAST tool will say there's a SQL injection vulnerability on line 47. It looks scary and might be critical, but a human engineer still has to investigate, and we might spend anywhere from 30 minutes to hours. It's basically a problem. So at Lyft we use something

called Sourcegraph. Sourcegraph is, I think they call it, a code intelligence platform. It's basically a way to search for code, and they have an MCP server. Again, I'm not going to go into AI-related terms like MCP and whatnot, but just be aware that Sourcegraph allows us to search code by giving it the right question, the right prompt, all of that stuff. So this is how it works, and on the right-hand side you can see a finding that our agent triaged by reasoning through multiple aspects. First it worked out the risk assessment. And again, the scores, the

severity scores that the SAST tools produce are not what will actually apply to your organization, because the SAST tools don't understand what your priorities are. The agent has the context, so it knows what is bad and what is not. Then it evaluates; you can see the authentication analysis. It first finds the endpoint. Then it looks at all the areas that the endpoint reaches: is there any data access happening, is there any customer data being accessed or not? And then it tries to search for all the ways we at Lyft want folks to

implement authentication, so it's aware of all the security best practices. Then it finds that on these lines you're not implementing this, and that's why it's vulnerable. And then it goes and checks whether it is actually externally exposed, because if something is vulnerable but not externally exposed, that changes the game altogether. You have to change the severity, and you have to change the SLA. So our agent is smart enough to reason through everything, and again, we have found true positives using this approach. So these are some results, before and after we deployed the agent. Obviously, before, we used to manually

triage these vulnerabilities. I also want to say something about consistency. Depending upon what vulnerability you are triaging, who is triaging it makes a big difference. If you ask a junior security engineer to triage a vulnerability, they might not have the necessary experience or the organization's internal knowledge to triage it accurately. So consistency has been a problem. We can't expect one person to triage all the vulnerabilities, and it's very difficult to come up with a way of triaging that gets followed the same way by multiple folks on the team. So that's one thing, and then depth of context. In order to triage a

complex vulnerability in a microservice environment, where an API request makes its way through multiple microservices, you have to dig deep. You have to find out how different headers are getting added, all of that stuff. After AI, obviously there is no human involvement in the triage itself. The AI agents go and do the triaging autonomously, and they give back the results. At that point we as human engineers just have to make sure: did it reason about the finding accurately or not, and if it did not, can we change the prompt? It was consistent; AI agents just follow a framework, so there's no aspect of who is doing what. And it was

pretty detailed. They can run 24/7; they're not human beings. So after we implemented AI, the results were surprisingly better, and I think it makes sense. This framework of SAST triage worked so well at Lyft that I really wanted to abstract it out and share it, because at least I didn't see any SAST vendor or product that approaches a microservice environment in a way where, if you have a finding, the agent or the tool can reason across different repositories as well. So I open sourced this tool last week;

it's on my GitHub, you can see that. Basically the idea is simple. I created a lab environment, and in that environment there are four microservices: o service, doc API, front end app, and inops. I created a pull request in each of these repositories; they're all out there if you want to go check. Then I asked this agent, run vibes, to reason about whether these PRs bring in any security vulnerabilities or not. The agent is smart enough to go through multiple phases. It first does a threat model: what is happening in the PR, what can go wrong. And then after that it actually

goes and finds out whether there is any infrastructure security control that might make it unexploitable. This is an experimental agent. It works well, and if you're curious, you can use Claude Code or Cursor to understand how it works and apply the same framework. Cool. So the third use case is AI-powered AppSec design reviews. This is where I actually first began my journey with AI. At Lyft, we are a team of four AppSec engineers. You will be surprised, but it is true. Just four AppSec engineers doing design reviews, consulting, and code reviews at times. So

I was like, there is no way we are going to scale ourselves. So I started playing with AI. Well, first, before I go into the solution: these were the problems. Manual review used to take us days. The specifications that our engineers used to come up with were something like 25 pages long. I'm like, how can anybody reasonably be expected to do a manual review of this specification in a day? It's just not possible. And these specifications covered products and features that we had no idea about. I don't even know how this came onto our plate, but this is something that AppSec teams have been doing for

years. I don't know how, and I don't know about the accuracy. People ask questions like, you're using AI to do AppSec design reviews, is it actually finding anything? And I'm like, it's not about whether it's finding anything or not. You ask AI four questions about, let's say, authentication, authorization, sensitive data, and logging, and it'll give you something that you can then use to probe further. If you can automate that first step using AI, why not? So yeah, we had some problems with the AppSec design reviews, and we used AI to solve them. Like I mentioned, this was the first

use case I started with. The first version was extremely simple. You can imagine you upload a PDF or a Google document to a website and you just ask it questions, like how is authentication happening in this specification, and so forth. It worked really well. The reviews that used to take me days now took me 5 minutes, where I would just upload the document and ask it questions. Again, before and after: previously it used to take us days, only high-priority specs used to get reviewed, and the quality was inconsistent. After we used AI, you

know, the reviews are much faster, every spec gets reviewed, so now we have coverage as well. One thing that is worth mentioning is that when we did these design reviews, we weren't going too deep into what was actually changing. We used to ask basic questions like, are you following the best practices or not, and we just used to trust the answers that our engineers gave. AI has allowed us to go deep into these aspects and actually verify whether what the engineer is saying is true or not. So there are some advantages of using AI that you don't often

pay attention to. Cool. So that was v0. For the past few weeks, we've been working on a new v1 of the AppSec design review, which looks much more complex but works really well. The way it works is we have a Flask UI again, so anybody can come and provide a Google Doc URL to be reviewed. We have a Lyft agent runtime environment. At Lyft, our ML platform team has been there for years; they are pretty mature. They have provided us this infrastructure layer where we as security don't have to worry about where our agents will run, how will

they run, what about observability, all of that stuff. We just build our agents. I know I'm fortunate to work in a company that has all of this. Most companies don't, but if we already have it, then we should use it. So that's the idea behind this. We have the agent runtime, and then we basically drop a job in a queue, and the job gets picked up by a worker. In specifications, you've seen architecture diagrams; there's so much stuff. If you just upload a document and ask AI to find stuff, it's not going to find it. You

have to guide it. You have to ask it questions like: are there any architecture diagrams in it, what is happening in the architecture diagram, how is the data flowing? So there are multiple steps that this agent takes. It downloads the document. It analyzes all the architecture diagrams. It does a threat model. There are aspects of doing GDPR analysis. And then in the end, it synthesizes all of it together and we store it. This works really well. We've been able to find stuff proactively at the design stage that we just couldn't before. Cool. So what are some things that we have learned over the past year or

so? The first I already covered briefly: without making things too complicated, just using two different LLMs to run the same workflow and comparing the output works really well. Next, code context. If you have GitHub, if you have Sourcegraph, if you have any other tools, use them; find out if they expose any MCP servers, tools, or skills, and use that context to do context engineering around your agents. And then structured output. With AI-based systems, this is one problem that I came upon pretty early: if you don't have a defined structure

of the input and the output, you're going to have problems where even the output cannot be parsed, right? So having structured JSON output, or just outputs in any particular framework, is really important to know what your AI systems are doing. And there are certain constructs like hooks, if you use Claude Code, you know. Hooks are ways where, before storing the output, you can call a hook and the hook will ensure the output actually conforms to the JSON structure you initially defined. So these are things that are really important in order to build a system that follows the guardrails. Reducing the noise, you know, smart route

filtering, I already covered. Skills: I want to stress skills. If you're paying attention to whatever is happening with SAST, appsec in general, skills are something that have really changed the game, right? If you as a security engineer can write down a skill of how you do XYZ, any coding agent can just take that skill and try to emulate what you as a human engineer would do. We have started to write skills for finding stuff, for triaging stuff, for ensuring there are different guardrails, and it works really well. Obviously with all of this there are security concerns as well,

and I'm well aware of that. You can't just have a skill and say, okay, you can execute code in that skill, right? That doesn't work well. So using AI-native constructs is important, but doing it within guardrails is super important as well. The fourth point here: iterate on real vulnerabilities for prompt refinement. This is something that I find really helpful. If you're using AI to achieve an outcome, you are hopefully providing it a prompt, right? So it's like prompt engineering, but you create this iterative feedback loop where, if you don't get to the outcome, you tell the AI: okay, this is the outcome I want to get

towards, this is where we are, this is the prompt, help me refine the prompt. You do this two or three times and you will end up with a prompt that works. This is exactly how we were able to triage those complex authorization issues, right? When we first built the system it wasn't finding anything, and I'm like, why are you not finding anything? You have all the code context. It then told me it was missing one piece of context. So I asked it to change the prompt itself. It's kind of an inception, right, where you create this feedback loop where you ask it to do

something, you verify the outcome, and if the outcome is not what you want, you ask it to change it again. And it works really well. Managing trust: with any security team, or with any organization, trust is something that can erode really fast if you're not paying attention to the kind of work you're asking your engineers to do, right? So managing false positives, creating a culture where security folks are not seen as blockers but more as enablers, and working with the engineering counterparts. These are things that matter especially with AI systems. If you're building an AI system for security analysis, are you really blocking PRs? Are you really

adding more work for engineers to do when you haven't verified whether whatever the AI found is actually a true positive or not? These are things that you have to think through quite a bit, and we had to do that. Complexity and cost: again, with any AI-based system, you're burning tokens at the end of the day, right? If you start sending entire PR diffs to the LLMs, it's not going to be cheap. If you run Opus, if you run any frontier LLM, it's going to be super expensive. So you have to be smart about it. How do you do that? I think I've covered a few, right? You do

some smart route filtering, you reduce the noise, you do the threat model, and then you find out, okay, what matters to your company, and you only use those aspects with the LLM. Context engineering: you might have heard of this, but I think this is really the secret sauce, right? With any AI-based system, you can find vulnerabilities and you can also fix them. What will really matter is what context you provide. And I think one thing that we learned from this platform is that building something iteratively is the right approach, because we didn't want to just

build something and throw work at the engineers. So we kept the systems in staging for the longest time, and then once we built our own confidence in the system, we upgraded it to production. Bias to action: I think this goes without saying, but if you haven't been playing with AI, you should. It is so easy to build prototypes and just throw them out. It takes me 5 minutes to build something, and if I don't like it, I can build something else, right? So I just want to stress that, you know, vibe coding is a thing. It is not secure for sure, but

it helps you learn things. And especially, I think one thing with security engineers and folks in general is that we tend to build complex things, but when it comes to presenting or showing what we've built, we kind of struggle because we are not front-end engineers. Vibe coding allows us to build phenomenal UIs, right? So if you've built an agent, you can easily vibe code a UI to show what the agent is doing. These are things that really helped me personally to gain traction. It took me a while to get everybody on board with this platform, right, and say that, look, it's finding stuff, it's a true positive. I think what

really clicked was when I showed the UI. People could see what the findings were, people could see the agent reasoning, all that stuff, and that really helped. The third point is, you know, you can't expect everybody to be on board, right? You have to find the few folks in your company who want to explore, who are curious. All right, so we are getting close to the end of the presentation. If you're curious and you want to build something similar, or if you have questions, this is the advice I have: just pick one high-pain manual process that you're doing on a daily basis. Build a small prototype with

whatever LLM access you have. Start tracking the data that it produces. Find out what works and what does not work. That data is going to be super key in order to continuously improve your system, right? Then once you've built a prototype, ask your engineers to use it. Get feedback from them. Continue to iterate on it. Once you end up with a system that works, that a few engineers use, at that point it is a matter of scaling it up, right? And while you scale up, it is really important to take care of your false positives, it's really important to get that consensus like I mentioned earlier, and it's also really important to integrate

into your existing tools, right? If you already have CI/CD systems in place and you've been building the system outside of them, it's time to integrate it in. And then finally, I'll close with this: AI is not going to replace security engineers, but security engineers who embrace AI will move faster, solve harder problems, and set the new standard for the field. And with that, I think I'm ready for questions.
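
The structured-output point from the talk (check that the model's output matches the JSON structure you defined before you store it) can be sketched roughly as follows. This is only an illustration: the field names, the severity values, and the `validate_finding` function are assumptions made up for the example, not the schema or hook mechanism used at Lyft, and it does not use any agent framework's hook API.

```python
import json

# Hypothetical schema for a single finding; field names are illustrative only.
REQUIRED_FIELDS = {"title": str, "severity": str, "file": str, "reasoning": str}
ALLOWED_SEVERITIES = {"low", "medium", "high", "critical"}


def validate_finding(raw_output: str) -> dict:
    """Parse model output and check it against the expected structure.

    Raises ValueError if the output is not valid JSON or does not match.
    """
    try:
        finding = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(finding, dict):
        raise ValueError("output must be a JSON object")

    # Every required field must be present and have the expected type.
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in finding:
            raise ValueError(f"missing field: {field}")
        if not isinstance(finding[field], expected_type):
            raise ValueError(f"field {field} must be {expected_type.__name__}")

    if finding["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError(f"unknown severity: {finding['severity']}")

    return finding
```

A check like this would run before a result is stored, so that anything unparseable is rejected or retried instead of silently entering the findings data.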

Thank you. Awesome.

Thank you so much for the great talk. That was really insightful. We have a bunch of questions, so let's get started on those. How do you manage the context when a PR is making a change in one repo but it affects the overall outcome in other people or systems? >> That's an interesting question. I have some ways to think about it. I think the first way is, with every repo, you should have some kind of a CLAUDE.md file or an AGENTS.md file, right? That file will have a rough architecture explanation of what the repo is doing, what that service is doing. And you could use that file as the context,

right? So imagine you have an agent that does PR analysis and every repo has that file. You could ask that agent to read that file first to understand what that repo is about, and you could do a threat model of it. So that's one way you can go about it. I don't know if that answers the question, but >> Yeah. The next question is >> do you block PR merging on findings from any of these security? >> No, we don't yet. I don't think we block PRs at all. I know some companies do. My honest opinion is we should not block, just because I think blocking PRs creates a lot of friction,

and I think it really depends on the company, to be honest. It depends on the culture more so, right? If you have a culture where security teams are looked at as blockers, I mean, yeah, sure, you can block PRs. But if you have a culture where security folks are looked at as enablers, you don't want to block PRs. You want to have that conversation with the engineer, trying to understand what they are doing and why they are doing it. So I think that answer will depend on where you work, how you work, how the culture is, all of that stuff. Yeah. >> The next question is a good follow-up to this. >> Did you face any pushback from

engineers with AI results? >> And how did you overcome this to build trust? >> Yeah. To be honest, I did not face any pushback, but what I did face was engineers didn't want me to PR into their code. When we found these vulnerabilities, I actually proposed to the engineers that we can use AI to fix them as well, and the engineers didn't want us to do that work. I got curious, right? If I'm trying to find an issue and propose a PR to fix it, why not? So, you know, I think that kind of goes back to: okay, if we are finding and fixing stuff, what are the engineers going to do? So, it's

more about the identity crisis than the pushback. But the feedback I did get was that when we started finding these vulnerabilities, the engineers wanted us to run the same analysis on all of the repositories. So that wasn't pushback. That was more like, oh, we like this. >> The next question is, do you see value in PR security review being separate from general PR reviews? >> That's a great question. Honestly speaking, the way our industry is heading, the software industry in general, I don't think security should be its own thing. I think security should be folded into quality, functionality, correctness, all of that

stuff, right? So just have one code review process that covers security as well. And if you're following best practices, like if you can bake in the security guardrails at the beginning itself, the PR review process is only going to get easier, because now you can verify whether those guardrails were implemented or not. So to answer that question, I think the security review process should not be a separate process. Yeah. >> Awesome. How are you ensuring the tech specs are high quality and detailed enough to not be garbage in, garbage out? >> Yeah, that's a great question. I think the answer to that question is, when we were first doing reviews, I think

that was garbage in, garbage out, to be honest. I hate saying that, but I think the expectations were unreal, right? How can you expect us to find an authentication risk by looking at a tech spec that is 25 pages long and has no information about endpoints? So I think the process has only improved, and in order to keep up with the quality, I think context engineering is going to be key, right? How and where are we getting the context from? One piece of feedback I got from our engineers is that the outcome from the tech spec review is awesome, but they wanted us to dig deeper. So I asked them:

okay, you give us the context. We don't know where the context is. We know we have the context. So I think at some point, in order to improve the outcomes, you have to context engineer, and you have to work with your engineering teams to know how they work, where their systems live, how they build software, essentially. >> Thank you so much, Anshuman.
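
The per-repo context file idea from the first Q&A answer (have the PR analysis agent read a short architecture file before it looks at the diff) could look something like the sketch below. The file names follow the answer; the `load_repo_context` and `build_review_prompt` helpers and the prompt wording are assumptions for illustration, not the actual implementation described in the talk.

```python
from pathlib import Path

# Assumed file names; use whatever convention your repositories follow.
CONTEXT_FILES = ("AGENTS.md", "CLAUDE.md")


def load_repo_context(repo_dir: str) -> str:
    """Return the text of the first context file found, or "" if none exists."""
    for name in CONTEXT_FILES:
        path = Path(repo_dir) / name
        if path.is_file():
            return path.read_text(encoding="utf-8")
    return ""


def build_review_prompt(repo_dir: str, diff: str) -> str:
    """Put the repo's architecture notes ahead of the diff under review."""
    context = load_repo_context(repo_dir) or "(no repository context file found)"
    return (
        "Repository context:\n"
        f"{context}\n\n"
        "Review this diff for security issues, "
        "including effects on other services:\n"
        f"{diff}"
    )
```

The resulting string would then be sent to whichever model does the review; keeping the context file short helps with the token cost concerns raised earlier in the talk.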

[ feedback ]