
All right, folks. We're going to get started here in just a minute. A few quick reminders. Coffee stations are only available until 4:00 p.m. today, so you've got about an hour and two minutes left to get your caffeine fix. And if you need to take a break from the day's events, please stop by the bar and chill-out space sponsored by runZero. Two complimentary drink tickets were provided to you at registration. We already paid for them, so please use them. Both non-alcoholic and alcoholic drinks are available. There are prayer and mother's rooms for those who need them. Please see the info desk staff upstairs, who would be happy to help guide you there. And Opal is sponsoring headshots all day, right outside of the talk tracks by concessions. Go get a new headshot for your LinkedIn. Please note that headshots end at 5:00 p.m. today. And without further ado, Dom and Matt are going to start their talk, titled Everyday AI: Leveraging LLMs for Simple, Effective Security Automation. Take it away, guys. All right. Afternoon, everyone. My name is Dominic Zennardi. I'm a member of the infrastructure security team at Instacart. And joining me on stage is Matt Sullivan, our former team lead, who has since transitioned to Figma. So, thanks for coming to learn about everyday AI, and a special thanks to
the BSides SF program committee for giving us the opportunity to share this with you. We have a lot to share today, and everything you see has been open sourced. Links will be provided throughout the presentation, so take a look at the shortcuts in the bottom left of every example slide. We want to encourage everyone to follow along, so if you have your laptop out, go ahead and navigate through some of the solutions we've built. So, over the past 18 months, we've been on a journey to leverage LLMs to substantially scale our security program, and maybe scare a few people along the way. By using LLMs, we've unlocked the next level of automation, a level that surpasses what's possible with deterministic code. At the same time, as a security team, we like to lead by example, staying mindful of guardrails and the types of data that we're sending to these providers. So, we're here to provide some tips and tricks on how to play it safe. Today we're here to advocate for a methodology of scaling security through LLMs, in ways that some people might find a bit alarming. The fact is, though, security teams' surface area keeps widening, but budgets and headcount are not keeping pace. How many times have we grumbled, "Oh, things would be easier with just two more headcount"? Surely nobody has ever thought that in this
room before. We must automate to keep up, and LLMs have helped the team at Instacart do just that. Today's talk highlights some of the work that we've done specifically to automate in the gray areas where automation was previously impractical because of non-deterministic inputs. We hope to inspire you, but also to send you home with real-world code to play with and utilize within your own organizations. Some of the automations are specific to Instacart's vendors and tech stack, so please be aware that you'll need to tailor some of these solutions to your own technologies. Finally, we're here to take a strong stance against fear, uncertainty, and doubt around security use cases and gen AI. Dom and I have heard all of the excuses, from vendors to auditors to thought leaders on LinkedIn who haven't touched code in two years. We dismiss these concerns wholesale and assert that when built, tested, and monitored properly, automations like these are much better than relying on human expertise alone. All right, let's talk about getting started. And I promise we won't take too long on this, since many of you are already past the "we're trying things" phase. Our most important lesson here is: don't be intimidated. Playing around with LLMs in security actually has a low barrier to entry. We'll detail more on that in a moment. The beauty of using LLMs is that they're flexible. For years, we've built security tooling to reach conclusions as strict booleans: what is true and what is false. Now we get not only the helpful decision, but the reasoning behind it. So here are three pieces of guidance to take away. One: simple things can be built with only a few sentences. Two: when code is actually needed, it's typically only a few lines of Python, or you can find the same level of success using a no-code or low-code solution. And three: you can let the automation QA itself simply by using additional calls. So, you might be asking yourself, how do we actually plug an LLM into existing processes? This might be especially true if you aren't a full-time software
engineer. But at Instacart, we use two methods to utilize LLM responses in production. The first is the classic choice: pure Python running on a Lambda or a hosted Docker container, pulling in the OpenAI library or just the plain old requests library. The second option is to use a workflow builder as middleware. At Instacart, we use Tines as a SOAR platform. You might hear us mention Tines throughout the presentation because some of these examples are Tines-based. This is because we know this platform well, and they offer a community-based free edition, which will allow you to import our examples and play with them. Matt, how do you feel about these new models? The new models?
I feel great about the new models. Look at these shiny new models. These new models have been coming with some really exciting abilities, and every upgraded version seems to add something new that makes them that much more useful for security tooling. We wanted to provide you with a toy automation that you could immediately start playing with, so we've created this dead-simple Tines workflow that checks the weather. We've shared it on GitHub so that you can import it into Tines later and play with it. This toy app uses OpenAI to search the internet for current weather conditions in any city. This ability to search the internet for information via OpenAI's API is only a few weeks old.
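For anyone following along without Tines, here is a rough idea of what that weather lookup could look like in plain Python. This is a hedged sketch, not the talk's actual workflow: the endpoint, model name, and the `web_search_preview` tool type are assumptions about OpenAI's Responses API at the time of the talk, so verify them against the current API docs before relying on this.

```python
import json
import os
import urllib.request

# Assumed OpenAI Responses API endpoint; check current docs.
API_URL = "https://api.openai.com/v1/responses"


def build_request(city: str, model: str = "gpt-4o") -> dict:
    """Build a request body asking the model to web-search current weather.

    The "web_search_preview" tool type is an assumption; the exact tool
    name may differ in your API/SDK version.
    """
    return {
        "model": model,
        "tools": [{"type": "web_search_preview"}],
        "input": (
            f"Search the web and report the current weather in {city}. "
            "Answer in one sentence."
        ),
    }


def get_weather(city: str) -> str:
    """POST the request; requires OPENAI_API_KEY in the environment."""
    body = json.dumps(build_request(city)).encode()
    req = urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        # Response shape varies by API version; adjust extraction as needed.
        return json.dumps(json.load(resp))


if __name__ == "__main__":
    print(get_weather("San Francisco"))
```

The point is how little glue is needed: one request body, one authenticated POST, and the model does the browsing.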
Tines and workflow builders like it have the ability to be invoked by webhooks and to return JSON responses. That means that you can create your own REST APIs powered by LLMs in just a few minutes. So the URL on the left is the code for this workflow, and the URL on the right is the REST API endpoint, if you have any interest in trying it out. It's fun to imagine the possibilities here. Your automations suddenly have the ability to browse the internet, and that unlocks some really incredible opportunities. This is especially true in the realm of detection and response. Obviously, I'm excited about the opportunities here, but at the same time, Dom and I are very clear-eyed about what LLMs and AI actually achieve today. Many companies, especially in technology, believe that AI should be making every software engineer quadruple their output. And Dom and I are here to unequivocally state: that's not going to happen. That's not going to happen unless you have already invested in great quality and great hygiene. The same is true for applying these technologies to your security program. We started from a high-quality base and built further from there. And that fact is what has allowed us to experience a bit of an AI revolution within our security program. But make no mistake, you must crawl, then walk, then run. It is an extremely exciting time to
be working in this space, though. Over the past few months, flagship LLMs have expanded their context windows so much that we're now able to process hundreds of IAM statements at once, in a single shot, and reason through very complex changes and refinements. As a preview, you're going to see some of those in a second. We believe that in three years, an entire company's codebase might fit within the maximum context window of a flagship model, and that unlocks some really exciting opportunities. Make no mistake, though: we didn't simply think AI was cool and go find a solution in search of a problem. Far from it. Our team was experiencing real pain points around human identity security. The common theme in our problem space revolved around volume: tens of roles, hundreds of daily access requests, thousands of employees, millions of permissioned objects. Does that sound familiar? How could a human ever possibly perform identity engineering at that scale? So LLMs have become our force multiplier. We're going to look at a few simple but high-impact examples of how Instacart solved issues that had previously proven hard to correct.
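Before the examples, it may help to see the skeleton they all share: serialize a small amount of sanitized context, ask the model for a strictly formatted verdict, and parse the reply defensively so a malformed answer can never auto-approve anything. This is an illustrative sketch, not Instacart's code; the function names and the JSON schema are invented for the example.

```python
import json


def build_verdict_prompt(question: str, context: dict) -> str:
    """Serialize only the minimal context the model needs, and pin the
    output to a strict JSON shape so downstream code stays deterministic."""
    return (
        "You are a security automation assistant.\n"
        f"Context (JSON): {json.dumps(context, sort_keys=True)}\n"
        f"Question: {question}\n"
        'Respond with ONLY a JSON object: {"verdict": "yes"|"no"|"unsure", '
        '"reason": "<one sentence>"}'
    )


def parse_verdict(raw: str) -> dict:
    """Defensively parse the model's reply; fall back to 'unsure' so a
    garbled or hallucinated response fails safe."""
    text = raw.strip()
    if text.startswith("```"):
        # Tolerate a fenced reply like ```json ... ```
        text = text.strip("`")
        if text.startswith("json"):
            text = text[4:]
        text = text.strip()
    try:
        out = json.loads(text)
    except ValueError:
        return {"verdict": "unsure", "reason": "unparseable model output"}
    if not isinstance(out, dict) or out.get("verdict") not in {"yes", "no", "unsure"}:
        return {"verdict": "unsure", "reason": "unexpected model output"}
    return out
```

The "unsure" fallback is the important design choice: the LLM can add context and recommendations, but only a well-formed, expected answer ever flows into the rest of the automation.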
Have you ever experienced an access approval sitting for days on end while your manager gallivants around South America trying to find their truest self? At Instacart, we've built an access control program in which 95% of access requests made through our identity governance tool are auto-approved with no human intervention. For more on that, feel free to catch the recording of our upcoming RSA talk, Time's Up on Standing Access, or if you happen to have an RSA ticket, catch us on Monday morning at 9:00. For the remaining 5% of requests that we aren't fully able to automate, we still want to have a really excellent user experience, including in situations where an approver may be out of office for a period of time. Can we use AI to improve the experience of these forsaken few that slip through the cracks? Of course we can. And this is a perfect example of using LLMs with a really simple no-code platform as well. Because Instacart has a flexible time-off policy, our source of truth for who is available largely comes down to their Slack status. We realized pretty early on that we could run an LLM evaluation against the Slack status message and reason through the likelihood that the approver will be available to take an approval or denial action. This works for obvious cases like "I'm on vacation," but believe it or not, it actually works really well for less obvious cases too, such as, and this is true, a status of just the skiing-person emoji. In addition, these statuses can now be dynamically assessed for duration. For example, "out for surgery" is much more likely to extend beyond "out for lunch." More importantly, I thought of this idea at about 10 in the morning, and it was in production by lunch. That is the true power of using no-code tools plus LLMs. Dom is going to take us through a bit more of a complex example, where user behavior is less
ideal. These complex challenges are where Instacart is really seeing the benefit of custom-built AI tooling. For a long time, security practitioners and infrastructure professionals alike have had to assess user requests for various cloud resources. Sometimes you get a good justification statement with a paragraph of detail, a full resource ARN, a basket of freshly-made cookies. But more often you get just "Yes." And this is a real request. What bucket? The one you fill with excuses to your auditor if this request gets fulfilled. The fact is, users aren't always familiar with the environments, the roles, and the effective permissions in between. Let's imagine a request that says, "I need to read from the orders table." Can we use AI to determine if the user is requesting the most appropriate role for this need? In order to do that, we need to do a few things. We need to translate the resource that's being requested. They asked, "I need to read from the orders table." Okay, what is a table in an AWS environment? Is it a DynamoDB table? Is it an RDS database table? Does it exist? Have they spelled it correctly? And what environment are they referring to? This all takes time, and traditional automation methods can only do so much fuzzy matching. Now we can take an LLM, send it limited contextual data, and identify the most likely match in seconds. But what about after resource identification? The question then becomes: what roles is this person eligible for? This multitude of steps could take hours, depending on the complexity of the situation. Instead, by using LLMs, we get an answer in seconds, one that can be piped directly to the user at the moment they make the request. Let's see this in action. I'm requesting a role that I have a need for, but that isn't well aligned with my job function. With LLMs, we can propose the policy-as-code changes needed to get me going and then raise those for human review.
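To make that "send limited contextual data" step concrete, one way to sketch it: a cheap lexical pre-filter narrows millions of permissioned objects down to a handful of candidates, and only that shortlist plus the user's request ever reaches the LLM. This is an illustrative sketch, not Instacart's implementation; the inventory entries and function names are made up.

```python
import json


def shortlist_resources(query: str, inventory: list[dict], limit: int = 5) -> list[dict]:
    """Cheap lexical pre-filter: score inventory entries by token overlap
    with the user's request, so only a few candidates go to the model."""
    terms = set(query.lower().split())

    def score(item: dict) -> int:
        name_tokens = set(
            item["name"].lower().replace("_", " ").replace("-", " ").split()
        )
        return len(terms & name_tokens)

    ranked = sorted(inventory, key=score, reverse=True)
    return [item for item in ranked if score(item) > 0][:limit]


def build_match_prompt(query: str, candidates: list[dict]) -> str:
    """Ask the model to pick the single best candidate (or none), with reasoning."""
    return (
        f"A user requested access: {query!r}\n"
        f"Candidate resources (JSON): {json.dumps(candidates)}\n"
        'Reply with ONLY JSON: {"match": "<name or null>", "reason": "..."}'
    )


# Example inventory; the names and types are illustrative only.
INVENTORY = [
    {"name": "orders", "type": "dynamodb_table", "env": "prod"},
    {"name": "orders_v2", "type": "rds_table", "env": "staging"},
    {"name": "billing-events", "type": "s3_bucket", "env": "prod"},
]
```

The pre-filter keeps the context window small and avoids sending the whole inventory to the provider; the LLM then handles the misspellings and ambiguity that defeat pure fuzzy matching.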
When you have conditional policy as code, these assessments can now be made in seconds. And the final decision, the merging of that PR, is where we keep the human in the loop. Want to tell them about how we maintain roles moving forward? I would love to tell them about how we maintain roles moving forward, except I can't get the slide to advance. You can do it. I believe in your slide deck. Is that not going? There we go. All right. We've created an automated capability to remove unused permissions from roles that are defined in Terraform. To be clear, we're aware there are a plethora of security vendors who are willing to provide this functionality of cutting out unused privilege. However, very few, if any, to our knowledge support doing that on a long-term basis by fixing the issue at the source, by modifying the Terraform where it lives. So we created an end-to-end automation that allows us to schedule runs quarterly, and it requires much less subject matter expertise in reviewing CloudTrail data or in role engineering. Fully automated, fully awesome. I'm liking what the outputs from the LLM do for us, but what considerations do we need to make about the inputs going in? Dom? Thanks. So, LLMs know the cloud, but they don't know you. Many LLMs know every major cloud provider's IAM structure inside and out, but they don't know your org or the
data within. So it can be tempting to overfeed it with context, but be careful. Do your custom IAM policies include user email addresses? Is it necessary to provide actual account identifiers? Probably not. So we need to sanitize what we're sending in. We know you know, but just as a reminder: tokenize the PII that might be within, names and emails, with nonce values, and reverse those on the return. In this way, no sensitive information is sent to the LLM, only random identifiers. Consider using a library such as Microsoft Presidio to help you with this task. And as far as prompt injection, keep in mind that if you're going to allow direct user input, the output of the LLM should be considered untrusted. We only take user input sourced from other applications. Basically, we're creating glue from one platform and using it on another. As you saw with the identity governance justification statement, if you have dedicated platforms with user inputs, use those mature platforms to your advantage. Companies have spent years working on input sanitization, so we don't need to reinvent the wheel. Again, there shouldn't be a need for a user's open-ended direct input into an LLM API. But what if the LLM hallucinates? It's happening less with each new model, but it's still a risk. So sanity checking, and the level of sanity checking, kind of depends on the use case. For the Slack use case we talked about
earlier, the stakes are pretty low. If the LLM misreads an approver's status, the approval just sits a bit longer. Maybe it gets bumped, but there's no real harm. We do believe it's important to keep a human in the loop for more advanced cases. From both an audit and an operational standpoint, in our humble opinion, we haven't arrived at the point of handing over critical decision-making to an LLM. Our recommendation is to let the LLM provide better context and recommendations, and then you or your deterministic codebase makes the final decision. When it comes to our comfort level about letting the robots run free, we've generally decided to automate everything up to the point of merging a PR. Generating code? Go for it. Creating a branch? Fire away. Pull requests? Sure. But merging a PR, that's a full stop. So this is your call to action. If you haven't started building end-to-end solutions with these technologies, now is a great time. The tooling is ready. Identify the pain points. Think big. Collect metrics. Be ruthless in using LLMs to design solutions that make your users happy, make their experiences with security enjoyable, and keep your organization more secure in the process. Use AI to do in one API call what it used to take an experienced engineer a day to build. Embrace this once-in-a-career opportunity that we have and go make something. With our ID Toolbox specifically, we have published all of the solutions that we've discussed here, and we will continue to develop solutions and publish them to that location. We encourage you to take a look and to contribute. That is all we have for today. Thank you so much for joining us. At this time, we have about 10 minutes for questions.