Prompts to Production: Building Effective Security Automation For Everyone

Name: Prompts to Production: Building Effective Security Automation For Everyone
Uploaded: 2026-05-11
Duration: 40 min 10 s

BSides Budapest 2026 | 40:10 | 57 views | Published 2026-05

Speakers

Jozsef Ottucsak

Tags

Category: Technical Tooling

Style: Talk

About this talk

A practical tour of using LLMs for security automation, from no-code chat prompts through documented prompts and agent skills to full agentic workflows with orchestrators and sub-agents. Draws on hands-on experience automating vulnerability assessment, security reviews, and SAST rule generation, with candid notes on cost, failure modes, and where human oversight is non-negotiable. Includes personal projects like reverse-engineering ARM firmware and building a mobile security rule set.

Original YouTube description

Jozsef Ottucsak - Prompts to Production: Building Effective Security Automation For Everyone. This presentation was held at the #BSidesBUD2026 IT security conference on 29 April 2026.

Short description of the talk: Security teams today face an impossible equation: expanding attack surfaces, ever-increasing work volume, and chronic talent shortages. While automation has long been the promise, the reality often involves fragile scripts that break when APIs change and complex tools that require specialized expertise to maintain. Large Language Models (LLMs) are reshaping this landscape, offering new ways to automate security tasks across the entire complexity spectrum.

This talk takes a practical journey through automation approaches using LLMs, from simple no-code solutions to sophisticated agentic workflows. We’ll start with no-code interfaces where natural language prompts become powerful security assistants, perfect for analysts who need quick answers without writing code. Then we’ll progress to low-code solutions using agentic IDEs like Claude Code, adding structured workflows while still maintaining accessibility. Finally, we’ll explore full agentic workflows where autonomous AI systems coordinate multi-step security operations, from vulnerability triage to incident response orchestration.

Drawing from hands-on experience building and deploying these systems, we’ll examine the trade-offs at each level: when simplicity beats sophistication, where guardrails become critical, and how to match the right approach to your team’s capabilities and risk tolerance. We’ll cover real-world use cases including automated vulnerability assessment, security review automation, and SAST rule writing, highlighting both success stories and hard-won lessons about limitations, failure modes, and where human oversight remains non-negotiable.
Attendees will walk away with a practical framework for evaluating automation opportunities in their security programs and concrete examples they can adapt, whether they’re just starting with AI-assisted workflows or ready to deploy autonomous agents at scale.

About the Speaker: Jozsef Ottucsak is a seasoned Product Security Architect with over a decade of experience in secure software development lifecycle (SDLC) initiatives for on-premise, hybrid, and cloud-native applications. Currently serving as a Staff Product Security Architect at Diligent, he specializes in enabling developers to build secure products by establishing security requirements, designing secure-by-design processes, and providing technical guidance.

https://bsidesbud.com All rights reserved. #bsidesbud2025 #cluster #aws

Transcript (English)

And now we are back. We will hear "Prompts to Production: Building Effective Security Automation For Everyone." So please welcome, and give it up for, Jozsef Ottucsak.

All right. So thanks for having me. "Prompts to Production: Building Effective Security Automation." This is the title; there will be some slight changes. But first, let me introduce myself. This is me. I've been in software and cybersecurity for the last 12 years, 13 next month. I've worked at cool startups and scale-ups globally, like LogMeIn and LastPass, and right now I'm at Diligent. I love AI and the technology it includes, like machine learning, basically all areas, not just generative AI. I love it so much that during COVID I did a master's degree in data analytics and machine learning because I was bored. And I also love cybersecurity, so I love getting the two together and seeing what we can do when we mix them.

If you are a security practitioner, you have probably heard about the Mythos Preview and the Anthropic red team: how it's so good that the Chinese are using it to hack Western companies. You can see that this influences how cybersecurity companies are valued on the market at the moment. If you listen to the industry, the hype people, they say that cybersecurity is dead and agentic security is the best thing since sliced bread. So what we're seeing is a paradigm shift that tries to tell us there is a new way of working. And if you look through the glasses we currently wear, this is true, because AI is generating a massive amount of work for everyone involved. If you are in security, you have to keep up with that velocity. By introducing AI models and agentic workflows into the picture, you get a lot of work coming in, and you have to address it somehow. So security teams basically need to scale.

The goal of this talk is two things. First, to show you how you can use agentic workflows, and AI generally, for your work or for personal goals. And because that alone would be kind of boring, since everyone publishes talks about how to scale up security using AI, I'm also going to share how I set up my own home lab and how I use AI in my work and my personal life. To kick things off, we'll go through a primer on AI tooling, processes, and history, just to understand what's going on behind the scenes. Then we'll talk about happy paths, problems, and tips and tricks I picked up while using AI, and finally how I use AI at work and on my personal projects.

So, first things first. Generative AI is about generating stuff. Behind the scenes, it's basically predicting text using mathematical patterns learned from vast amounts of data. You train the model, it establishes an internal abstraction of the world as it sees it, you feed it a prompt, and it generates the output that is most likely given the weights and probabilities it learned. Long story short, it's a fancy text-generation application. That's how it all started.

Then bright people figured out that if you generate an intermediate layer of thinking, breaking tasks down into smaller steps and teaching the model to act similarly, you get better results. This is reasoning, and reasoning is often hidden from the user. That was already a shady move from the model providers, because by generating reasoning tokens you are doubling, tripling, quadrupling the price: the model is generating output, and that output is usually hidden. Those are tokens that are generated and that you pay for, but never see. The user can only see that the thinking model works, well, not faster, but better; they don't see the costs that happen behind the scenes.
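The hidden-cost point can be made concrete with a little arithmetic. The sketch below assumes, as is common but worth checking for your provider, that reasoning tokens are billed at the output-token rate even though they are not shown to the user. The prices and token counts are invented for illustration, and `request_cost` is a hypothetical helper, not any provider's API.

```python
def request_cost(input_tokens: int, output_tokens: int, reasoning_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """USD cost of one request.

    Assumption: hidden reasoning tokens are billed like visible output tokens.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * input_price_per_mtok
            + billed_output * output_price_per_mtok) / 1_000_000


# Hypothetical prices (USD per million tokens) and token counts.
PRICES = dict(input_price_per_mtok=3.0, output_price_per_mtok=15.0)

without_reasoning = request_cost(2_000, 500, 0, **PRICES)
with_reasoning = request_cost(2_000, 500, 1_500, **PRICES)

print(f"visible output only:   ${without_reasoning:.4f}")
print(f"with hidden reasoning: ${with_reasoning:.4f}")
print(f"multiplier:            {with_reasoning / without_reasoning:.2f}x")
```

With these made-up numbers, 1,500 invisible reasoning tokens raise the cost of the same visible answer by roughly 2.7x, which is the kind of multiplier described in the talk.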

These were the first two things that revolutionized how AI works. Then we moved on to tool use and retrieval-augmented generation. Until then, models couldn't reach out to or interact with the outside world, so we basically had to invent a method for the runtime to execute certain things based on what the model generated, and that allowed the model to act. Then we got multimodal models, which allow the model to see images by transforming them into vectors it can understand; the same goes for audio, video, you name it. This allows a model to act based on what it sees and hears, do text to speech, whatever. So we are no longer restricted to text, but can use other means of communication.

After that, we arrive at where we are now: computer use. Computer use allows the agent or model to interact with the desktop. Previously you could use the chat interface; I'm not sure how you like it. I prefer the command-line interface, so that works for me, but not for the average user who wants something like a desktop app. With computer use, a non-technical person can use an agent to do some of their work by interacting with their desktop, like troubleshooting their Wi-Fi, or using their browser to interact with a database that wasn't exposed previously. This lets you automate workflows that weren't possible before, because maybe there was no API to interact with, or you just couldn't use it, or you needed to log into different machines and extract data from them. So computer use lets you stitch legacy systems together and automate work you couldn't automate before.

The last step, where we are currently, is enterprise AI workflows. This is not a new thing: it's like those no-code solutions where you click together a workflow and it executes the steps. One crucial difference, though, is that it's fault tolerant. When you orchestrate it with an agent, it can prepare for failures, and you can make sure the execution still works if something doesn't look right, for example if the output is not what you expected. Previously, with stricter, more rigidly defined automation, that would break the entire pipeline, but now an agent can do small error corrections autonomously.

How can we deploy this? There is local, where you host the model and everything else at home; there is infrastructure as a service; and there are APIs. All the frontier AI labs offer APIs, and all the cloud providers offer some of these models through APIs too. You can rent GPUs and deploy your own open-weight models through infrastructure as a service or platform as a service, or you can host them locally. Why would you host locally? Data control: if you control the inference, the data doesn't go anywhere you don't want it to.
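The tool-use step described above, where the runtime executes something based on what the model generated, can be pictured as a small dispatcher. This is a minimal sketch, not any provider's actual protocol: the JSON shape `{"tool": ..., "arguments": ...}`, the `TOOLS` registry, and the two toy tools are hypothetical. Returning errors as data instead of raising them is one simple way to get the fault tolerance mentioned for agent-orchestrated workflows, since the orchestrating model can read the error and correct its next call.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool registry: the runtime only executes functions listed here.
TOOLS: Dict[str, Callable[..., Any]] = {}


def tool(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register a plain Python function as a tool the model may request."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def add(a: int, b: int) -> int:
    return a + b


@tool
def word_count(text: str) -> int:
    return len(text.split())


def dispatch(model_output: str) -> Dict[str, Any]:
    """Run one tool call that the model emitted as JSON text.

    Failures (malformed JSON, unknown tool, bad arguments) are returned as
    data instead of raised, so an orchestrating agent can read the error
    and retry rather than the whole pipeline crashing.
    """
    try:
        call = json.loads(model_output)
        fn = TOOLS[call["tool"]]
        result = fn(**call.get("arguments", {}))
        return {"ok": True, "result": result}
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


print(dispatch('{"tool": "add", "arguments": {"a": 2, "b": 3}}'))
print(dispatch('{"tool": "delete_everything"}'))
```

The allow-list registry doubles as a simple guardrail: the model can only trigger functions the runtime has explicitly registered, no matter what it generates.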

This pretty much depends on your use case, your risk tolerance, and whether the data needs to stay in your geographical region, but arguably, if you want your data in trusted hands, you can host your own. Resource efficiency, on the other hand, is very shady right now when it comes to the market economics of this whole thing. Local is very expensive because you're not subsidized by VCs: you have to pay market prices for GPUs, unlike large companies, and you don't have the economies of scale you get with infrastructure as a service or an API. So you always end up better off using an API, because it's financed by VCs. If you want the bubble to pop faster, use those subsidized services: the free tiers, the coding plans. I'll show a really cool image later about the complete madness of how these model providers act.

Hosting your own also means you have to manage and maintain it. Just before I arrived here, I looked at Twitter and saw that there is a LiteLLM SQL injection vulnerability. I run a large language model proxy at home (I'll tell you about it later), so I basically had to patch it. If you host these things yourself, you'll face some headaches fixing them: bit rot, maintenance, security. This is not something you deploy and just leave alone, otherwise it becomes an insecure mess. The other thing is that with infrastructure as a service or local inference, you are basically stuck with open-weight models, because those are the only ones you can run. You can't access Anthropic's models, for example, just what's available on Hugging Face.

So here is the local setup I built. Everything is repurposed; this didn't start out as an AI project. I have an agent runner, which is a refurbished HP workstation. It was super cheap: these refurbished ex-business machines are stupidly cheap for what they do. I replaced the GPU with basically the latest model the PSU can still power, and replaced the RAM. It's much more powerful than a Raspberry Pi but consumes more resources, and I couldn't really justify a Mac mini for this setup, otherwise I would have gotten one.

Here's how it works. There is a Traefik reverse proxy providing authentication, authorization, and HTTPS, so I can use HTTPS even on my local network. I have Claude Code agents running through a Kanban board; I'll show a picture of it later. I have a LiteLLM proxy that uses one token to connect to my inference provider and hands out multiple virtual keys. The reason: I have dozens and dozens of coding agents working autonomously, and if I used just a single API key, or even multiple API keys, I would run into rate limiting. The proxy handles this for me, so I don't get timeouts or rate-limiting errors, and I can launch agents however I want. Then I have Open WebUI, which I don't really use but deploy because, why not, and I have OpenClaw to manage some of my stuff. That one is pretty locked down: I only connect it to my messaging service, so there's no attack surface there; I made sure of that. As next steps for this setup, I'll be evaluating Pi, the coding agent, because it's much smaller and pollutes the context less than other solutions, and looking at Hermes Agent. I also have a local AI inference machine that used to be a gaming PC, but that's more for experimentation: it's too slow to actually be usable, but I can load models and experiment with them. Deploying this infrastructure at home has been a great learning experience for me.

This is how it looks. This is how I add tasks to my coding agents: instead of chatting with the application, I define tasks, and I can set up dependencies between them, so when one task is completed, it moves on to the next one. That's how I do it.

I also use some paid services. NotebookLM is really great for understanding documents, and the good thing about it is that it's free so far, so I'm taking advantage of that while it lasts. I don't upload my own documents there, those stay local, but for manuals, documentation, whatever, it's really great. GitHub Copilot had a pretty decent free tier that was eliminated or severely cut down. Qwen Code also had a pretty good free tier with a thousand requests per day; now it's only 100 requests per day. I use Google Gemini Pro for the usual chats, brainstorming, and that type of work, because it's good for conversations. For work-related stuff I tend to use Bedrock with Anthropic models, because, yeah, why not. For my personal stuff I use Z.ai's legacy GLM Coding Plan Pro, which has also been discontinued; I can still use it until next year, so let's see about that. If you look at my token consumption, that's roughly 6 billion tokens burned in the last 30 days. Translated to Anthropic API prices, that's about $40,000 worth of tokens; translated to GLM's coding plan, it's around $6,000 worth of token consumption, and I paid $11 for it. So you can see the economics don't really work out here. Whenever I need to fine-tune models, I just rent GPUs through RunPod. They have a really cool interface and certain things are already set up, so I don't need to build Docker containers from scratch.

So how do I use chat? Chat is the basic building block. It's a good fallback for everything where you don't really know what you want to do, and it's still worth exploring. If you're brainstorming or trying to find something in a codebase, chat is a good place to be, because you don't define a huge question up front; you just explore the codebase step by step. Maybe you want to research something: with chat you don't need to invest much time up front, you can just explore, and once you have everything formalized, you can convert it into an agentic flow or something similar.

In my daily workflow, I use chat to dive into implementation specifics. I work for a company that has plenty of projects, so I need to be right on top of recognizing certain patterns, or going in and finding the fastest way to uncover vulnerabilities or check coding best practices, and we are talking about large codebases here. So I need to jump in and guide the model toward uncovering these things, or have the model help me uncover them. And it's really stupid. It doesn't work the way you would expect, where you point it at the code and it prints out the critical findings, because if you use it, you'll find that everything is critical. Everything is high or critical, and there are no mediums or lows. So you have to think your way through these findings again. That's why I ask questions like: okay, but does this exist in production, or is it just a dev script that runs during the build and doesn't really interact with the outside world? You have to explore those things, and the models will help you find them. It's also really good at creating one-liners, quick shell scripts that, for example, delete certain files or convert one format into another. I'm also lucky that the AI host we use is integrated with most of our company data sources, so I can look up Jira tickets, Confluence pages, whatever. But again, it gives you options; you still have to find the right answer.

The next evolutionary step after chatting was documented prompts, which are basically chat inputs written down in single files. This is for standardizing some of the workflows within the company, for example, or for not having to remember these things because you have them saved somewhere. They can be version controlled, and you can call them basically anywhere, or just save them in your notebook, then copy, paste, and execute them again. What we have right now as documented prompts: documented third-party reviews, describing what needs to be executed during a third-party review; rule generation, which I use for Opengrep rule generation, for example; and a standardized incident response format, where the prompt says: I always want it to look like this. You put this into a prompt and you're happy.

I don't use them, though. The reason is pretty simple: I migrated everything to agent skills. Agent skills are a couple of months old, the new advanced form, but we are still talking about text files here, so there is no real technical advancement; it's just another format that you work with differently. When I need to do a repeated task that is ambiguous, where it might go in a different direction and is loosely defined, I tend to create a skill for it. When something is really simple, like kicking a service or updating a service the same way every time, I write a script or create a just command; just is really cool for all those tasks you want to invoke stupidly easily. And whenever something needs some level of persistence, I create a container or a Quadlet, which is a container that runs as a service.

The next level of evolution is hooks. Hooks hook into your agentic CLI experience. Let's say you're editing a file and you don't want that file to be edited: you create a hook that prevents it. If you don't want the agent to read secrets, you create a hook that prevents that. Hooks communicate via JSON or exit codes, and they let you automate tasks that would otherwise consume a lot of the agent's context, the memory of the discussion. I use them to automatically trigger certain things, but you can also load environment variables, run code, verify signatures, or check the validity of something. The way I use them is to put deterministic checks into the agentic process. I use a lot of hooks in my development and operations projects; they make it harder for the coding agent to turn everything into unmaintainable slop. That means I run a formatter, a linter, security scans, SCA: everything you would do in an enterprise software development environment, except that with AI agents it's so much cheaper to set up. So even for your crappy hobby project, you can write automated tests and negative tests, you can make sure firewall rules are correctly defined and tested statically and dynamically, and you can rebuild knowledge maps. It's a pretty cool way to work, and you don't need to pollute your AGENTS.md file with this unreliable way of defining workflows, because you just say: when I modify this Python file, always run static analysis on it. So I love it.

Then we have skills. Skills are the standard way of sharing capabilities. You can pack up prompts, resources, assets, whatever you want; the only requirement is a specially formatted skill file, and then the agentic IDE or CLI, or whatever you are using, can include it and use it as a resource. I only use a couple of carefully vetted skills, because what's going on out there is horrible. I use Superpowers Optimized, a smaller version of the popular Superpowers skill set for Claude Code. It has predefined workflows for brainstorming, test-driven development, troubleshooting, bug fixing, using sub-agents, and so on. I use Graphify, which builds a graph representation of all your documentation and code; this makes it much cheaper for an agent to retrieve information without burning so many tokens. Same with RTK, which simplifies and groups information when tool output is too verbose. The problem is that most of the skills people have written are AI-generated slop that focuses on the wrong things: people asked the AI to generate something that looks good, but it doesn't help with the actual problems the AI has. You really have to look at the workflow you're running and fine-tune for it. These are natural-language instructions, so if you can't be bothered to write them, you probably should not be running AI workflows. Focus on the things that matter. I would recommend building your own, or, when you download a skill from the internet, always glance through it and delete the lines you don't need.

One of the coolest things that has happened to AI is sub-agents. If you're not familiar with sub-agents or how agents work in general: previously, agents had a shared context, and you tried to do everything with one agent. The agent has a certain amount of memory, and if you put too many things into it, first the agent works less accurately, and once you reach the context size, it crashes; it simply doesn't work anymore. Sub-agents let you offload some of that cognitive work to other agents. You have an orchestrator agent with a given context; it calls a sub-agent for a task that would, for example, almost deplete the entire context; and instead of putting everything back into the orchestrator, the sub-agent returns just the output you need, which might be a single yes or no. So you can delegate a lot of the work, work with larger codebases, and do more complex tasks. And since some of the frontier labs also charge based on context use, overly long discussions or agentic loops mean you end up paying more. So I use sub-agents for everything, because they let me work in a better way.

Here is a personal project of mine I wanted to show you. I'm really into synthesizers and effects, and unfortunately they take up too much space: there are too many synthesizers I want but don't have room for. The cool thing about them is the music technology; I'm much more interested in how they work than in how they sound. So I decided that maybe I could reverse engineer ARM binaries to find the DSP code and understand how the signal flows and what algorithms they use. I created a plan: there are sub-agents that download the firmware and reverse engineer it using CLI tools. It generates Python code, it uses xxd, just basic reverse engineering tools; don't even think of Ghidra or anything like that. It extracts the signal flow, the algorithms used, the magic parameters like filter coefficients and parameter ranges, whether the parameter controls are linear, exponential, or some other mumbo jumbo, and documents it in a file I can read. It also generates SuperCollider code, plus JavaScript and HTML. Agents are lazy, though, so you have to make them verify that the implementation is accurate and actually fixes the inaccuracies in the code, and you have to repeat that two or three times until it doesn't find any more inaccuracies. That gives me good confidence.

This is one thing I did with Clouds: I reverse engineered an open-source firmware just to see whether this works. The way you updated it on your synthesizer was a WAV file, an audio file, and I had to turn that into an ARM firmware binary. Another project I did for fun: I reverse engineered a digital drum machine and turned it into a client-side-only web page. It looks like this, and it sounds pretty cool.

If you automate even further, there are agentic frameworks, where you define the workflow either visually, with drag-and-drop graphs, or in code. This is useful if you have a well-structured workflow that you always want to execute the same way, or that you want to formalize because you call it a thousand times a day. For example, you can do rule generation based on tickets in your Jira backlog: say you find a high-severity vulnerability and want to turn it into a SAST rule that you validate on the fly. Or you let developers call the security review on their own third-party components, with a human in the loop who gives the final approval, while everything before that is done by agents. That's another example of what can be cool to do with agents. One of the first projects we did with AI was generating system risk and impact assessments for the ISO standard. These are all cool use cases that you can build on your own.

I also wanted to show this: the SMG mobile security rule set. I was super frustrated that there aren't any good mobile security rule sets, so I decided to create my own, just to see if it's possible. I parsed Apple's and Google's security best practices and the OWASP information available on mobile security, and turned them into rules you can run on your codebase using Opengrep or Semgrep. The other thing: it's frustrating to keep track of what you have deployed on your machine, so I built this security dashboard for the agent runner. It gives me an overview of deployed services, deployed packages, and resource consumption, and lets me kick off new services and investigate their health. I'm going to use it to deploy configuration measures so that I don't get hacked through supply chain attacks, because I'll be checking every package before I deploy it as an update.

For agentic automation, the one lesson you should take home is that you can fix a lot of problems with AI automation, but you can just as easily create 10,000 more with it. So I recommend using deterministic tools, and only using agents and AI when you truly need to, because agents cost a lot of money. Right now it might not seem like that much, but once they start raising prices it's going to be stupidly expensive, and I've seen a lot of people generating code and features just because they can. So the first question to ask yourself is: should I do this? You may be wasting time automating something that isn't needed. And if you are automating something, try to make it asynchronous and use batch inference and prompt caching. Batch inference alone is already a 50% discount on your inference cost, so if you can push work into a queue and use batch inference whenever possible, you'll save a lot of money. What's cheap to do with agents is writing code, tests, and automation, and making decisions. So don't try to boil the ocean and do everything with agents, because it's going to be crazy expensive. You can put an agent at certain choke points to decide, based on the input, "I recommend running static analysis" or "I recommend dynamic analysis", but even then, think about whether that could be a deterministic check instead of an agent deciding whether something should be done. If you have a business-critical problem that you cannot scale because you don't have enough humans available, then use it: just because it's expensive doesn't mean it shouldn't be done. If there is a return on investment, do it, but it's crazy expensive for anything done at scale. The pricing is linear: the more data you put in, the more expensive it gets, and the more times you call it, the more expensive it gets. That becomes a problem when you're sending one or five terabytes of data; that's going to be something like a $30,000 monthly expense. So try to use small models built for a specific purpose instead of using Anthropic models everywhere, and avoid ambiguous or flaky tests that require an agent to retry over and over again. We use a certain SAST provider with really flaky checks, so maybe they are not the right provider or tool to integrate into an agentic loop that would end up failing a lot of the time, and that maybe goes down for three hours a day.

Just as a final touch, some of the projects I did using AI: I wanted to convert my synthesizer patches from one format to another, because there are different formats for the same thing, so I built a web page that does this. I also built a video synthesizer that executes shaders: you can rearrange shaders, set up modulations, that type of stuff. So yeah, I just wanted to introduce some cool concepts that can be done using AI. If there are any questions, I'll take them; if you don't want to ask now, you can find me after the talk as well.

>> Absolutely. Okay. Thank you so much.
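The cost advice near the end of the talk (asynchronous batch inference at roughly a 50% discount, and pricing that grows linearly with data volume and call count) can be checked with a back-of-the-envelope estimator. This is a minimal sketch under stated assumptions: the per-million-token prices are hypothetical, `Pricing` and `monthly_cost` are illustrative names rather than any vendor's API, and prompt caching is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (the values used below are hypothetical)."""
    input_per_mtok: float
    output_per_mtok: float
    batch_discount: float = 0.5  # the 50% batch discount quoted in the talk


def monthly_cost(p: Pricing, calls_per_day: int, input_tokens: int,
                 output_tokens: int, days: int = 30, batch: bool = False) -> float:
    """Estimate monthly spend; cost scales linearly with calls and tokens."""
    per_call = (input_tokens * p.input_per_mtok
                + output_tokens * p.output_per_mtok) / 1_000_000
    total = per_call * calls_per_day * days
    # Queued, asynchronous batch jobs are billed at a discount.
    return total * (1 - p.batch_discount) if batch else total


pricing = Pricing(input_per_mtok=3.0, output_per_mtok=15.0)
online = monthly_cost(pricing, calls_per_day=1_000, input_tokens=4_000, output_tokens=500)
queued = monthly_cost(pricing, calls_per_day=1_000, input_tokens=4_000,
                      output_tokens=500, batch=True)
doubled = monthly_cost(pricing, calls_per_day=2_000, input_tokens=4_000, output_tokens=500)

print(f"online: ${online:,.2f}  batch: ${queued:,.2f}  2x calls: ${doubled:,.2f}")
```

Because every term is linear, doubling the call volume doubles the bill, while moving the same work into a batch queue halves it; the estimator makes it easy to see which lever matters before wiring an agent into a high-volume pipeline.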

