BSides Buffalo 2026: From Chaos to Capability: Building Resilient AI Workflows

Name: BSides Buffalo 2026: From Chaos to Capability: Building Resilient AI Workflows
Uploaded: 2026-06-22
Duration: 59 min 37 s

BSides Buffalo | 59:37 | 9 views | Published 2026-06

About this talk

AI is remarkably good at producing things: blog posts, architecture diagrams, resumes, strategy documents, security proposals. Hand it a rough idea, and it will synthesize a draft faster than most teams can schedule a meeting. But that's where the honeymoon ends.

After six months of intensive daily use building real deliverables (thought leadership papers, consulting proposals, career materials, and a personal AI agent security stack), I've learned that AI without workflow discipline is just a very fast way to produce unreliable output. It forgets context mid-project. It drifts from your intent. It confidently delivers wrong answers. It has no concept of version control, traceability, or iteration management. And it will happily lose everything you've built together if you don't have a system for capturing and organizing its output.

None of this is new. These are the same challenges every technology faces when it moves from experimentation to production, and they have the same solutions: process, governance, clear direction, and operational discipline. Drawing on decades of building technology programs for organizations ranging from startups to Fortune 500 enterprises, I applied the same programmatic thinking to my personal and small-business AI use that I would to an enterprise capability rollout, and discovered that one person with the right workflow can achieve what used to require teams and significant budgets.

In this session, I'll share the practical framework I built, the mistakes I made getting there, and the security and governance considerations that most AI users never think about until something goes wrong.

Draft Session Outline: The session is structured in five parts across 50 minutes, blending presentation with a light live demonstration of the actual working environment.
The Promise and the Reality (10 min): What AI actually excels at (atomic production of structured content from rough ideas) and where it predictably breaks (context loss, hallucination, scope drift, output management). Real examples from six months of daily production use, not theoretical scenarios.

The Framework: Enterprise Discipline at Personal Scale (15 min): Applying programmatic technology management principles to AI workflow. Prompt engineering as requirements engineering. Iteration management through version control and structured repositories. The human-in-the-loop imperative: review cycles, quality gates, and knowing when to trust vs. verify. A walkthrough from raw idea through structured input, AI draft, human review, to final deliverable.

Live Workflow Walkthrough (10 min): A light demonstration of the actual toolchain: git-based document management, markdown-first authoring, and AI integration points. A real deliverable moving through the workflow, showing how raw AI output becomes reliable, versioned, traceable content. What "governance" actually looks like at personal scale: not enterprise bureaucracy, but lightweight discipline that prevents chaos.

Security, Governance, and Guardrails (10 min): The security practitioner's perspective on responsible AI use. What a defensible personal AI security stack looks like. Governance without bureaucracy: lightweight policies that protect your data without killing productivity. The threat model most people ignore: your AI conversations are a data source. The emerging regulatory landscape and why building good habits now matters.

Takeaways and Q&A (5 min): Key principles attendees can apply immediately, starting resources for building their own workflows, and open discussion.
What attendees will walk away with:

- The ability to identify AI's predictable failure modes (context loss, hallucination, scope drift, and output management gaps) and understand why these are workflow problems, not AI problems
- A lightweight, practical framework for managing AI-produced content through structured iteration, version control, and human review without requiring enterprise tools or budgets
- An understanding of personal AI security posture: data exposure risks, prompt injection awareness, and basic governance practices for responsible use
- A starting point for building their own AI workflow using accessible tools (git, markdown, structured prompts) that provides traceability, versioning, and repeatable quality

This talk is for anyone using or considering AI tools, whether you're a student experimenting for the first time, a security practitioner evaluating AI risk, or a professional trying to get reliable, repeatable value from AI in your daily work. You don't need an enterprise budget. You need enterprise thinking.
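To make the versioning and traceability idea concrete, here is a minimal sketch of what "every AI revision is saved and reversible" could look like. It is not the speaker's toolchain (the talk uses git); it is a small pure-Python stand-in, and the names `DraftHistory` and `history.json` and the file layout are hypothetical choices made only for illustration. It records each draft under its content hash, lets you load any earlier version, and counts changed lines between two versions, which is one cheap way to notice scope drift when a request to edit a single paragraph quietly rewrites far more.

```python
import difflib
import hashlib
import json
import tempfile
from pathlib import Path


class DraftHistory:
    """Append-only history for one markdown draft (hypothetical helper)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.log = self.root / "history.json"  # assumed manifest name
        if not self.log.exists():
            self.log.write_text("[]", encoding="utf-8")

    def _entries(self) -> list:
        return json.loads(self.log.read_text(encoding="utf-8"))

    def save(self, text: str, note: str) -> int:
        """Store a snapshot and return its 1-based version number."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        (self.root / f"{digest}.md").write_text(text, encoding="utf-8")
        entries = self._entries()
        entries.append({"version": len(entries) + 1, "sha256": digest, "note": note})
        self.log.write_text(json.dumps(entries, indent=2), encoding="utf-8")
        return len(entries)

    def load(self, version: int) -> str:
        """Return the exact text of an earlier version (the rollback path)."""
        entries = self._entries()
        if not 1 <= version <= len(entries):
            raise IndexError(f"no such version: {version}")
        digest = entries[version - 1]["sha256"]
        return (self.root / f"{digest}.md").read_text(encoding="utf-8")

    def changed_lines(self, old: int, new: int) -> int:
        """Count added plus removed lines between two versions."""
        a = self.load(old).splitlines()
        b = self.load(new).splitlines()
        diff = list(difflib.unified_diff(a, b, lineterm="", n=0))
        body = diff[2:]  # skip the two file-header lines when a diff exists
        return sum(1 for line in body if line.startswith(("+", "-")))


# Usage: save a human draft, then an AI revision, and inspect the drift.
history = DraftHistory(Path(tempfile.mkdtemp()))
first = history.save("# Post\n\nIntro.\n\nBody.\n", note="human outline")
second = history.save("# Post\n\nIntro, friendlier.\n\nBody.\n", note="AI: soften intro")
print(history.changed_lines(first, second), "lines changed")
print(history.load(first))  # the original is still recoverable
```

In practice a git repository gives the same guarantees with far better tooling; the point of the sketch is only that the snapshot is written by your workflow, not left to the AI session to remember.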

Transcript [en]

Okay, I guess we're gonna get rolling. Uh, good afternoon everyone. How's it going? >> Yeah, >> it looks like a really great conference today. I'm excited to be here. Um, I want to thank BSides for inviting me and also I'm glad you guys all made it. Thanks for joining me. I appreciate that. Um, because we're a BSides, I'd really like to make this kind of conversational as opposed to, uh, just me babbling the whole time. I thought it'd be nice if it was a little interactive. Um, I really like to focus on business value, impact, and outcomes. Um, as opposed to here, we're not going to really focus on a lot of cyber risk, but more on practice and

executional risk for AI and how to use it and how to think about it really practically and pragmatically. So I got into tech really to support people to help others. Um and along that journey it was quite a while ago about three decades ago I found myself really helping organizations help their users right and that kind of pulled me away from what I was doing. It was a great journey. Um but tech was so complicated that I realized all the things that I like to do I really did them I had to do them for big organizations. Um and uh it was a great journey and I love that stuff too as well. But I've also really

been focusing on much more of the practical and pragmatic thoughts around AI um and the risks related to business as opposed to all the cyber risks, although there are many of those as well and I have been of course digging into those. Um one thing I wanted to clarify was that uh on the bio it says that I'm part of Mnt. Today I am not. I left there in August. Um I'm currently at my own little LLC just doing some advisory and consulting and some blogging and writing and research. Um and uh I have been at eight startups. I started this journey. My first startup was back in '95. I ended up in Silicon

Valley and I ended up at an organization. We built the world's first SSL toolkit. Um, and it was after I had been at the state of Wisconsin. I had been at the university and I helped the state of Wisconsin get to the internet. Um, which was pretty cool. When I went to that first startup where we did the SSL toolkit, it was an e-commerce startup and all my friends thought I was crazy. They were like, "You think we're going to use money on our computers? What does that mean?" And I was like, "We'll see. I'm not going to like make predictions, but

I have a feeling that this will have a much bigger impact on society than we anticipate." As it has. And so, um, I ended up getting into security because the Department of Health and Social Services in Wisconsin, which I went to work for, wanted to bring up all of their stuff on the internet. And I was like, this is Medicaid and Medicare and SSI stuff, right? We were the largest state agency. We handed out like 4 billion dollars in checks. And I was like, how are we going to put ourselves on the internet? What does that mean? So I started back then thinking about what does it mean? Um, but really I think of

myself as a technologist trying to help technologies, you know, become operationally sustainable and maintain some notion of risk management, right? The reality is everyone's going to face a breach or a compromise or some kind of impact. The question isn't really whether we stop them. The question is whether we can identify them really quickly and recover really well. Um because the reality is, especially as AI is brought to us, you know, we can't really control the actors, the malicious actors out there. You really have to just be able to respond and recover. And so um uh and I've worked in a variety of industries. I've done both. I've done consulting. I've worked at a lot of

vendors, as I said, eight startups. And I've also been a full-time employee at a variety of places. So I like to kind of bring those together because I think they all have different impact and vantage points. And so I try and like bring those experiences and skills in a way that are practical, right? Um vendors are really important but they really push product, right? Consultants also have a good third perspective on what's going on. They're not embedded, right? They can see outside but they also have their own um objectives and goals. And then organizations, you know, are constantly struggling to figure out what is a path to um actually leverage technology effectively, sustainably and maintain our risk, right? Um, and so,

um, I've been digging in a lot when I started exploring AI about a year ago. Um, I was really digging into how can it help me, how can it help my mom, how can it help my friends. Um, I mean, I had the cyber lens on cuz that's the world I've lived in for so long. But the reality was I didn't want to like support large companies all the time and just securing theirs. And I wanted to figure out how this technology could really help advance the people I know, small businesses, the individuals, the students. How can we leverage it in a way that's um that has other value besides just you know building out

operations sustainably? And um it's been quite a curious journey. Um it is interesting. So I really started out doing a whole bunch of blogging. Um it started with some like resume career stuff I was exploring, and the other place I started, the third path, was, you know, um how do we secure agents, because I really think we have the capacity to secure agents today with the compute power we have in a much different way than we have in the past. Um one example was I was able to build out a full stack enterprise scale environment just on a laptop, which was quite shocking to me at the time. And I

was like, "Wow, okay. So, we can do this at enterprise level scale with enterprise controls, but we can do it in really small frames." And I realized that small businesses could really benefit from this. And medium-sized businesses, right? They're always struggling with technology. And technologists, they can't usually afford the best technologists in the world. Um, technologists often go place to place. And and I was realizing though that I really wanted to figure out how you could make how smaller businesses can make it really impactful cuz it's a really big leap forward, right, for what they can do. It does incredible things. Um but along that security journey, you know, um I realized um my the you know,

I've been in front of a lot of technology, but I've always been at the point of the stick. And my realization with AI was that it just requires all the same things we've been doing for any technology I deploy, right? Whether you know back in the day it was new firewalls or intrusion detection or SQL servers, data storage and data capacity, right? All of a sudden I realized that AI still had all the same problems. While like the market hypes it as a solution, it's really a tech, and it's a tech that technologists or technology-aware people really have to think about how to deploy securely and safely for their user community, right, for their organization, for themselves.

And I was like, oh, okay. So not only can we build sort of um really like enterprise level um processes and practices and controls around it um but we also really have to think about how the user is going to function with this and use it really effectively and how do we enable them to you know a small business needs to focus on their business model right not on their tech stack and I was like how and AI really enables you to do a lot of things and I was like okay this is like perfect like I've been kind of waiting for this for decades Right? Like as development tools came in and really advanced technology

and development life cycles and software development that was wonderful but the reality is the small business owners still couldn't use that right and then there was cloud that was wonderful too they didn't have to support their own infrastructure but they still had to maintain it all they still had to build all the apps and so I realized that like this could give an opportunity for smaller businesses to jump into much deeper tech but that it still has a lot of failings and a lot of challenges. So, I've been sort of focusing on the practical and pragmatic, right? How does a small business or how does my mom use it in a way that I'm not worried that,

you know, she's going to give away all her credentials and all her money, and how can a small business start to use it without having deep technologists, because it lets you do an awful lot of things. Um, and so as I was using AI, I was realizing all of this, right? I was running into all these failure modes for me and that seemed a little extreme. But what was happening was I was believing the hype. I was like, "AI does it. It does brilliant things. It's really incredible." And um you get caught up in that hype, in allowing it to do things, and then you realize, wow, um it's going to do things

the way it likes. So this conversation is really about that. Um I did want to open it up to a couple questions and that is around, is everyone in here using AI? Are we using it like as um a tool to do your job? Are you using it as an infrastructure to build? >> Yes. >> As well, all of these things. When you guys say you're using AI, does that mean you're using like Claude and ChatGPT to produce papers, road maps? >> Strategy docs, things like that, >> Copilot. >> What was that? >> GitHub Copilot. >> GitHub Copilot. Um like for your user base to be able to use >> actually just our security team doing

it right? >> And security. Oh, just the security team is doing it. Okay, that's really interesting. um use cases for each one. >> Yeah. Use cases for each of those audiences. >> Yeah, they're important. Um okay. And the other piece I was curious is is have people here built like organizations or teams or products and solutions? >> Because um when I've been thinking about this, one of the things I realized about AI, it was like, you know, and people hear this is these agents are your team, right? these clients are your team and it has all the same problems, right? And in fact, it even has more because uh and one of the things I ran into is is that

for example, you know, I would ask it to do something and it would go off and do something, then I would ask it to change that thing and it could change it. But the reality is I needed it to be able to live outside of that session. And what I realized is that, like a human, like all the people we're traditionally used to working with and managing, it's like us: it loses context, right? It doesn't understand all the things that have already been done. And so even though when you talk to an AI it seems to have all this context, it's context for it within that session. (a) that session gets compressed, you lose context

(b) when you try and then go ask someone else about it or a different agent about it, it has no context. And the funny part is, you know, it was doing everything that I would run into with my teams. Hey, go build me a PowerPoint presentation and they'd go off and build it. But if I didn't give them the right instructions, right? They would build whatever they wanted. And so AI is exactly the same way. It's like, how do you give it the context? How do you have that partnership, collaboration, and relationship with it? That I thought was really funny. I was like, wow. Okay. So, I'm now working with a really intelligent machine that responds to me

and talks to me like a person. But the reality is they also have all the same limitations. They get overly excited or they're um particularly um you know they have both optimism but they also think of failure modes, right? And they also change their paths randomly. And so you know there was like really a lot of excitement but then I started running into a bunch of issues. So I'm going to talk about some of the issues that I ran into as I was walking through this um as a practical person, just a user, and then how I started applying sort of my like my skills and experience to think about this differently um at scale. So

let's see, from chaos. So sort of like the idea of like I think the AI um landscape is total chaos, right? Like um people are building AI solutions for people to buy and deploy and you know and it's really smart but it tells you really a lot of great things and then you realize it doesn't actually know what it's talking about. Right? So this is a little about that journey. So the promise and reality, you know, the way I think about AI is really for me it's an intelligent thread. Okay, AI is incredibly good at weaving together disparate systems, disparate data, disparate processes, and sort of tying them all together. But again, it's

not the system, right? It is also incredibly good at synthesizing concepts and data across boundaries and patterns, right? It can see whether it's patterns in content, whether it's patterns in systems. It's brilliant at this, right? Like how it can explore a blog or how it can explore a system and tell you what's going on is fabulous and incredible. And then it can take actions at light speed, right? Like much faster than any of us can function, right? It can write a blog or a paper or a strategy within like literally minutes. And I have to admit like you get taken aback by like how good they are. They're incredibly good if you're

if you're writing the right prompts. So that's how I think about it. It's an intelligent thread. That's how it should be used, to weave together our systems. Say a large enterprise has multiple identity and access management systems and they need to weave together, uh, let's say um Entra on your cloud instance, right, and um you know your Active Directory infrastructure. If they're out of sync, it does a great job of being able to show you a unified view into what's going on. You can see all the roles and groups all associated, all the identities that are missing from systems, right? So, it really weaves

these together and then it can take action. You can tell it to go change a bunch of things and if it has access, it will figure out how to do that, right? It'll check, it'll test on the API, it'll test on a login, it'll test direct network access. It'll do it really quickly and rapidly and that's an incredible thing. In the same respect, um, so the promise is really good, right? Summarizes large bodies of data. It identifies massive patterns. Allows you to write blogs and architectural diagrams and resumes. It's really good at retrieving data if you're deploying at the right data sources, whether it's the internet, whether it's the data store, whether it's your local file

system, whatever it is. Um, and then it also, like I said, it executes multi-step tasks and it can figure them out on its own or it can follow a user and then learn them all and then, you know, execute very well. Now, what I realized is that then the honeymoon ends. It gives you so much access and so much information, right? Um, once you start to build something substantial that really requires evolution and that really requires the use of others, you start to realize you own the context, and even though the AI can tell you like everything you think about it, that

doesn't mean anyone else can, right, and that doesn't mean you actually understand why it's even telling you that. You know, one of the things I realized is that AI feels so human to me, right? Like, you know, when it's excited it'll tell me something is working when it's not, right? Because it's just enthusiastic about that, right? Um when it's not doing something well, um it'll just try to solve the problem in the background. I won't have any idea what it's doing, right? And then it'll come back. And so what I realized is that what happened was I started to launch. So I just started with some blogs. That was pretty

easy. Individual single streams are pretty good as long as you take some time to review and edit and go through those steps. But once I had like three blogs running concurrently, there were a couple problems. I couldn't track everything it was doing effectively across all three to five blogs. I couldn't track all of its changes effectively. It was throwing a ton of data at me. And all of a sudden, I realized this is the same problem we've always had with tech once you launch it, right? Now I have to figure out what's the state of every single one of those blogs. Um, it would tell me they were in the right state when they

weren't and things like that. So I realized that like the evolution and the sustainability and the operational management of AI was the exact same problem we always had. Just like when Word came, right? When everyone started to use word processors, right? All of a sudden they were writing 10 docs and they lost track of which one they were at, right? There were so many things flowing. Um and so to use AI for like a discrete task works incredibly well, right? One blog, one action, you can track it, you can follow it. I really found the problems happen once I got into multiple ones. So I ran into a few failures, right? Um context loss. Um I

was convinced, like, I would tell the AI for example to do version control and sure enough, you know, it would do version control for a little while and then it would randomly stop. So the files that it was writing were no longer under version control and I would have multiple version ones or whatever and I couldn't figure out where all that was. Right. Um, it also, when you go across sessions like we were talking, it doesn't remember. It needs that context. And if either your session window runs out, which it does all the time, um, or for some other reason you've built something that it needs to reload your CLAUDE.md or something, you're going

to have to open up a new session and that new session doesn't know anything. So now, not only did I have to figure out how to track all the versions that it was getting, I had to realize that I had to be able to provide the same context for the next session, right? It didn't know what I was doing. So, it loses context. And like the silent contradictions, it does this all the time, right? Like I said, me and my friend the AI will agree on versioning and next thing you know, it'll just drop it. Okay? Right. So, one is sort of this notion of context loss where it's

going to forget what you're doing, where you're at. Uh the other one is like hallucinations, you know, confidence without accuracy. It tells you all sorts of things, like it cited me frameworks that didn't exist, right? It gave quotes that didn't exist. Um it invents statistics that say something is X, Y, or Z accurate, and that was a failure. It wasn't. Um and it doesn't know how to say I'm not sure, right? It really doesn't question itself. Um one of my friends, he's a lawyer at a really really big law firm and they were doing a brief and it wasn't until like you know 3 days before the brief was due for court that they

realized uh the AI had changed the core quote for the contractual language they were going into court for. And so everything was based on the wrong um part of the contract language and it misinterpreted it and people didn't find it until like three days before and they were just scrambling. So, you know, so there's hallucinations and there's this this notion of confidence that it has. Um and then there's drifts and tangents. It it does whatever it wants. Uh one day I just asked it to change the tone of, you know, one one paragraph and sure enough it changed the whole blog, every single thing on it. It changed the tone for the whole blog and it turned

everything into single sentence statements. Right? That wasn't so bad. I got to fix it. I forgot about it the next time and I went two or three revisions down just focusing on one paragraph and I realized that it had changed things three revisions before. And how do you get back to that? Right? Because if your session is writing files, chances are once it writes the next file, it's not going to have the old files, right? Depending how you're doing that. Um are these things consistent with what people see? People understand and recognize that it's running and that our users are out there doing crazy things that are really cool and then sure enough we hit along one of

the things um a friend of mine is at OpenAI and when they launched sort of their native AI first, like they required everyone in the company about 3 or four months ago to do just AI first, not to write it yourself, um what they found was a really interesting thing: their non-techs were producing much more um outputs and skills because they didn't come into it with all these frames that worried about all this risk. So they started producing incredible things and uh the devs were much further behind in terms of their velocity of production, which I thought was interesting, and that was a real hard balance for them. But this is what it does, right, is it um

it enables people without you know deep tech experience and knowledge to do an awful lot of things but it has all of these failure modes. So this isn't theoretical. One of the things that happened was uh for my original deck um it decided that I was the executive director of this conference. Um even though I changed it three times and it changed it back three times. Um and that was funny, right? There's things like, you know, prompt injection. Um they sold a Tahoe for a buck and they had 20 million views and they had a backend process that was wrong. Air Canada had a refund policy that an AI chatbot built and um it wasn't their actual one and uh they had to go

to court. They tried to claim that um it wasn't their responsibility, it was the AI's. Um the court corrected that. Um but I did think that was quite entertaining for them to claim that. Um and you know Starbucks, their inventory in the last several weeks or month or so went haywire and they had to pull their AI system. Um I thought this one was an interesting one. New York City built one that told businesses they could pocket tips and discriminate. So, a bunch of them went out and started to do that and then they realized the law was different. Um, I don't know if everyone has recently heard of Sullivan and Cromwell. There was a um there was a

law firm that submitted a brief to a court, a really massive one, and it had all sorts of AI errors and hallucinations in um the data they presented to court. Uh like now they're re-evaluating like 1,200 cases because of this. Um, and then I mean I don't know if everyone's heard, but obviously there was a teen suicide, right? And um, the state of Florida decided to take on OpenAI again. Um, the teen wasn't given any wrong information, but they were encouraged to do things that they shouldn't have. And so, you know, this is where AI and I'll get to see what's next. So, I might jump around a little.

Um, so these things are real. They're happening live. They're happening all the time, right? The data suggests that about 95% of the AI efforts out there in enterprises are failing to meet their expectations. Um, and only about 5% are successful. Um, and this is because, you know, again, the tool is so powerful. We get caught up in what the possibilities are, but we kind of forget how to get there. And really, I just think of it as like, you know, at some level I was hoping to toss away some of my like experience and be able to start fresh. And I realized that actually all that experience is just as required for AI, right? It's no

different than when you're deploying any technology, whether it's a cyber technology or user case technology. Um so how I think about it is, you know, it does a lot of things and it'll write content for you. But like I said, I produced a bunch of blogs and I didn't realize until three later that it had changed a bunch of things that I didn't expect it to do. And when I was running directly on an AI desktop um I realized I had no reversibility. Nothing was being saved. Nothing was being um maintained and I couldn't actually reverse back. So while there's really positive consequences, you got to remember that not all those consequences

are what you intend and you need a way of dropping back and being reversible, right? You need a way to go back to old versions. And so this is again where I realized all of these governance practices and some of the bureaucracy we put in place have a lot of value. And I think like this is important, like you got to remember, and our users have to remember, that AI was built to really be um user positive, right? They affirm everything you ask, right? They are very pleasing, they move fast, they always want to finish everything. Like it'll tell me hey there's 10 problems and then all of a sudden it'll say but let's just move

forward, you don't have to worry about any of those, right? And I'm like okay, that's right. And then you know for it, it thinks of success like, it would always tell me this is your final one and, you know, it was like that's what I wrote, isn't it done? And so we really have to focus on pausing to verify, right? We need to do this on our own, we need to catch its mistakes, we need to have it catch its mistakes, which you know actually works and functions. It's one of the things I ended up figuring out that I'll talk about. Um you have to kind of force

it to disagree with you. You have to tell it to have a really critical... Yes, please ask. >> Is there an AI model out there that is inherently not nice? >> Not that I'm aware of. What people are now posting all over, I've been seeing, is you know prompts that you put in your master prompt. The fact that you tell it to challenge you, to assume things are wrong, to focus, and that's about the only way that's effective that I've seen. >> So one of the things that I do is I you know specifically say don't be sycophantic, right? Don't be nice. Give me real things and all that stuff. And also talking about um that's one of

the things that's different about uh uh Opus 4.8 is that it includes a sub agent that is the doubting agent. >> Yeah. On that. >> Yeah. >> Have you tried it yet? >> I have not tried it but um it's very intriguing. >> Yeah. Um because it just wants to please us all and that's what was designed, right? Because it was really designed initially as a consumer product, right? They really wanted consumers to latch on to this and not stop using it. And so it follows all of our standard business and marketing paradigms, right? Come in and buy stuff and uh it'll look like it's real. It'll look like it lasts. And you know, sure enough, it falls apart in two

weeks, two months, two years. Um, let's see. Oh, and here, the idea is it also becomes its own reviewer, right? It not only writes things or builds things, it tells you what it thinks about them. One of the things I'll talk about a little, and what I did realize, to this question, is that I now have, and I'll show it to you, the AI use other agents to review it. It can use a new session within that agent or it can use another one. So I built a system that forces a review cycle multiple times, and mine uses Codex, it can use Gemini, and it can use

Claude, right? And I'm running natively in Claude, generally speaking. So Claude kicks off all of those subprocesses for me. And anything I build goes through a multiple-review process with other agents and other sessions. It also does a self-review process. Interestingly, we found that when you run a review in a session, it has context. That's the problem and the benefit, right? It can dig deep in places that other agents or other people don't understand. But it also is blind to everything it's including as part of that. Just like the human analogy, right? It's like playing telephone with people, right? You whisper something in its ear and, you know, by the time it

comes out on the other end, it's totally different. And so I did find that you can sort of tell them to be critical. But the real benefit I found is when I have agents that don't have any context review the code or review the blog, right? You have to give it some leading context, like, what is your intent? What is the outcome you want? And then you can ask them to map it, and that works really well. A guy I know built something called the echo chamber, and that's how I think about it, right? Um, you do have to be careful

if you have it go endless loops on review. It can be, uh, it can go like it can go from like building a hello world program to like all of a sudden trying to design, you know, the Wizard of Oz by mistake because if it gets caught on the wrong thread, you never know what each of them are going to say. But it does a really good job of that confluence of bringing together multiple AI sessions to evaluate the work they're doing. I found it to be the most valuable. It finds things all the time. Um, one of the things I did want to do real quick, so I um I wanted to for our demo

um, I'm going to show, and we'll see how it goes. I'm in a new session here and I'm going to let this run, but I thought I'd show you. So, I built in some commands that are called, like, "let's get started" in this system of mine, because what I wanted to do was build a CI/CD pipeline that worked, right? And I didn't want to manage the pipeline. I'm not a developer. I didn't want to get into all the details of Git. Turns out it knows it pretty well, as long as it runs it properly. And so I built what I call governance core, and it is basically a CI/CD

pipeline for running through things and doing reviews and then spitting out outcomes and doing checks and tests. We're just going to start that up, because I'm curious how well this is going to work, and then we'll come back to it. This is running on a home server back at my house, just over a VPN. I have not published the MCP server yet. So, okay: AI wants to please you. It's a massive problem. We are learning about that. Some of the vendors are starting to put in controls that can help you deal with that, right? People are becoming smarter about it. It's just like telling your partner, your collaborator, your teammate how to do something: you need to give as

much context as possible and you have to assume multiple revision cycles that's just the reality even though the end result that they get on the first pass looks incredible um so here just to reframe this is how I think about it. So all of the problems that we're running into AI in my eyes are ones we run into any tech we deploy. Okay, it just happens to seem smarter, more advanced, quicker, faster, and more robust. But the reality is it has all the same things, right? You lose work, you have no history, you're confident in what's out, what the outcome is, but you're wrong. Um you you don't get a second set of eyes, you keep using the same

session, you forget it has context, whatever the case may be. Things are shipped without review, with no accountability or traceability. And so these things apply both in engineering, right, whether you're building code, building systems, building applications, and they also apply in an editorial and sort of a business context, right? Same kind of thing. You have to think about it, right? Whenever you're writing multiple copies of your project plan, you want to have it reviewed. You want to do fast fact checking to make sure all the systems are available and useful, right? You want to do a review and get a peer review so people see other things. You want to do sign-off and approval

and you want to maintain records and governance, right? And I think that's critical. Um, for me, what I ended up building out was everything's stored in git, right? And that allows full traceability. Um, anytime it makes a change, it has the diff, so I can go back and forward. I can reset to a whole new branch. I can reset everything I need to by storing it all in git. Uh, again, this is one of the problems I think about like how would I enable a small business to do this, right? Um, but the reality is is that I think that there are versioning and revisioning. People know if you're not a technical organization or an engineer, I think

it's important just to use all the same practices you've always used when you save files, right? Create directory structures, do your own revisioning. You got to remember those things because those are what keep it all together. So, we have all these practices. Any business that runs does this in multiple places all over the place and we just have to leverage them for AI even though it deceptively seems to understand everything and doesn't need it. I think about like this is the same with all technology right um AI for me even though many people are looking at it as a holistic solution again I look at it as more instrumentation and tooling it's like the loom augmented building

textiles right it's like um typewriters helped replace you know handwriting right computers then took it to the next level smartphones allowed us to use multiple types of devices in all sorts of places and AI is just another one of them right? It augments. You got to think about it as we're not building out AI. I always think of we're building out AI um augmentation for anything. Whether it's identity and access management, whether it's publishing of blogs, like you really got to think about it as the thread. Again, it's not the brain. It has the ability to provide context and reasoning, but you still have to be the brain, right? Because AI is the augmentation. Um I put this up before but I wanted to

say that you know so with those learnings with this journey I took on the right you see it still needs oversight still needs intention and direction it needs guidance and guard rails right it's a human and here's one of the things a lot of my friends and people I know in the community are really worried about AI particularly technologists to be honest more so than I've been claiming and what I think about that is is that we just have to remember we're the pilots we're the pilots with everything and we have to maintain that and It's hard because the AI will debate you. It'll tell you what it thinks you should do and we it's easy

for us to let it just do whatever it wants. But the reality is it still requires intelligence and reasoning at the human level that it doesn't understand. And I think, let's see, does the next slide get to that? Oh, I was going to share, as you can see here, a friend of mine, he's a founder of, you know, a very well-funded startup, and they're spending $2,000 a developer a month on Claude, for example. But the one thing he's still doing is every single line of code is reviewed. So his developers are no longer writing all the code, but they are doing comprehensive reviews all the time, right? And he said, he used the

wording everyone has been directed to review every single line of code that exists in any fashion. And so you know it doesn't get rid of us. It doesn't get rid of humans. It's again it's an augmentation tool that can help us accelerate what we're doing, help us do it better, faster, more rapidly, but it has all the same problems as your friend is going to have if you ask them to do it right. We have to maintain that. Um let's see. Uh the other thing that was really interesting, I got a chance to see Sting um at a really small venue and someone asked him about AI because of course it was an AI company sponsoring it and um

it was really interesting. He's like, "Look, it writes great elevator music, but it's not a human. It never fell in love. It never fell out of love. It doesn't know how to maintain relationships, right? It doesn't understand the stress and risk you're going through when you're dealing with something, right? Everything it's doing is based on other people's experience and it doesn't have firsthand experience." And I thought that was a really great way to think about it, right? It's a comp it's a composite of interpretations. Um, but also what that means is, you know, you we all know the bell curve, right? Um, an AI will go out there and source 500 sources. It'll create a bell curve, tell you which

one's the most popular, but the reality is is the most popular one very likely has lots of challenges and problems, right? It is not necessarily the best approach just because most people use it. And an AI can't understand and I trust Aaron here, you know, maybe over Rob. Sorry, Rob. I couldn't resist. Um, and and that's really important. It's just going to hand you here's the 10 artifacts I found. But it doesn't have any way of giving you any actual reasoning that, you know, Aaron's a cloud, you know, security engineer and Rob is more of a manager and leader and they come to this the table with different, you know, perspectives on how that hello world was built. And the

reality is you might want to trust the developer in that case rather than just trust it. But it doesn't give you that. So I think this is important. It was a great way, when I heard Sting say this, to think about it, right? It doesn't have all of these, you know, interrelational experiences that we've had, both with ourselves and those around us, to understand how to evolve through things, right? It has examples, but it has not done it. So the way I think about it is, you know, it's not whether AI is a good tool or not. It's really, are our processes and our people, you know, guided

enough to use it really well right? Because it is it is a brilliant technology. Um I also think it's incredibly destructive and can be used to weaponize against all sorts of things like communities and staff. So I think we have that's all another reason why I think we have to be really conscientious about like taking control of what it does. So I think most people or many people are using AI as you just ask it a question you get an answer you're done you know and in this case I think instead of thinking about it as a tool when we build it for our customers or our users or we help our our kids use it

or the people we know we have to help them understand that it's a capability that still requires like a practice right the human needs to be there to judge it right you need a process around how you're going to handle the output um you need a process this around how you're going to leverage AI right in these contexts you need to be able to review you need to be able to version your output and you need to have an iterative cycle for how this is going to happen because it is not going to be right it is going to get it wrong you're going to get it wrong and so you really got to lean into it as a capability

process and workflow. So I think, you know, this is just standard stuff that we do probably all the time. We do it in our house when we're managing our bills, right? We have to tell ourselves and those around us what they're supposed to be doing. And you want to give requirements as deep as possible, right? Like, you want it in purple, or you want a slide deck, or you want a document. Whatever it is, you really have to give it as much instruction as possible, and you have to expect to iterate through that, right? And then for your production side of the house, you need to do version

control, iteration management. You got to have your reviews and then you have to have your governance flows to make sure that these things are followed as effectively as possible. no matter what you're doing, whether you're having it buy you an airplane ticket or whether you're having it, you know, provide a really important capability for all your users. Um, so you know, we need to provide, you know, simple things, lots of direction, lots of instruction. Um, it's not just clever tricks, right? You got to remember who you're talking to and what you're doing. Context management, you need to really like maintain two things, both the outputs and the inputs. So, for example, you know, when I was doing all of those

um, resumes, I would ask it to craft me a new resume for a new job description, but I would have to remind it all the time of what I've done or when I did it or how I did it. And finally, I just handed it all my resumes, all my job descriptions, right? And so, whenever I said, "Hey, let's look at this new job description," I would say, "First go analyze all of my other job descriptions that I've worked in and all of my other resumes," and then that would give it context. And I didn't like doing it every time I tried to start a

new resume. So I started storing them and pointing my AI at them. So it had all of that background, and that allowed me to not reiterate every time all the things that I wanted it to share about me in a resume, right? And you can do that. You can do that in their system prompts. You can also provide much more data, right? You can provide an entire directory structure of résumés and say, "Here's my last 24 résumés." >> This is Ben Pod's paper. Have you read it? >> No. >> No. He's an officer. Very similar. He did it in, uh... >> Where is he? Galway or Dublin, a researcher? His PhD paper. >> Okay. I'd love to see it. Yeah. Yeah,

that'd be great. Thanks. Um yeah, so you got to provide context management of some sort both for the direct thing you're doing and any other context around it. Do you need any other interdependencies policies practices whatever it is. um do some kind of version control. Like I said, I'm using git. Stores everything. It tracks the diffs. It tells me all the changes that happened to it. Blah blah. It works really well. Again, I think we as technologists, for those of you here, we have to figure out how to help our families do this and our friends and the small businesses and our communities because um they don't want to work on the tech. They're just trying to get a

job or support their business. And I think we have a lot of opportunity to help them leverage something without having to, you know, become an enterprise. So I use my repository as a workflow engine. It's a component of it, which stores all the templates. It has output directories, right? It has my specs, my workflows, and so it knows that. Sometimes it changes them randomly in ways I don't like. Also, again, we said this: review, review, review. Remember, it's going to sound really accurate. It's going to sound very definitive that what it's telling you is factual. And I've personally found that it messes that up at least

40% of the time. Doesn't matter if I'm writing code, or having it write code, or if I'm writing a resume. Again, like I said, on this presentation it kept saying that I'm working at MTN, even though it has seen repeatedly, 100 times, that I'm no longer there. And we've talked about that. It doesn't seem to hold that information. You've got to calibrate the review. Obviously, if you're just doing brainstorming, you can accept that it gets some statistics wrong. Who cares, right? But if you're actually producing, like, a security architecture for your organization, that's where, like with my friend the founder, you have to go through every line,

right? Because it might assume a control is owned by someone else. It doesn't have to touch it, right? Or worse yet, it might assume that it has to touch the control that's not its, you know? So really, I think you have to think about it in terms of different levels of output require different levels of review. They just all require it, right? Right. And as you really get to public facing or business impacting or finances, I think you have to take it much more seriously. But also I will say that um I don't think we need as heavy bureaucracies, you know, like and that is really one of the nice things. So if we do this right,

we don't need hundreds and hundreds of policy documents, right? We can have them written out for an AI in, you know, some notion of a machine language or natural language that it can accommodate much better than having to hand out a 300-page policy document, right? We just have to be really cognizant that it could fail, effectively. Um, let's see: scaling. This was the interesting thing. Like I shared earlier, with a single blog post it was no problem. It was great. It just felt iterative; I was working with my buddy, the AI. Once I tried to do multi-month, large-scale efforts, it really started to fall apart, especially if I would, for

example, let it just run by itself. Oh, real quick before we go, I'm going to go check that screen. So, you guys saw I said, "Let's get started." It tells me all of these things in my repo. It tells me where I'm at. It looks at a checkpoint file to see where I'm at. It checks on all my branches and any active locks, because again, I realized it would crush it. I would run two concurrent sessions off of the same repo, and sure enough, they'd be changing the same files that they were all working on, right? And so we had to put in active locks, and these are all things developers and engineers

really understand. Again, I really think about how we apply this to the business environment, because they're not as dorky. So here you can see it ran a bunch of checks. It tested validation. It looked at my roadmap, which, there's none here. And then it identified that I have a "building flashing rainbow" task that I wanted to do. So it looked through all my tasks and said, this is what you want to do. We're going to try and do this here, and we'll come back to it. I wrote, you know, a function that's called auto-implement, autonomous implement. Let's see, I think... and it's not giving me... Oh yeah, here it

is. Uh, no it's not. So, real quick, we'll see if this works. Um, so I'm telling it to run this. I'm going to tell it to run it tight so it doesn't like go out of bounds. That was another thing like I would ask it to look at one thing and it would look at everything. Okay. Um, you you ask it to look at one line of code and it would evaluate all the code and come back with things. So, I had to tell it, you know, how to do that. Um, and we're going to tell it, uh, there's one other thing I need to tell it. Um, minimum, so it what it does today.

It'll run it through review cycles until everything is fixed, right? So, it'll do a review of a piece of code or a blog. It'll then have all of these findings based on that, and then autonomous implement is told to fix any P1 or P2 and continuously do this. I had to give it minimum and maximum cycles because, like I said, it could run forever. We'll see how this runs and we'll come back to it. Nope.
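
The bounded review loop described here can be sketched roughly as follows. This is a minimal illustration, not the speaker's implementation: `Finding`, `run_review_cycles`, `fake_review` and `fake_fix` are hypothetical names, and the only behaviour taken from the talk is "review, fix any P1 or P2, repeat, with a minimum and maximum number of cycles".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Finding:
    priority: int  # 1 = P1 (most severe), 2 = P2, 3 = P3, ...
    note: str


def run_review_cycles(
    review: Callable[[], List[Finding]],
    fix: Callable[[Finding], None],
    min_cycles: int = 1,
    max_cycles: int = 5,
) -> List[int]:
    """Review, fix P1/P2 findings, and repeat within [min_cycles, max_cycles].

    Returns the number of blocking (P1/P2) findings seen in each cycle.
    """
    history: List[int] = []
    for cycle in range(1, max_cycles + 1):
        findings = review()
        blocking = [f for f in findings if f.priority <= 2]
        history.append(len(blocking))
        # Stop only when nothing blocking is left AND the minimum is met.
        if not blocking and cycle >= min_cycles:
            break
        for finding in blocking:
            fix(finding)
    return history  # the loop can never exceed max_cycles


# Toy stand-ins for the real reviewers and the fixing agent (assumptions).
open_findings = [
    Finding(1, "secret printed to log"),
    Finding(2, "no input validation"),
    Finding(3, "style nit"),
]


def fake_review() -> List[Finding]:
    return list(open_findings)


def fake_fix(finding: Finding) -> None:
    open_findings.remove(finding)


history = run_review_cycles(fake_review, fake_fix, min_cycles=2, max_cycles=5)
print(history)
```

The maximum is what keeps a reviewer that always finds something new from running forever; the minimum forces at least one confirming re-review after the fixes.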

So, uh, yeah. So managing lots of things at once is where it really started to blow up. It's just like managing multiple users. One small pilot group using AI is much different than if you roll it out to the entire enterprise. So I built this thing, and it does planning with me. Like, I'll tell it I want to build a thing. It'll help me do planning around it. It opens up an issue, does some safety checks. It's told to automatically launch things in new branches so it doesn't crush my main, and things like that. Again, this could be with the blogs or with the code I'm

running. Then you tell it to go build. That's what we just told my other agents to do, to go autonomously build all of this. And then it has a verification. Like I said, it runs it through a multi-AI review verification set. You get as many in there as you want. It then takes all that output. It dispositions it. It decides what to do depending on which stage it's at, and then it comes back. Then I can have it ship it, but I put in an automatic thing: it can't merge without my approval, because it merged some crazy things that I didn't want it to do. And so I learned the hard way, right, that I

have to at least watch every merge. If everything's in a branch or working tree, that's easy; it won't crush everything else. And then it has a notion of close: once something's implemented and deployed, it goes out and closes all the tickets, and it writes any comments and notes. The other thing it does is it finds friction points and lessons learned throughout the process. So if it notices it keeps using sed wrong for some reason, and then it figures out how to use sed properly, that keeps getting pushed into my friction log. And I designed it so that every set of cycles goes back and looks at all the friction log, and it's a backlog, and then

we just go and fix all those things. So it's kind of self-learning a little bit. >> Yeah, so this is what I built. And so the idea is that, again, remember, I'm trying to build it for small businesses. I'm trying to figure out if I can build a pipeline for small businesses, and what it requires them to use and learn, all these nuances. So the idea is you build a master file with rules. Okay? And the rules are like, how many times do you want to do a review cycle? Do you want to do a security check, right? Do you want to validate data on the internet, right? So you can write

all of these rules and it'll do that for you. It translates those rules into the right configuration files for the different AIs, whether it's Gemini, whether it's ChatGPT, right, whether it's Claude, right? Because they all have their own, like, CLAUDE.md file. So I produce a YAML that has all these rules, which then gets translated into whatever AI agent you want to use. And then within it, it also has all these guardrails. Like I said before, it's not allowed to merge without me, right? So it runs everything through, first of all, functionality tests. Does it actually do what I said? Is it a blog? Does it

produce the intent I wanted? If it's code, does it produce the outcome I wanted? And then, so, there are safety gates, and then everything, because I'm using Git, is all logged. So you can see every single change that happened in every execution under the hood. Like I said, you know, my laptop is my front end. Although right now we're using... Yeah, please. >> So in this instance, say you have, like, six developers pushing code, is there, I guess, a possibility to integrate, like, Teams? >> Yeah, you can automate all that. And then, this is what I realized: despite the fact that it's pretty

easy to use, right, the CI/CD pipelines we have to build, the end-to-end workflows, right, require a... And when I wanted my friends who have small businesses to start to use this, I realized that's where they were failing. They were like, "Yeah, I did all these things," but it didn't have those end-to-end cycles and parameters, and it wasn't integrated. So, you know, they were just failing on what they were doing. So, um... Okay, I don't like... Yes. Hello. >> What you're describing to me sounds a lot like the difference between hacking code and a proper Scrum agile methodology on a software team. It sounds like the way people were using AI before was >> hacking code.

>> Yeah. >> But what you're describing is a distilled version of Scrum compressed into very, very, very tight sprints. >> Yeah. You want to make all the outputs as small and tiny as possible. Yeah. And that's how I think about it: it really can help non-technical people think that way, right? But it doesn't have that context. It doesn't know what to do, and so you feed it all of these rules and guidelines and guardrails so that it can then act on behalf of, let's say, a business owner just trying to spit out invoices, right? Which is a simple thing. They don't know Scrum, but those processes you

need to embed within the agent pipeline to be able to do this effectively, right? So you have an agent, you have a bunch of rules, there's a bunch of safety gates and tests that run. I've integrated it so it uses my password manager for where I store the credentials. And then there's a back end, which is GIO. Like I said, I have a GitHub running, Git running. There's a bunch of CI test gate things in there. And then I use vaults for all the secrets, so I don't have to fill these in all the time. It knows how to get them. I just have to unseal the vaults. And so all of a

sudden it can access all my tooling as I need it. It's stored securely. The AI agent never needs to see it, because that's the other risk you face, right? Obviously, if you're handing it all your credentials, it can do anything it wants. So I wanted to separate that. There's a bunch of mechanisms for how people should do this. I happen to use HashiCorp Vault. It stores all the secrets, with 1Password as my password manager, and so it can do end-to-end processes relatively well without knowing all of the credentials. So let's see if my live demo is working. So, here it's asking me if it's

allowed to write something. So, let's see what it did. This is what I wanted to show. I didn't think... we're going to just allow this real quick. So, here you can see I'm allowing it to write a file to temp, which is actually, this is a good example. I've continuously worked on access, and it constantly messes up. Sometimes it has too much access and it can write anything it wants, and sometimes it has zero access. And despite the fact that we built it very specifically, the AI doesn't notice it all the time. It'll still read something and it'll go, it usually will

say, "Oh, I thought that was just guidance." So, one of the things I've realized is I have to switch the model. I can't let AI actually be my orchestrator. So, I've built a new model, which I haven't implemented yet, which is based on n8n as the orchestration. And, you know, my orchestrator will just hand out single, individual command sets or tasks, and it'll run them with smaller sessions with AIs, and then they'll compile it together, as opposed to me letting the AI do it, because it will constantly just tell you it's all done, or it did it right, or whatever the case may be. Um

so here, uh, let's see if, no, I have a session that's already saved. So you can see it's launching a review session here. It must be phase two. So I did it on auto-implement, autonomous implement. So it'll just keep running and cycling through. It's gone through all my gates. So it checks, you know, runs tests against things. It validates that the scope and the parameters for the review are effective, although it gets that wrong a lot too. It often gives it the wrong scope. This is interesting, because here's an example. So it knows that the sandbox is there, but it doesn't think it's registered. I don't know why. Um, what I

found is that um different AI sessions as I was building this would write um different information in different places and I didn't know. It wouldn't tell me that right it would just say I wrote this rule somewhere. Turns out I find out it's just writing it all over the place and I had to give it guidelines. This is a new thing a new uh sandbox I built out. So I was kind of curious what would I run into, but it has forgotten that it's part of the registry and it didn't find that well. Um, so it did capture the reviews. Here's what you can see. It captured multiple reviews. Um, it's launching both of the

reviewers. This is going to keep running in this self cycle. I wonder, let's see if it would um um we're going to just ask it if it'll provide the full disposition level. Uh, the full disposition ledger.

See what it does
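
As a rough picture of what a disposition ledger like the one being requested could contain, the sketch below merges findings from several reviewers, records who raised each one, and assigns a disposition. Everything here is an assumption for illustration: `build_ledger`, the reviewer names, the sample findings, and the "defer anything below P2" rule are hypothetical; only the "fix P1 and P2" rule comes from the talk.

```python
from typing import Dict, List, Tuple

# reviewer name -> list of (priority, issue) pairs; sample data is made up
Reviews = Dict[str, List[Tuple[int, str]]]


def build_ledger(reviews: Reviews) -> List[dict]:
    """Consolidate findings across reviewers and disposition each one."""
    merged: Dict[str, dict] = {}
    for reviewer, findings in reviews.items():
        for priority, issue in findings:
            entry = merged.setdefault(
                issue, {"issue": issue, "priority": priority, "reviewers": []}
            )
            # If reviewers disagree on severity, keep the most severe rating.
            entry["priority"] = min(entry["priority"], priority)
            entry["reviewers"].append(reviewer)

    ledger = sorted(merged.values(), key=lambda e: (e["priority"], e["issue"]))
    for entry in ledger:
        # P1/P2 get fixed; deferring the rest is an assumed policy.
        entry["disposition"] = "fix" if entry["priority"] <= 2 else "defer"
    return ledger


reviews: Reviews = {
    "self": [(2, "no input validation")],
    "codex": [(2, "no input validation"), (3, "missing docstring")],
    "gemini": [(2, "hard-coded greeting")],
}

ledger = build_ledger(reviews)
for row in ledger:
    print(f"P{row['priority']} {row['disposition']:<5} "
          f"{row['issue']} ({', '.join(row['reviewers'])})")
```

Keeping the list of reviewers on each row makes it visible when a finding was raised by only one reviewer, or when an expected reviewer contributed nothing at all.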

So what it does is it goes and identifies a bunch of vulnerabilities or problems with what it wrote. Remember, it does a review across three agents. It consolidates all of their findings and then it dispositions them. And this was a simple, just "hello BSides" application. So, what you can see here is the different reviewers that found things. For some reason, it didn't find Claude as a reviewer. Again, it keeps struggling over where these environment variables are. But it identified a couple things, right? These are all P2s. And then, you can tell it also does a self

review, because what I really did find is the magic number was three reviewers. To be honest, self and two others worked the best. Sometimes I like doing five. And what it also allows you to do, then, is, as I don't want to pay for Claude and I want to move to more free models, or Llama on my local machine, or cheaper ones that are coming out, I can just plug in different adapters for different AI agents. And the goal will be to have it use the cheapest ones first and then use the more expensive ones for, like, the third tier,

right? I'll have it build something on my local model. It'll give me a draft. It's good enough. We'll continue to work on it. And then I'll get to a place where I'm like, "Okay, let's have Claude and Codex do this, or Claude and ChatGPT." It'll come back, and I'll work with the local one. Because I was realizing also, you know, this is becoming really expensive. And all my friends are like, "Eric, I can't afford a $100 Claude license all the time." And even, you know, having 10 pro licenses on ChatGPT and Claude and all the rest, that's also expensive, right? I was like, how do we get to use this? And so

I built it so that it can also go against the local model. So let's see if it... >> Five minutes. Oops. Okay. So what you can see is, that was my demo. I'm sorry it was short, but I built this scaffolding in the CI/CD pipeline so I could trust it more to do crazy things and go off and do whatever it wanted, but it needs guardrails. And again, that's what I'm trying to figure out: how do I do that for, you know, the small businesses I know, the smaller enterprises that don't hire Aaron and Rob. Let's go back to the slide. We'll wrap this up real quick. Well, how about we'll go with, since we're

here, questions, thoughts, perspectives? >> When you started to kind of stack everything, where it was starting to build upon itself and do review, did you find that the more it tried to refine itself, it was introducing hallucination as part of that? >> It could. That's why I was saying I had to limit the number of cycles it would go through, or have certain break points. But I will tell you that having the multiple reviewers look at it, it was amazing how many of those went away. Like, if Claude introduced it in its context session, because it knew what it was doing, Codex would find it, and Codex would say, "I have no idea what

you're talking about." And so, sure enough, then I'd get to a place where, you know, it understood itself, but the other ones caught it. >> It's interesting. So it seems like they have a better handle. In other engines where I've tested that, or rather AI platforms, they start to almost hallucinate when you get to a certain point building. So... >> So that's what I did. But what caused that to break was when I had other AIs provide that input, because then we'd look at all of that, and then it would go back, like, Claude would come back and say, "Oh wow, I missed a hundred things." I'm like, I did.

>> Um... >> Any other? Yes. >> Is there a platform that you built open source? Um, where I'm building open source? >> Oh, is the platform that you're building open source? >> Uh, yeah, I think I want to just publish it when I finish it. It still is, of course, requiring more work than I thought, but the goal was really to, yeah, push it out there and let people use it and give feedback on it, because I just think that AI is so powerful, but really I want the people that aren't as savvy to be able to use it. Is there another quick thought? I think we'll probably end on that note because

we have to get to other ones. >> One more then followup. So whenever you do publish it, how can we be alerted to that? >> Um well, we can connect on LinkedIn. I'll probably post it on my LinkedIn page, but if we connect then I'll try and make sure you have it. I'd love any feedback or thoughts. >> Do you have a contact slide up there? >> I don't. I should, but I don't. >> Will you share the presentation as well? >> What was that? >> Will you share the presentation? >> I will. Yeah, as part of the the description and I'll add a contact slide if people have it. Okay, thanks everyone. >> How do we get access to the I think
