Offensive by Design: GenAI and Docker for the Lazy Hacker

Name: Offensive by Design: GenAI and Docker for the Lazy Hacker
Uploaded: 2025-09-01
Duration: 24 min 24 s
Description: BSides San Antonio 2025 June 21 at St. Mary's University


Speakers

Wes Wright

Tags

Category: Technical

Style: Talk

Mentioned in this talk

Tools used

CloudFox, Docker Compose, HP Fortify, Semgrep

Platforms

Docker, Kali Linux

Service

ChatGPT


Transcript

Okay, that was an enthusiastic response, at least from one person. So, thank you. We are going to talk about Offensive by Design. But before we get to that, I want to welcome you to the middle part of the afternoon of BSides San Antonio. I mention our sponsors only because they mean so much to us: you'll hear us thank USAA and St. Mary's at every session, because they do so much to make this all happen, especially the facility. So, thank you very much to our diamond sponsor, USAA. Also, I want to thank [inaudible] for their sponsorship. I want to make sure you get as much as you can from our

speaker, Wes. Wes Wright is going to be talking about Offensive by Design: GenAI and Docker for the Lazy Hacker. How many here are lazy? Oh, he lied. You raised your hands. >> You're not lazy. But Wes is going to try to do it. Wes, please take it away. >> All right. Is this all right? We got it. All right. Thanks, everyone, for joining. I'm so glad they gave me the big room for my first conference talk. No pressure. So, as we said, we're doing Offensive by Design: GenAI and Docker for the Lazy Hacker. I had a moment of inspiration and wanted to design a talk around it after a late-night vibe coding session, if you

will. So, a little bit about myself: I'm a senior security consultant at Bishop Fox and the founder of my own LLC, Hill Country Hacking. I've been in the offensive security field for about 10 years, doing mobile application, cloud, and enterprise network pentesting as well as web apps. I have a Bachelor of Science in computer engineering from A&M, and I'm finishing my Information Security Engineering master's at the SANS Institute. Certifications: GT x 11, CISSP, and CompTIA A+, which started it all off. So, why this talk? We've got another talk on GenAI, but this one's going to be a bit more storytelling and a little more fun, because GenAI is fun. Like I said, I had an evening of inspiration. After

about 3 hours of doing this over one night, it felt like it would make a good talk. So I used GenAI to write the abstract I submitted to BSides. There are a lot of exciting capabilities that are becoming increasingly mainstream: learning, making development prototypes, and supplementing existing workflows. So, as we say, it's another talk on AI, but it's a lot of fun. I'm sure there are a lot of people in the room who've said, "This is the greatest thing ever," and then almost immediately followed up with, "I want to throw my computer out the window," while trying to get some code to work. So, if you've done that, this is a talk

for you. As I said, this is my first time presenting this talk, so hopefully we get some engagement and interest in the topic so I can identify what my next steps will be. Quickly, here's the agenda: a high-level look at GenAI and Docker, the mission to learn, the bugs, the demo, and what comes next. We're doing a live demo in my first presentation, so I hope we made our sacrifices to the demo gods. So, a quick primer on GenAI and LLMs. Who's familiar with those? Everyone. Okay, most. This is why this is super quick: we've had about 10 talks on it already, so I don't

want to bore anyone else. GenAI, you know: generating human-like responses, making realistic videos, ideas; it's a powerful tool for creativity. Large language models are generative AI trained on vast amounts of text to produce human-like language. They are non-deterministic, so you don't get the same thing every time, and they are prediction, not understanding; we'll get into that in depth later. Docker works on what are considered portable containers that run consistently across different environments, making it easier to develop, test, and deploy software. For this particular project, and for supplementing my workflows, I used Kali. It has a minimal Docker image you can download that's well-maintained, easily

configurable, and has apt installs that make common tools really easy to set up. So it's very easy and lightweight. Over here we have a quick high-level view of containerized applications. My original workflow was with VMs, which got incredibly bloated and would easily fill up a terabyte hard drive with all my snapshots, backups, and things like that. Docker really reduces the footprint and adds capability and portability, which is why I chose those two. I hadn't given Docker a lot of thought, but during my last couple of months at work I started to try it a little more, seeing different applications, especially in cloud testing environments. So I

started playing with that, and then I wanted to use ChatGPT to learn. So our mission was to use ChatGPT to learn Docker, create a packageable CLI tool, and streamline project workflows. As you'll see, it was a fun journey that should help with future projects. I'm starting the research capstone for my master's, which is to use GenAI to enhance password cracking when resources are limited. So maybe the results of that research will become a new talk next year. Who knows? Or maybe I will have thrown my computer through the window and you won't see me at all. So, let's get into it. So, the prompt:

after a couple of rounds of prompts learning Docker and Docker Compose and making Dockerfiles, I used this prompt, spelling errors and all, to enhance a Bash script I had been working on for work. I know, Bash, what a monster. As we see here, the prompt says: let's add the ability we have created to modify attach script to use it. Then we get all the way down to "I want compose files," which should have been Dockerfiles, but see, I learned, "to easily update if I find new tools that should be installed." I wanted everything templatable, easily updatable as tools come and go, and

extensible. I wanted to try it from scratch to see how ChatGPT could do this by itself: just give it a prompt and see if we could get a functioning program out the other side. Disclaimer: we did not. It took a lot of help, which is where we get into our next couple of slides. This is the phase where I started realizing that the talk I could give would be less about making a really cool tool and more about identifying how to use AI to code and supplement our workflows as security professionals, whether offensive, defensive, red team, anything like that. And so as I started doing this, I

started running into common themes and pitfalls that I'm sure most people have run into and have stories about themselves. I talked with my manager about this. He was doing it with some Azure templates for one of our tools, CloudFox, which he wanted to extend with some Azure features. We talked about how cool it was and how quickly it could develop things, but also how wrong, and confidently wrong, it could be. So, let's dive into my biggest headache: missing files. I told this thing about five times that setup.py was not in the zip package it gave me. After the fifth

time, it finally said, "Yep, I goofed. I didn't add it. Let's fix it now." So that was an annoying 30 or 40 minutes of my time just trying to get the file I wanted. We finally got it fixed, or so I thought, but it struck again, this time with a misconfigured entry point in setup.py. As you see here, though, we can paste our tracebacks, errors, and other output into, in this case, ChatGPT, and it will try to resolve those errors. We'll discuss that a bit later, along with other applications for it. Also, why does it choose to place things where it does? So, I

had my Docker Compose files and Dockerfiles stored as library files in the package that we eventually generated, but it chose to put my notes in the Python code and just write them out on each run. It does work, but it's not a great method, because you introduce a lot of complexity and opportunity for error. That would have been better served as a template. Let's see. I've got curl here: the curl creator calls AI-created bug bounty reports "AI slop." And here we see a bunch of messy code. It tripled up this particular function, which tells me what to run if we don't have a Docker

Compose file in place. It was meant to be just a quick check: if we run this with no Docker Compose file, we get a project file, and we can run it again. If the file exists, it checks and then tells us to do our docker compose up. But it tripled that code while we did our code reviews and bug fixes, so that's messy. And as you see under here, in this part of our Docker Compose file we have a double mount of the workspace directory, so when you try to compose up, it yells at you and errors out. And my favorite, which I don't have shown here: if you have two

directories with the same name, it will tell you to either delete one of the directories or wait a day, because the name uses the date. I was like, that is less than ideal. But as I mentioned earlier, one of the nice things to do is a step-by-step run-through looking for logic bugs. You can just point ChatGPT at the codebase and it'll run through it. I did a couple of iterations looking for logic flaws, missing files, and syntax errors; as you can see here, I did a couple at once. You can also use it to highlight certain parts of the code and have it walk you through what's happening, which leads to

stepping away from the development side: using this for any sort of code review, whether as a developer or, as an offensive security professional, for secure code review. If you have a local LLM, or a corporate license that keeps your code in secure spaces, or NDAs and such that allow you to put client code into a generative model, you can use it to help you do secure code review. Highlight a section and say: this got flagged by, for example, Semgrep or Fortify; this portion of code was flagged for cross-site scripting; what is wrong, what can be fixed, or

what are the entry points in the codebase that we can use to reach it and compromise it? So it's a great additional pair of eyes on the code to explain, line by line or function by function, what's happening, and you can really get into a good flow there. It's a valuable way to expand your workflows. This next one didn't happen in this project; it's a bonus. It happened on one of my initial tries at this, a different project, about this time last year. I believe it's been fixed, but as we see with hallucinations, or ChatGPT just filling things in, it will give you a

completed project, but it may not be right. This one, "import ace_tools as tools," was actually a pretty prevalent one. ace_tools was an internal OpenAI Python library that got trained into one of their models, and the model started spitting it out. There's actually a knowledge base article about it in the OpenAI community forum. I actually installed it, but luckily a developer had gone and registered the PyPI package, and it was just empty, so it installed nothing. So it was benign, but this has a lot of security implications: we need to be wary of the outputs and not trust it,

because it wants to be helpful, and it will finish it for you even if it's wrong. So, the takeaways here: we must treat AI as an assistant that's incredibly smart but makes mistakes. As I developed this, we kind of abused the old research theme of using new technology to enhance old techniques, and now we're seeing vibe coding, so we need to know how to have good vibes. With complete reliance on the AI, I found we had some decent results, but a lot left to be desired: bugs, and it developing things not the way a human would. That's probably the key takeaway: it's prediction, not

understanding. When it generates code, it does not generate it like a human or a developer would. As we saw, it puts things in weird places and triples things up, and we end up with all sorts of weird logic flaws. So we trade our hours of coding for hours of debugging and managing a tool. You become less a developer and more a manager. So if you want to be a manager, maybe try a couple of these and learn. You get into a cycle of QA and code review, because it makes mistakes. I ran out of bullet points, but when we go into QA, it goes and goes and goes

until you kind of get to something. So, let's see what we came up with and see if I can switch this and

switch two. >> All right, that is tiny, but we've got three screens. Can everyone see? Okay, good enough. Yes. >> Okay. So we ended up with Fox Docs, an amusing name. As we can see, we can pip install it; making a package was one of our goals. This was just a test, so we'll do it live now in case things broke. We're going to do setup client. Service line is EPT. Oh. >> Oh, cool. I did. That's the first time I've misspelled it that way. I'm going to call nerves on this one. It's set up. There we go.
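The demo installs Fox Docs with pip and drives it through subcommands (setup, status, archive, destroy), and the talk earlier describes ChatGPT shipping a misconfigured entry point in setup.py. As a rough illustration only, here is a minimal argparse-based CLI whose main(argv) function a console-script entry point could target. The program name, the subcommand names, and the --service-line option are assumptions modeled on the demo, not the real tool's interface.

```python
"""Hypothetical sketch of a Fox Docs-style CLI; not the real tool's code."""
import argparse
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with subcommands modeled on the demo (names are assumptions)."""
    parser = argparse.ArgumentParser(
        prog="foxdocs", description="Per-engagement Docker workspaces (sketch)"
    )
    sub = parser.add_subparsers(dest="command", required=True)

    setup = sub.add_parser("setup", help="create a workspace for a client")
    setup.add_argument("client")
    setup.add_argument("--service-line", default="EPT",
                       help="engagement type label, e.g. EPT or IPT (assumed)")

    sub.add_parser("status", help="show running containers")

    archive = sub.add_parser("archive", help="archive a client's workspace")
    archive.add_argument("client")

    destroy = sub.add_parser("destroy", help="remove a client's container")
    destroy.add_argument("client")
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Entry-point target: parse arguments and report what would be done."""
    args = build_parser().parse_args(argv)
    # A real tool would call Docker here; this sketch only echoes the request.
    print(f"{args.command}: {vars(args)}")
    return 0
```

With setuptools, a line such as entry_points={"console_scripts": ["foxdocs=foxdocs.cli:main"]} (module path hypothetical) would install a foxdocs command that calls main(). If the module or function named there does not exist, the installed command fails when it is launched, which is the class of bug described in the talk.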

So, now we copy and paste this. Let's see if we can also get my favorite error before we do that, because I left it in here; we'll discuss what I want to do next. "Choose a different client name or wait a day." Why? How about picking a different name? So, here we go. I prebuilt all of these images last night, because I am working from an ancient 10-year-old laptop, and docker compose up from a fresh build takes 25 minutes. I could have done that for the whole talk and just had y'all watch docker compose. So, we have our compose file, we have our container, and we have a Kali image. Oftentimes we have

multiples. So, let's do an internal pen test.
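The "choose a different client name or wait a day" error comes from building workspace names out of the client name and the date. One way around it, sketched below under an assumed client_serviceline_YYYYMMDD naming scheme (hypothetical, not necessarily what Fox Docs uses), is to append a numeric suffix on a same-day collision instead of refusing to continue.

```python
"""Sketch: pick a fresh workspace name instead of 'delete a directory or wait a day'."""
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def unique_workspace(root: Path, client: str, service_line: str,
                     today: Optional[date] = None) -> Path:
    """Return a path under root that does not exist yet.

    The client_serviceline_YYYYMMDD scheme is an assumption for illustration.
    On a same-day collision, _2, _3, ... is appended rather than failing.
    """
    stamp = (today or date.today()).strftime("%Y%m%d")
    base = f"{client}_{service_line}_{stamp}"
    candidate = root / base
    suffix = 2
    while candidate.exists():
        candidate = root / f"{base}_{suffix}"
        suffix += 1
    return candidate


if __name__ == "__main__":
    # Demo in a throwaway directory: the second request gets a suffixed name.
    root = Path(tempfile.mkdtemp())
    first = unique_workspace(root, "bsides", "EPT")
    first.mkdir()
    print(first.name, unique_workspace(root, "bsides", "EPT").name)
```

The suffix keeps the date visible in the name, so archives from the same day still sort together.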

So again, this is all generated by AI, with some bug hunting by me to get it to a working state, and it did pretty well. So we've got our BSides, BSides, BSides workspace. We start with our external workspace. Now we have a workspace configured per our service line, with the common folders we use: loot, notes, recon, report, screenshots, and terminal logs. One of the nice things I did here, to make sure you always have all of your tool outputs, is a terminal log that starts on each container. So, let's attach our shell and drop right into a nice new Kali.
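The per-service-line workspace layout from the demo (loot, notes, recon, report, screenshots, terminal logs) is easy to reproduce. The minimal sketch below assumes a terminal_logs folder name and POSIX-style short volume syntax; the function names are hypothetical. It creates the folders idempotently and adds a small check for the duplicate workspace mount that made docker compose up fail earlier in the talk.

```python
"""Sketch: create a per-engagement workspace and catch duplicate bind mounts."""
import tempfile
from pathlib import Path

# Folder names follow the demo; the "terminal_logs" spelling is an assumption.
WORKSPACE_DIRS = ("loot", "notes", "recon", "report", "screenshots", "terminal_logs")


def scaffold_workspace(workspace: Path) -> list[Path]:
    """Create each standard subfolder (safe to re-run) and return them in order."""
    created = []
    for name in WORKSPACE_DIRS:
        path = workspace / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


def duplicate_mount_targets(volumes: list[str]) -> list[str]:
    """Return container paths that appear more than once in short-syntax volumes.

    Assumes POSIX "host:container[:mode]" strings; Windows drive letters and
    long-syntax (mapping) entries are not handled in this sketch.
    """
    seen: set[str] = set()
    dupes: list[str] = []
    for spec in volumes:
        parts = spec.split(":")
        # A bare "/path" entry is an anonymous volume mounted at that path.
        target = parts[1] if len(parts) >= 2 else parts[0]
        if target in seen and target not in dupes:
            dupes.append(target)
        seen.add(target)
    return dupes


if __name__ == "__main__":
    ws = Path(tempfile.mkdtemp()) / "bsides_EPT"
    print([p.name for p in scaffold_workspace(ws)])
    # The kind of double mount of the workspace directory described in the talk.
    print(duplicate_mount_targets([f"{ws}:/workspace", f"{ws}:/workspace:rw"]))
```

Because the folders live on the host and are bind-mounted into the container, tool output written there outlives the container itself.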

So, we're on our Kali box. ls. Let's do a test recon file.

Put that in our recon test tool. Put that out. It's not. All right. And then let's check our terminal logs.

Oops. Yeah. See, live demo.

There we go. So, a full terminal log of everything we've done, because I have a bad habit of doing those. Oops. And it's gone. I discovered on this laptop that if you middle-click for too long, it will destroy the tab too. That was fun when I was setting this all up; I kept losing my shells. So, we have all that. We can see that our recon has a tool output. What's nice about this workspace is that we've mounted it to the host file system, so if we look in our documents, where did I put it? Boom: docker share, EPT workspace, recon. All our stuff should be there. So that when we're done with our

container, we come back over to Fox Docs, get a status real quick to see what's running, and then destroy our containers. Or, first, we will archive it. So, so we're not believing this comes from already there. Oh, archive client.

And this is another kind of flaw in the program: it makes everything "network." It maps EPT to network, so you actually have to give it network, not EPT, and the date. That's one of those "why would you do that?" moments. And this is 0621. Yeah. Service. Oops. Line. There we go. So now we have an archive of all of our outputs, and we should be able to open it up and see our archive deliverables,

which will have our notes, our recon logs, and our terminal output. So if our clients ask us what happened, we have it all logged. Then we can run destroy. We should see the EPT container go away here in a minute.
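Archiving before destroying is what keeps the deliverables available when a client asks what happened. The sketch below uses hypothetical function names and an assumed tar.gz naming scheme. It archives a workspace with the service-line label used verbatim (avoiding the EPT to "network" remapping seen in the demo), defaults the date to today, and lists the archive contents so they can be checked before the container is removed.

```python
"""Sketch: archive a workspace and verify the archive before tearing down."""
import shutil
import tarfile
import tempfile
from datetime import date
from pathlib import Path
from typing import Optional


def archive_workspace(workspace: Path, dest_dir: Path, client: str,
                      service_line: str, today: Optional[date] = None) -> Path:
    """Write <client>_<service_line>_<YYYYMMDD>_archive.tar.gz into dest_dir.

    The service line is used exactly as given (e.g. "EPT") and the date defaults
    to today, so neither has to be re-typed. The naming scheme is an assumption.
    dest_dir should be outside the workspace so the archive does not include itself.
    """
    stamp = (today or date.today()).strftime("%Y%m%d")
    base_name = dest_dir / f"{client}_{service_line}_{stamp}_archive"
    archive = shutil.make_archive(str(base_name), "gztar", root_dir=workspace)
    return Path(archive)


def archive_members(archive: Path) -> list[str]:
    """List the paths stored in a .tar.gz, with any leading "./" removed."""
    with tarfile.open(str(archive), "r:gz") as tar:
        names = [m.name[2:] if m.name.startswith("./") else m.name
                 for m in tar.getmembers()]
    return [n for n in names if n not in ("", ".")]


if __name__ == "__main__":
    ws = Path(tempfile.mkdtemp())
    (ws / "notes").mkdir()
    (ws / "notes" / "scope.txt").write_text("in-scope ranges\n")
    out = archive_workspace(ws, Path(tempfile.mkdtemp()), "bsides", "EPT")
    print(out.name, archive_members(out))
```

Checking archive_members before running destroy is a cheap guard; in the demo the mounted folder also survives, so the archive is a second copy rather than the only one.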

My ancient laptop is showing its age. The container's gone, but our mounted file system is still there with all of our outputs. So, let's switch out so I can transition. We've got about, what, 5 minutes? >> About 30 seconds. >> 30 seconds. All right. We'll speed through. So, that's Fox Docs. As for next steps, if you all liked that and enjoyed the topic: my next step would be a blog post that examines the tool step by step, covering all the code flaws we found in a code review style, and then trying different IDE integrations and models. Codex came out about a week after I did all this, so I

didn't want to redo it with Codex and redo the slides. But if you all like that, that's my next idea. And thank you. >> Yes, thank you very much. If you haven't done this before: your first time giving a presentation, with a live demo, and finishing on time. Congratulations. Here's a little token of our appreciation, and thank you very much. That was... Thank you, Wes. We'll be starting our next session in about five minutes.

