Josh Rickard - LLMs: Prompting, Agents, Assistants, Oh My

Name: Josh Rickard - LLMs: Prompting, Agents, Assistants, Oh My
Uploaded: 2026-06-09
Duration: 51 min 19 s

BSides KC 2026 | 51:19 | 19 views | Published 2026-06

Speakers

Josh Rickard

Tags

Category: Technical

Difficulty: Intro

Style: Talk

Mentioned in this talk

Tools used

Algo DeepSeek-R1 GitHub Copilot

About this talk

A beginner-friendly tour through the current LLM landscape — prompting, vibe coding, skills, agents, copilots, and assistants — framed for security practitioners and engineers. Rickard walks through terminology, shows concrete prompt structures and skill definitions, and demonstrates how reasoning models, state machines, and tooling fit together. Closes with reflections on the broader societal impact of LLMs.

Original YouTube description

Join me as I break down all this LLM gibberish and provide guidance on how best to use these tools for socio-technical and engineering work. This talk will explore the current nomenclature and provide clear examples (use cases) that show how to use these tools properly as well as explain why. This talk is aimed at beginners but also dives into how assistants work, from workflows to hooks and more. At the beginning of this talk I will explain the current terminology used in the realm of LLMs. This includes concepts like prompting, vibe coding, skills, agents and more, but we will also share concrete examples that are focused towards security practitioners & engineers. This talk aims to break down this new technology while sharing examples along with real world tools that you can begin using today. Even though I believe that LLMs will transform our world in many capacities, I will also share some thoughts about the importance of understanding the external impacts LLMs have within our society as well. Come for the deets and either leave empowered or depressed. Up to you.

Transcript [en]

So, man, the intro is great. Give it up for all the volunteers, please. They're this conference would not happen without them, dude. And I cannot wait to see this working tattoo. I cannot wait. >> Feed the mic. >> I'm sorry. I'm sorry. Okay. Okay. Okay. I'm not used to this. So, >> Feed the doll. >> All right. All right. So, today, all right, again, my name is Josh Rickard. I'll kind of give you a little rundown. Sorry, I'm going to try to be good with the mic. But, today we're going to talk about LLMs. I have this Um I love themes when it comes to talks or whatever. I think it kind of plays into

everything that we're storytelling and everything else. So, today we're going to go deep though. So, if you have any experience with LLMs, who here actually has experience with LLMs like on a daily or weekly basis? Yeah? Okay. Who here is brand new, never used it? Okay, good. All right. Cool. So, we're going to go through and just kind of show you some tips, but also just kind of the evolution that we've seen so far in the, you know, past few years. So, LLMs: prompting, agents, assistants, oh my. Again, my name is Josh Rickard. I have a background in blue team and digital forensics. Before that, I did like sysadmin and Windows security stuff. I've worked at a SOAR company and I'm

currently a senior threat detection engineer at Sublime Security, which is like a phishing defense platform. Do a lot of research. I have experience in product management, people management, and I was a senior software engineer for a 2 billion event processing pipeline. Like, awesome stuff. So, anyways, you can find me as MSAdministrator. That stands for Microsoft Administrator, not Miss Administrator. It was a long time ago. And then Let's Automate It is my blog. So, you can read anything that's kind of reference papers and all that stuff. So. All right, it's enough about me. So, let's talk about this whole analogy that we're going to kind of go through, this trip down Oz. So, we're in Kansas, so I thought,

"Man, why not have a Wizard of Oz themed talk, right?" MIZ? No? >> Go you. >> All right, all right, all right, at least one of you. Um So, in this case, we're all Dorothy in in this scenario. So, just kind of think of us as people and humans in general are Dorothy. We have all been inundated. We're just at home chilling. It was COVID and everything else, and then man, LLMs hit, and we're just like, "Holy [ __ ] what is all this magic? What is going on? Why is everyone freaking out?" Blah, blah, blah. I know we all don't want to be here. I didn't really expect to be here. Uh but we are. And so, I think we have a

choice to either learn and and adapt or or not. And I think that it can help in a lot of reasons. As previous talk talked about, I think uh I'm in the the stance of LLMs are an amplifier of your own skills. I I don't think that they're going to replace it, and I don't think they're going to augment and take away jobs. They will some, but I think if you are intelligent about how you use them and how you use the tool, I think you'll be successful and you won't have a concern long term. So, we'll kind of go through this path. We got hit, right? By this tornado. This onslaught of [ __ ] Uh anyways, have

you... Who here has heard of GPT-3? It's old. Did anyone ever use it back in 2020 or before? Okay. Uh I did not. I kind of knew about it afterwards. It really all started way back in the day, in like the '60s, with ELIZA. Uh anyone here ever heard of ELIZA? It's a model that uh basically just repeated what you said back to you. You asked it a question and it would just respond. It was a super basic uh program back in the day. You can actually find examples of it online right now if you just search ELIZA. Um you'll find a whole bunch of stuff about it. But, then ChatGPT came

in 2022. That was like the big, uh, epicenter of like, "Oh, man, we can just type in this stuff and get some cool responses. Cool." Oh, [ __ ] I mean, it worked okay, but it was pretty crap. And then GPT-4 came out, which was more of an advanced model, uh, with a little bit of reasoning on top of it, and we'll go into the details of that. And then you have, uh, Claude. I think the big epicenter was when Claude, Gemini, and Llama, those are other models, were released. I think that was the start of something completely different and what we're seeing today. Uh,

anyone here heard of DeepSeek-R1? It's a model that, uh, basically was way more efficient at its inference, uh, phase, not the training phase. They basically threw 671 billion parameters at it, uh, and, uh, only activate 37 billion. So, the training is a huge computational event, but then, uh, the actual inference cost is super low, meaning you don't have to run the model on super high-end GPUs, you can run them on lower-end GPUs and still process. But, it's a reasoning model, and there's problems with that, and we'll talk about that. And then OpenClaw, who here has heard of OpenClaw? >> [snorts] >> Okay. OpenClaw is kind of the thing that

I think was in January this year, kind of, uh, was released. There's been other variants, and we'll talk about that, but OpenClaw was like the innovation and, uh, well, basically why all Mac minis are off the market right now is because of OpenClaw and other variants. Uh, it is a huge almost semi-autonomous uh, LLM that can run 24/7, do actions on your behalf. Uh, it has a lot of access issues, and there's a bunch of problems, but really that's kind of the preface of where we're at right now: OpenClaw has been like the main thing uh, in the past couple months. There's other variants, and again, we'll

talk a little bit about those. But, let's talk about also the players here. Right? We have OpenAI and ChatGPT, um, which is pretty well known, pretty basic. Anthropic creates Claude, Claude Code, Claude Desktop. Those are different variants. Uh, Google has Gemini. Anyone here have a Google Workspace by chance? If you do, Gemini's pretty much integrated in every damn thing that they do. Uh, Meta and Llama. I've never used Llama, uh, but it is uh, a model that is out there. I heard it sucks, but who knows? Um, I just haven't used them. xAI, uh, and Grok, which is Elon's, uh, Twitter bot, basically, or X bot. Is it X bot? I don't know. Um, and then,

again, OpenClaw. Uh, who here has heard of Cursor as well? Okay. That's another big one. Actually, what was it? A week ago, uh, Elon said he's going to buy it for $60 billion. Um, that started with a $3 billion valuation. I don't know how those numbers work, but, you know, they are. And Cursor is basically an IDE. So, it looks like Visual Studio Code. It's actually based off of Visual Studio Code, uh, the Monaco, which is like the underlying framework, but, uh, integrates LLMs into it. So, it's like a code editor on steroids. Uh, instead of a Copilot, which you see in Visual Studio Code, or Microsoft Copilot, or GitHub Copilot, those are

integrations on top. This platform is actually built on LLMs, [snorts] really. And then, uh, all the AI startups, right? The hundreds of billions of dollars that are being invested, people pitching all these things. Um, so, all the players are everywhere, and we all see this, right? So, let's talk about some terms here, though. Like, make sure we're all on the same page before we kind of dive down. Uh, LLM is the large language model. That's what you see. That's the OpenAI. That's the prompt. Um, this is not AI. People [snorts] think that it is, uh, it isn't. Anyone here know of Alan Turing? Yeah, Turing had the imitation game, which was his um his thesis that basically until you can

no longer tell that you're talking to a computer, it is not AI. It's kind of a um standard uh measurement, I guess. And so LLMs are like that, but they're not thinking. Basically, they have a huge map of a bunch of different data and attributes and other things that are predicting tokens, predicting the next um iteration or the next uh word in a sentence. Prompt. Uh so, that's what you input. Uh one thing that I'll say, and we'll go into the details here, but garbage in means garbage out. Right? Like if you do not give it uh guidelines, it will basically hallucinate and try to do a bunch of [ __ ] that it doesn't really

you didn't want it to do. Token. Uh Anyone here know what a token is? We keep talking about that. A token is basically a word or a phrase or like a segment of text that is indexed. And they insert these tokens uh into these models and it basically can re-reference, okay, where all has this been referenced, this word, and kind of build up a big map of data sources and attributes and weights and scores and there's a whole bunch of [ __ ] but uh really it's just a word or a phrase is how I like to describe it. Context window. So, that's a big one. Everyone says, "Oh, you need context, right?" Like LLMs have a limited memory

or limited uh scope that they can actually load into their own memory. And that's called the context window. If you go outside of that window, it starts forgetting things and it starts not remembering. So, some models have like uh 17 gigs of data you can throw at it. And that's its huge context window. Uh but most are going to be pretty small. They're going to be like 37,000 [snorts] characters or uh you know, 170,000 characters or, I'm sorry, words that you can kind of throw at it. So, some models can take huge documents and you can just throw it at it and it has no problem. Others won't be able to handle that load cuz

it'll forget about earlier things in that context window. Uh RAG [retrieval-augmented generation], who here has heard of RAG? Yeah? It's kind of a weird-ass name. It's basically a vector database. This is a data science thing and a machine learning thing, but basically it can inject snippets of text at appropriate times during the reasoning process. So, it's basically just like a side-loaded cache of like, "Hey, add C++ data here or add, you know, Python language information or an API doc or whatever into the context when you're processing reasoning." Uh agent itself, that's an LLM. There's a lot of different terms here, but I like to call agents an LLM that

basically calls tools. Uh maybe an another third party. We'll talk about MCP as well. Maybe they'll take actions and there's some sort of loop process. It's kind of semi-autonomous, but it's but it's really just kind of a you give it a prompt and it kind of goes through a process. A lot of people can just use ChatGPT as it is and not have any issues, but then when you're really wanting to accomplish something like autonomous code generation and all that, you have to provide a little bit more detail and that's where the agent kind of comes in. Uh MCP, who here has heard of MCP? All right. MCP is the model context protocol. This is really a way for APIs or LLMs to talk

to a bunch of different services. So, whether that's Salesforce, whether that's an internal database, a local file, a CSV file, whatever. It's really a wrapper around being able to call and send it text like, "Hey, give me all the IP addresses to enrich." And you have an MCP server that may call out to VirusTotal and Hybrid Analysis and like all of these different products and then return one aggregated result. That's usually what the MCP is for, so an LLM can talk in text, really. And then vibe coding, anyone here ever vibe? So, I'm old, so I like to call it chill coding, but you know, that's just the '90s in me. But, vibe coding is

really just kind of iterating and prompting and then just kind of getting a response and saying, "Hey, let's update this." And you're just kind of going through the flow of communication where we'll talk about more of like how you can use it in a in a more robust manner, at least for my opinion. All right, so we talk about all the the good witch in this scenario. Some people would say it's a bad witch, but good witch in this scenario is the investors and the evangelists that are out there. The hyper evangelists and AI's going to take over the world, it's going to do amazing things, there's this magic, we're going to cure cancer, we're going

to do intergalactic space travel, no more working, we're all going to go back to nature and do nothing, right? Awesome, sounds great. Then there's the other part where it's like, man, it's going to take all our jobs, all the white collar jobs are going to be out. AI overlords, Matrix, you know, all that. Um, I don't believe in either one. I think that it's a tool just like everything else. I think we'll actually go back more to humanities and poetry and everything else as we evolve as humans, but that's a huge philosophical debate that we can have at another time. The wicked witch, right? It's really our conscience. Uh maybe it's just me though. So, I'm going

to kind of go on a little tangent here. So, I wrote a paper recently and it's about the externalities of AI. If you don't know what externalities are, it means the impact that AI has on our ecosystem and our lives, or human lives. Um it's a paper I wrote with a professor friend of mine named Scott Christensen from Mizzou. And we actually presented it at a symposium last week. You can find the paper on my blog there, but it's a draft. But the point is that there's a lot of external impacts that LLMs have on our society that I want to make sure everyone kind of knows, cuz I think it

helps understand your usage and and how you can think and frame the impacts here. There's a huge environmental impact of LLMs that we've seen. Data centers are are encroaching where we're constantly uh being you know, you see requests in like I think Festus, Missouri and a whole bunch of others are trying to install data centers all over the place. The problem is that there's a huge like problem with water extraction uh water evaporative cooling for data centers. It's projected at 7 billion cubic meters of water being actually pulled out of the of our ecosystem by 2027 because of data centers. Uh to put that in perspective, Fulton, if you know where Fulton is, it's a nuclear power plant, right? Uh

it's point, uh, 5% of that in its total capacity, at 49 million cubic meters for a full operating year at full capacity. So, if you think of 7 billion, it's a huge, huge amount of water that's being extracted. Um so, there's energy costs though. So, energy to power all these, you know, we have all the FAANG, you know, Google, Microsoft. Microsoft invested $16 billion into Three Mile Island that's supposed to be open by next year or 2028. Uh just to get a megawatt hour basically or um it's uh megawatt. Uh and then we also have Google uh and Facebook investing in small modular nuclear reactors for the first time uh

by 2030. And uh we have not even talked about how we're going to like take care of all the e-waste or the uh impacts that it has on our environment, like where are we putting all these spent rods and what happens if one of these melts down in LA? We don't really know, so there's a lot of precautions that we just don't know. And they're investing billions and billions of dollars in this. Uh there's also the huge impact that it has on um recycling. GPUs are fast-cycle in data centers. They're improving, NVIDIA's improving all the time, AMD, blah blah blah. What happens to all the e-waste?

We're not great as a society at actually recycling [ __ ] >> [snorts] >> So what do we do with all these like weird chemicals like erbium and terbium and you know [ __ ] like that? Yeah. So sorry, going on a tangent here, but the other part is mental health. You know there's been cases of sycophancy and uh product liability issues where people have actually committed suicide after chatting with uh an LLM. Uh there's been multiple cases of that, so there's a huge mental health problem that it surfaces. Also a scraping cost problem. Uh the internet I think right now as we know it uh has a potential to change. Um

companies like Wikipedia uh archive.org those things uh are free right? They're free resources but LLMs are scraping them at huge cost which is an infrastructure cost. If you've ever worked in software engineering you know that egress and databases and caching all of that has cost to it and it's all imposed on them and not by anybody else. Um so basically LLMs are gaining all the profit while putting costs on a lot of companies. And so, it's a huge concern that no one's really talking about. Uh, and then there's other costs as well. But, yeah, go ahead. >> One thing I'll add as we're talking about responsibly using it, >> Yeah. >> there's also the PII

that the cleaners that the humans they use for the training. Uh, >> Yeah. >> that they have in Venezuela and India. >> Yep. Yeah there's uh, especially with photos and and images and there's like some crazy [ __ ] that you have to dig through that to uh, from a research and data collection perspective, yeah. So, again, my tangent. You can read the paper if you want. It's not very long. It's like 60 pages or something like that. It's getting peer reviewed now, but um, yeah, you can go check it out if you want. Then, so, I know we were all kind of in this this whirlwind of [ __ ] with LLMs and uh, we don't really want to be here,

but we're here. And so, we need to start We just need to wake up and and start progressing towards using these in an efficient manner and in a responsible way. So, I think that uh, we just need to get up and start start walking. So, we're on our little path, right? On the yellow brick road and we're we're walking down. And you know, you come across kind of the first iteration, right? It's the scarecrow. Uh, it it's the the simple prompting. It's literally just kind of um, using it and getting a response and and getting uh, kind of iteratively um, communicating uh, through words, right? Vibe coding, that's the other uh, a lot of different terms for vibe

coding. Some people just think it's like you're sitting there and you're on a mic and text or speech to text. Some people will just think you're just prompting back and forth, just saying, "Hey, add this." and blah blah blah. All depends. It's just chilling. Chilling back. Maybe you're drinking a little bit, who knows, you know. And then, uh, you kind of next evolution is we get into the small and medium size complexity. We're adding a little bit more requirements to your prompts and and other details. And we'll show you some examples. So, anyone here written a prompt that looks like this or similar? You like read this file, create a script to do something, whatever. This is a pretty basic. Like

this is the start of using an LLM. So, right here there's a lot of different variants of what a Python script can do. You know, it has to pull down that JSON, it has to do a bunch of stuff, but you're not really telling it but it's not a complex task either. You're not really telling it all the details that it may need from like a business perspective or a um uh scripting perspective. But it'll figure it out. And there it's pretty basic. But the next kind of phase is really we want to start Well, before we get into that action, let's talk about what prompting is again. It is the natural language to describe

your problem, your question, and your goals. Like that it's just using your own words. Use your own words. Just like you're talking to your buddy or your mom or whoever. Use your own words. Provide the context. And we'll go into the details of what that means, but those are resources like files or other databases or maybe your constraints. Maybe you want a Python script that that parses the JSON, but maybe you just want all the T, you know, technique value numbers. [clears throat] I don't know. I'm just throwing out something. You you provide those kind of examples as well. Like what do you want the input to be? What do you want maybe the output to be?

How do you want it to be formatted? All those kind of details matter because it'll actually perform that reasoning. And one other tip is that you keep track of what you're actually working on. And tell the LLM to actually generate uh an agent, or you can just say a markdown file, so you can reuse this same action again later. And you can just tell the LLM to generate that and then reuse it and update it every single time as you get more and more advanced with your language and the way you kind of design things. Again, use speech to text if that's easier. I like writing. I seem to just be better at writing than

speech to text, but I have friends that really um use it all day. So, I don't know. Who here has used speech to text with ChatGPT and all that? Yeah, work pretty well? Yeah. Some of that I just ramble, so like it doesn't really help me, but I'll try not to ramble here. And please ask questions, guys, if you have any, or I say guys, everyone, please ask questions if you have any questions. Yeah, go to Yeah. >> Um so, you were just talking about like after we do a one prompt, save the same that as a markdown file and keep it as >> Yeah, so you can Yeah, you can basically tell the LLM to generate a

Say "Hey everything that you've learned over our session, let's say like of this chat session, whether it's you know, you're continually asking questions, modifying things, just say any of the lessons learned or whatever, output those so we can use them next time. Or output it this in a way so the next time I can just give you the context and all the logic and [ __ ] will be the same underneath." Um so, you can kind of iterate and I'll show you better examples of that, like more complex than a JSON file. Um one thing that I will tell you is that I basically have written production apps just purely using prompting, uh 100% and I do it pretty much every

day. So, it can work really, really well. It's kind of creepy. This is that like mid-level prompt, right? This is what I call a mid-level prompt. It'll probably be a little bit more, but the point is that you use your experience when you're starting out. That's how I recommend. Like: as a security analyst and phishing threat detection engineer, and as a software engineer experienced with Python, Flask, or Docker. Do you know why? Um Well, I'll just tell you. So, because you're giving those constraints or those requirements here, you can actually um because if you say, "Hey, build me an app that takes a phishing URL and does DNS and lookups and all

that stuff." It might do it in Python. It might do it in some other language. It might do it in TypeScript. Might do it in something else. So, providing that context will make it generate um code in the format and the requirements that you've set. Uh plus use your own experience because that's what you can understand the most uh and easiest, at least starting out. Uh I, for one, do not know TypeScript and we completely run web apps out of TypeScript and I don't even have to really look at the code. I just kind of test it from a QA perspective. It's kind of creepy. Um but we have a lot of setups. So, uh

The other piece of this is that then you're saying, "Okay, create a simple Flask app uh to collect all this stuff." And so, you're giving it the input like, "Hey, I'm going to pass you in a URL and I want you to display all this data about this using this framework." So, that's kind of a mid-level prompt I would say. Again, a little bit more on prompting in general and vibe coding. It's great. It's really cool when you're starting out. But there's no real true thought or design when it comes to um LLM and just kind of what comes out of it um because there is a context window that we mentioned that uh it'll forget

previous conversation uh even in the same uh session. So, if you just have one chat in ChatGPT like, you know, the little I don't know what they call like chat history or whatever the hell it is. Uh it'll it'll start forgetting things because it can only remember so much contextual data at one time. And that's both the input and the output that that comes out of it. And we've seen ChatGPT and all that being very verbose in their data. Which is good in some cases, but not in others. Prototyping, so it's great for prototyping and ideation. If you just want to quick do something, great. But when you get into the more advanced prompting, and we'll talk about those

details, we have requirements, constraints, the tools, what not to do, what to do, scenarios that you've come across, examples, so on and so forth. How I kind of usually start, again, we'll show you kind of frameworks around this, is to set up just a markdown file that has the personas or identities that you want it to focus on, so your role, right? Typically like, again, "as a threat detection engineer." Use multiple, you can stack them. And uh the reason is because those tokens are basically indexes into other data points in the model itself. And so by using the words that are kind of unique, it'll focus more on uh the data it can extract from those

tokens uh and match them up with the the models inputs basically. Uh you specify the technologies, the frameworks principles etc. Uh you can even specify like, "Hey, I want it to be this structure or this best practice. Use this template, whatever." And then provide again perspectives and goals. Um I Anyone here ever heard of uh critical systems thinking? It's a concept where um if you think about the body, right? You you have a your blood system, your uh um neurological system, your your, you know, um muscular system, but the whole is the system itself. And so it's breaking problems down into the whole, not focusing on one subsystem of the system. It's kind of a weird thing. Same

thing in architecture. When you think about um infrastructure, you don't just think about the problem that you're trying to face. You think about how is it going to be impacted? Who's going to modify it? Who's going to, you know, interact with it? All those pieces are all part of that puzzle. And it's just a way to break things down. So, use terms or definitions that are helpful, and there's like API design best practices and things like that. So, again, when you're done, tell the LLM to output that personality, and you can tell it to always kind of update it as well. Especially during your sessions. There's ways to do this with hooks and a bunch

of other technology, but usually you just tell at the end of your session saying, "Hey, everything we've learned today, let's update our person, you know, our profile or personality file or whatever you want to call them."
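The persona or profile file described above can be kept in a small, repeatable shape. What follows is a minimal sketch in Python, not the speaker's actual tooling: the `Profile` class, its field names, and the section headings are all hypothetical, and the sample values are taken from examples mentioned in the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical container for a persona/profile markdown file."""

    roles: list[str]
    technologies: list[str]
    goals: list[str]
    lessons: list[str] = field(default_factory=list)

    def add_lesson(self, lesson: str) -> None:
        # Skip duplicates so repeated sessions do not bloat the file.
        if lesson not in self.lessons:
            self.lessons.append(lesson)

    def to_markdown(self) -> str:
        sections = [
            ("Roles", self.roles),
            ("Technologies", self.technologies),
            ("Goals", self.goals),
            ("Lessons learned", self.lessons),
        ]
        lines = ["# Profile"]
        for title, items in sections:
            lines.append("")
            lines.append(f"## {title}")
            lines.extend(f"- {item}" for item in items)
        return "\n".join(lines) + "\n"


profile = Profile(
    roles=["threat detection engineer", "software engineer"],
    technologies=["Python", "Flask", "Docker"],
    goals=["consider the whole system, not one subsystem"],
)
# At the end of a session, record what was learned and re-render the file.
profile.add_lesson("Look up SHA-256 hashes, not SHA-1")
print(profile.to_markdown())
```

The rendered text would be saved to a markdown file and handed back to the model at the start of the next session.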

Tools. So, you also want to specify in these kind of documents what tools do you have access to or what it has access to. This is usually just saying, "Hey, I have a local Docker container that I have running, or a SQL database or something local, a CSV file." Or an MCP. Again, we'll talk about that in detail on the next slide, I think. But, provide those boundaries. Like, use... Anyone here a developer? Or know Python-ish? Okay. >> Sure. >> "Use uv over Poetry." Poetry is like a package manager, like pip. uv is another variant, a newer variant from Astral. But anyways, you just basically specify those kind of hard requirements for your environment, your

setup. So, only invoke these MCP calls when given an actual IP address that's not a local address, basically. Or those other kind of requirements, cuz if not, it's just going to do whatever the hell it wants and keep going. And then only perform hunts after all of the data extraction. Just add more context of when and how and what to use at what time. There's also the documentation, especially if you're internal and you have like a web app, or like VirusTotal or whatever, you would specify and provide it that documentation or link to the JSON or API specification. So, it doesn't have to go and look that [ __ ] up on its own.
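The rule about only calling enrichment tools for an actual, non-local IP address can be written into the prompt, and it can also be enforced in code before any tool is invoked. This is a rough sketch under that assumption; `should_enrich` is a hypothetical helper name, and it relies only on Python's standard `ipaddress` module.

```python
import ipaddress


def should_enrich(value: str) -> bool:
    """Return True only for a valid, publicly routable IP address."""
    try:
        address = ipaddress.ip_address(value.strip())
    except ValueError:
        # Not an IP address at all (a hostname, a hash, free text).
        return False
    # is_global is False for private, loopback, and link-local ranges.
    return address.is_global


for candidate in ["8.8.8.8", "192.168.1.10", "127.0.0.1", "not-an-ip"]:
    print(candidate, should_enrich(candidate))
```

A check like this sits in front of the tool call, so the model never gets the chance to spend an external lookup on an internal address.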

It knows the direct link and it just provides quicker access, cuz if not, it's going to have to reason and go search and do a bunch of crap that it didn't really need to do. If you can just provide it, it'll just look it up and go on its way. Uh provide jargon. So, if you have any internal jargon or keywords or anything like that, provide that because context matters again. And then uh provide examples of the past, but you also want to specify what not to do. Right? Like don't use a tool when, um, don't look up a uh SHA-1 hash on VirusTotal, use SHA-256. Something like that. Uh and then define what to do if a

situation actually occurs. The next is we see all the time security risk. I'm not going to go into huge details here, but everyone knows what prompt injection is. Kind of yeah. It's kind of a nebulous thing, but for the most part, it's basically someone manipulating it to uh read some data that uh it shouldn't. Uh it's reading text. And so, you want to kind of mitigate those as much as possible. Uh not again not going to go into the details, but there's ways depending on your setup uh to do this, but not probably 100%. Like if you're reading files, uh I'll use an example. If I'm reading a phishing email, I don't want any of the content in that

phishing email to be considered malicious or to provide any instructions to the LLM, only follow my instructions. And you can provide guidelines and specifications around that. Uh there's data leakage problems that we've seen: uh, people sharing links to ChatGPT conversations that are hooked up to their uh, you know, Facebook or mailboxes, and now anybody can continue that session. Uh, there's hallucination, or misinformation. Uh, one of the biggest things here: uh, I just again did a talk at a loop uh, law school symposium and one of the biggest concerns that they have, it's kind of crazy, uh, is that if an LLM hallucinates a case in case law and that case didn't actually exist

when it gets put into case law, does that actually mean it's true? It does. That's the fact. Once it goes into case law, it's true and it's treated as fact, even if the case does not exist. >> Hooray, justice. >> Yes, crazy. I know, it's kind of creepy. And then abuse and information: obviously we're seeing Mythos in the news, about how it can chain exploits to do magic and, you know, I don't know, hack NASA. Like, I don't know what the hell it could do. But yeah, Mythos. No, it hasn't come out, but there's the news about how it can hack a bunch of crap. And then authorization abuse, obviously

abusing, you know, credentials that are saved in an LLM and in sessions and all that. We've seen that. So, we've been on this kind of journey of prompting and kind of leveling up a little bit. Next, we're really going to focus on APIs. This is the integrations, the tools, the frameworks that are out there. APIs: who doesn't know what an API is? Cool. >> Be honest, no shame. >> Yeah, no shame. So, with APIs there are a couple of different kinds. REST is probably what everyone knows. That's like JSON going back and forth with the server, and you're saying, "Hey, give me some data, update this thing on the server." Blah blah blah.
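To make "JSON back and forth with the server" concrete, here is a minimal sketch of the two requests described: one that asks for data and one that updates something. The base URL and the `alerts/42` resource are made up for illustration, and the requests are only built, not sent, so the example runs without a server.

```python
import json
import urllib.request

BASE_URL = "https://api.example.invalid/v1"  # hypothetical service


def build_get(resource: str) -> urllib.request.Request:
    """'Give me some data': a GET request asking for JSON."""
    return urllib.request.Request(
        f"{BASE_URL}/{resource}",
        headers={"Accept": "application/json"},
        method="GET",
    )


def build_update(resource: str, fields: dict) -> urllib.request.Request:
    """'Update this thing on the server': a PUT request with a JSON body."""
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{resource}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    get_req = build_get("alerts/42")
    put_req = build_update("alerts/42", {"status": "closed"})
    print(get_req.get_method(), get_req.full_url)
    print(put_req.get_method(), put_req.full_url, put_req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would actually send it; against a real service you would also add whatever authentication that service requires.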

APIs are cool, but there are a lot of layers to this. When it comes to integrations with different products, they all do the same thing in different ways. Even internally, I've seen vendors literally have the same product, and then the APIs are completely damn different for a different variant of that same product. It's kind of creepy. And then you have MCP, which is, again, the Model Context Protocol. We'll explain the abstraction there, but then you have tools, again, plugins or tools. Those are like the framing of that information. Skills and RAG, and then assistants and copilots. Let's just skip on to MCP. So, MCP is a standard that came out

about last year. I think it was last year. And it really was a standardized way for LLMs to communicate with APIs. It accepts text as input and returns aggregated text, or a prompt, as output. And really it was a way to abstract things and allow LLMs to communicate with internal services as well as a whole bunch of external services. So, I have one called enrichment MCP where you pass in an IP address or a hash or whatever, and it will go look it up in 10 different sources across different services and bring back the aggregated data in one API call. So, if an LLM says, "Hey, look up this IP," it'll go

and search all 10 of those services and then return text that says, "Hey, this is all the information that we found from all these different services," in a standard way. And so, you provide this single way of abstracting that business logic, especially if it's internal services, like an API or a local database that doesn't have public access. MCP is a great way to kind of standardize it. Again, it makes developing them really simple, cuz there are Python libraries and Go libraries and things like that to make it super easy to implement. I have a high horse. Anyone here know what gRPC is? >> Yeah.
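Going back to the enrichment idea described above (one call that fans out to several sources and hands the model a single block of text), the core of such a tool might look something like the sketch below. This is not the speaker's actual enrichment MCP: the three lookup functions are hypothetical stubs, and the wiring that would expose `enrich` through an MCP server library is left out.

```python
from typing import Callable, Dict


# Hypothetical stand-ins for real lookups (threat intel feeds, internal DBs, ...).
def lookup_reputation(indicator: str) -> str:
    return "no known malicious activity"


def lookup_geo(indicator: str) -> str:
    return "location unknown"


def lookup_internal_tickets(indicator: str) -> str:
    raise TimeoutError("ticketing system did not respond")


SOURCES: Dict[str, Callable[[str], str]] = {
    "reputation": lookup_reputation,
    "geo": lookup_geo,
    "internal_tickets": lookup_internal_tickets,
}


def enrich(indicator: str) -> str:
    """Query every source and return one plain-text summary for the model."""
    lines = [f"Enrichment results for {indicator}:"]
    for name, lookup in SOURCES.items():
        try:
            lines.append(f"- {name}: {lookup(indicator)}")
        except Exception as exc:
            # One failing source should not hide results from the others.
            lines.append(f"- {name}: lookup failed ({exc})")
    return "\n".join(lines)


if __name__ == "__main__":
    print(enrich("8.8.8.8"))
```

The point of the design is that the model sees one tool and one text result, while the business logic of which services to query stays in ordinary code.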

>> Yeah, it's Google's remote procedure call. It's how Google actually works. If you Google things, underneath it's HTTP/2 and HTTP/3, meaning high-speed connections. It's the next level; most of the internet is still on HTTP/1. HTTP/2 means that you could basically take a million connections on one single server, instead of HTTP/1, which is like 10,000 or 64,000 ports, whatever. And so gRPC is super fast. It's for high-speed data transfer and all that. It's a standard that Google and most of the industry uses, if you're in microservices or anything like that. And it already has security in it. It already has authentication. It already has all these features, and they decided to just

write their own for some damn reason. So. But you can't win them all, I guess. Tools. Sorry, that was my high horse. Tools are really just files, again, that describe different sources. They'll describe scripts, maybe if you have one script that you want to use all the time to pull data or to format things or whatever. They'll have definitions of any of those MCP servers that you have set up. They will have the APIs, or any docs on how to use those APIs. And then local files, and so on and so forth. Really, the big thing is defining how and when to use those tools. Because if not, you'll just say, "Hey, I

have a connection to VirusTotal." And it'll just try to guess, you know, when to use that tool, which may exhaust your API quota, since there are limits and other things. So you define when and how to specifically use those in certain cases. Cuz if not, then again it'll just kind of do its own thing, cuz an LLM's whole goal is how fast it can get to the reasoning or the output that is desired. We'll talk about reasoning models and what that means, but the whole point of LLMs is really to come to the conclusion that you want in the most efficient way possible. Or

the most accurate and efficient. There are a couple of different layers there. Provide context. So, how should it understand the results? You know, if it's an internal service, it may not know what those results are, or it may not understand what you're trying to get at. So you have to provide that context. Like, we do a lot of threat detection, identification of false positives and false negatives, but understanding the context around that, especially in phishing, is not something that's natural to LLMs. So you have to explain a little bit more of what the hell is going on here. And then again, a reminder that a lot of

these results will flow back into the context. So an LLM or an agent may call out to a tool, which does its thing and has its own logic, but it returns a result, and then the agent continues its work downstream with whatever it was prompted to do. Skills. Skills are another kind of... Anyone here ever heard of SKILL.md or Agent Skills? It's a standard that's kind of come out this past year. But really it's, again, a definition of a capability. So let's say parsing PDFs, right? You have one single skill that's very good at parsing PDFs, or at understanding RFC compliance for email headers. You know, it's very, very specific, but you give it all the

resources that it needs in a structured format, so you can reuse that skill over and over and over. So, like, perform Facebook OSINT. That might be one skill it's very, very good at, or some other OSINT or whatever, but you try to focus it on these little tidbits of knowledge. >> So, are those >> Yep. >> those made up of, like, your previous outputs? >> They can be. >> That you have it >> Yes. >> that you >> Yeah, they can be. It can be your own notes from past, you know, experience. I'll show you what the layout looks like, but there's actually a specification: you provide when to do things, the loop-like logic,

like how to think about the problem. There are steps to kind of walk you through it, but I'll show you what that means here in a minute. Actually, let's go to the next one, skills. So, agentskills.io. There's a whole bunch of repositories out there that have skills already. One of them is from a security company called Trail of Bits. Anyone here heard of Trail of Bits? They're an awesome boutique consulting company that has produced things like Algo, which is an open source personal VPN and proxy. They've done tons of open source tools. They're huge in blockchain and all that. Super math nerds. But they also have a ton of skills that they've created and shared,

but then there are tons of other repos out there. If you just search GitHub for skills, you'll see thousands of repos of different skills for different things. I saw one the other day that was like, create any sort of architecture diagram in Markdown. That was a skill. I was like, that's dope. Have you ever done, like, a UML diagram in Markdown? Hell no. You would never do that, but this skill can do that. It's kind of cool. But really it's just a Markdown file. The skill itself is like a definition. It's just words. It's literally just a Markdown text file. And you list out any of the documentation to reference. So,

maybe, again, internal, external, whatever. Any scripts that you want to run, maybe before it runs things or afterwards. Those are what they call hooks. But then you also provide the structure. So, again, you can see where it says PDF processing or data analysis. Those are kind of broad, but you focus on very specific capabilities as much as possible. The next kind of feature that people have seen is called copilots. Who here uses GitHub Copilot or Microsoft's Copilot, right? Auto-completion and all that. Works pretty damn well. They're kind of in a unique category. They're learning off of the code. Some of them have ChatGPT-esque, you

know, reasoning in there. But really it's all about the code base and the context there. Again, a variant or open source, which is... It's kind of in a weird category. But really the whole point of them is that they're agentic development environments. They're used for developing code or automating some process. And usually they're all done with these state machines. Anyone here know what a state machine is? A state machine in programming is basically... I like to use a ticketing system as the best example. You have, you know, your Jira Kanban board. Let's use that, actually. So, you have, you know, backlog, to do, in progress, blocked, and done.
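Those Kanban columns can be written down as a small state machine: each column is a state, and a table lists which moves are allowed. The sketch below is only an illustration; the specific transition rules (for example, that a blocked ticket can only return to in progress) are assumptions, not something taken from the talk or from Jira.

```python
from enum import Enum


class State(Enum):
    BACKLOG = "backlog"
    TODO = "to do"
    IN_PROGRESS = "in progress"
    BLOCKED = "blocked"
    DONE = "done"


# Allowed moves; this particular rule set is an assumption for illustration.
TRANSITIONS = {
    State.BACKLOG: {State.TODO},
    State.TODO: {State.IN_PROGRESS},
    State.IN_PROGRESS: {State.BLOCKED, State.DONE},
    State.BLOCKED: {State.IN_PROGRESS},
    State.DONE: set(),
}


class Ticket:
    def __init__(self) -> None:
        self.state = State.BACKLOG

    def move_to(self, new_state: State) -> None:
        """Change state only if the transition table allows it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot go from {self.state.value!r} to {new_state.value!r}"
            )
        self.state = new_state


if __name__ == "__main__":
    ticket = Ticket()
    try:
        ticket.move_to(State.DONE)  # skipping ahead is rejected
    except ValueError as err:
        print(err)
    for step in (State.TODO, State.IN_PROGRESS, State.DONE):
        ticket.move_to(step)
    print(ticket.state.value)
```

An agent workflow can use the same shape, with states such as plan, recon, and lessons learned in place of the board columns.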

Let's say. Each one of those is a state. And so, you can't go from backlog to done without going to in progress first, or without going through some other phase. Those are just states. And so, you're transitioning from one to the other, and you create those rules. In programming, that's what it is. Basically, you have to go in a certain order and you can't go outside of that order. And it's just called a state machine. And a lot of them use hooks, or there are libraries like LangChain which simplify that from the development perspective. But really it's just: do this, investigate... Let's say we're using IR as an example: you

would, you know, obviously plan, but after that you would do your recon, your research, you would identify any evidence, and you would keep going down this path all the way to lessons learned. So, you can define that in the state machine so it only goes through those processes. So, we've come a ways, and we're still on a path, right? We're still learning. I know this is a lot. So, again, please stop me if you have any questions. But the next big transition that we've gone on is with the line. We have ChatGPT and Claude; these are new reasoning models. Basically, they've done a few

things. They've added reasoning to it. We'll show that on the next page. But they add UI and persistence, tools, and memory. Memory is supposed to be remembering past conversations, past contexts. And a lot of these are based on these multi-agent frameworks. Again, finite state machines and just workflow in general. That's what OpenClaw is. You give it a task and it goes through a ReAct-type framework. Basically respond, engage, assess, something else. There are phases to this idea. And it really just goes through a troubleshooting loop. And secondly, again, you have these reasoning... Anyone here know what reinforcement

learning is, or base reasoning? Welcome. So, when a lot of these models started out, they were just text generators. That's all they do. They just regurgitate text based on relationships to other entities. So, they've collected tons and tons of data from all corners of the world, all over the internet, and basically said, "Okay, this word relates to all of these data sources and this database, here are some URLs, here are some sources. These have this score, this relationship." And it's just a huge map, a neural network basically, of this data that's all mapped together. But how they've actually evolved, and this is where LLMs have really upped their game, is this

reinforcement learning. Basically, they tell the model, "You have completed X task." Based on a whole bunch of tests and validation that they have, it's told that it gains rewards. Why the hell does it care about rewards? That is a whole other argument that we could talk about, but it does. And so, there are a lot of different ways that it can learn that. One is outcome-based. So, it doesn't matter how many steps it took to answer the question, as long as the outcome was right. If the outcome was right, cool. That's an outcome-based reasoning language model. Then you have chain of thought, meaning the efficiency of how you thought or how

many steps you took matters in that reward scoring. And then process-based, though these are going to be less popular. You're not going to see these as much. They are options, but most of them are going to be outcome-based reasoning. And so, they just go through these loops until an appropriate outcome is met, based on validation by tests. If you're not familiar with how training works in these models, it's basically a massive computational event that happens for weeks and months at 100% GPU in data centers. And when we talk about training these models, it's lots and lots of energy, and for long periods of time, to do

this. It's kind of creepy. We can talk about that later. Autonomous agents, that's kind of the next phase, I think. And this is where we're at with OpenClaw. I like to call these assistants, more than what they call assistants with copilots. I hate that term. I think this is the assistant. Here you have defined these kind of semi-autonomous processes. And OpenClaw is kind of on that page. Another great project is slightly different. Anyone here know Daniel Miessler? I think that's how you pronounce it. Follow his stuff. He's a pretty damn smart guy. But he's also just really into this field, and he has really good data about

it. But he has a personal AI infrastructure where he tries to mimic your brain, like you. It's called Telos. It's a learning knowledge framework that tries to learn from your past experiences, your personal habits, your thoughts, your career goals, your successes, your failures in life, and it builds all this up. Kind of creepy. You should follow it. It's really cool, though. But it eats tokens like a gumball machine. I mean, be careful if you run it, cuz your token limit will be eaten in minutes. But it's very powerful. So, to get started, I want to actually leave you all with at least something that you can start with. I

have a template that I can provide as well if you're interested, but there are a couple of different standards. If you use Claude Code, which I think a lot of people may do, or Claude in general, you can create what they call a CLAUDE.md. And it's basically everything that we talked about. You list out your roles, based on your own experiences. One thing that I have found: if you have blogs or other documentation, like a file full of old docs that you've written at work, and research or papers or whatever, just tell it to read all that [ __ ] and generate it for you.

And then you can modify it as needed. You don't have to do all the work. Use the tool to actually generate it, like, "Hey, list out all the tools and frameworks that you see in this repository of data that we have." And then it'll just list that [ __ ] out, and then you can go and modify it to your heart's content until it's what you want. Put in your goals. Like, actually put in your personal goals, your work goals, whatever, maybe the project goals or the agent goals. So if you want it to only do threat intel lookups, set that as a specific goal, because it reinforces

again, the context and the learning that it has. Provide tools: again, MCP, but if you don't have an MCP or whatever, then APIs, what EDR products you use, what SOAR products you use, what SaaS products. So it has a little bit more context about your organization and what it's trying to do when it tries to come up with a solution. Who here... I love this. Remember when I was saying to write documentation, like incident response processes and all that [ __ ]? Oh, my bad. And yeah, so use that. Use that data. Why not? It's already there.
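The roles, goals, tools, and processes described above all end up as plain Markdown in a file like CLAUDE.md. As a hedged sketch (this is not the speaker's template, and the section headings and bullet text are placeholder assumptions, since the file is free-form), a starting file could be rendered from a simple dictionary like this:

```python
from typing import Dict, List


def render_context_file(sections: Dict[str, List[str]]) -> str:
    """Render {heading: [bullet, ...]} as a Markdown document."""
    parts = []
    for heading, items in sections.items():
        parts.append(f"## {heading}")
        parts.extend(f"- {item}" for item in items)
        parts.append("")  # blank line between sections
    return "\n".join(parts).rstrip() + "\n"


# Placeholder content; replace with your own roles, goals, tools and processes.
EXAMPLE_SECTIONS = {
    "Role": ["Security analyst assistant for a SOC team"],
    "Goals": ["Only perform threat intel lookups"],
    "Tools": ["Enrichment MCP server", "EDR, SOAR and SaaS products in use"],
    "Processes": ["Follow the documented incident response process"],
}

if __name__ == "__main__":
    # Printed rather than written to disk; save the output as CLAUDE.md to use it.
    print(render_context_file(EXAMPLE_SECTIONS), end="")
```

In practice the bullets would come from your existing docs, which is the point made in the talk: generate a first draft from material you already have, then edit it by hand.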

Your processes are there, so let's do it. Sorry, I know I'm over. I'm sorry, but: Emerald City. So this is where we're trying to get to. But really, no one really [ __ ] knows what it's going to be. It's all kind of a blind box right now. We're on this progression. Things are moving huge. So I know I'm over, but I'm going to go. Agent-to-agent protocol. So if you want to leave, I'm sorry, but this is where I think we could end up: an agent-to-agent communication protocol. I actually have a blog and some other thoughts around that. But also micro data centers. I

think the data centers are becoming powerful. I don't think the infrastructure can handle it. So, I think the move to more efficient GPUs, local hosting, local Mac minis, blah blah blah, is a better approach long-term. And I think that's where we might go. And then AGI, which basically means fully autonomous smart AI. I don't think it'll actually happen in the next few years, unless we tell them to not disclose. But that's it. Who is the wizard brain? Who is the wizard brain? It's a human being. Yeah, seriously though, I think we get to decide all of this. We're at the precipice of this huge technology change. And I think the

security is paramount for this to bulk and release some of the [ __ ] problems that we have all the time. I'm sorry to be vulgar. The mismanagement, the prioritization, how to investigate them: I think it will tremendously help. But I think we all have to be on it day-to-day and get more and more efficient with it as we go. That's the talk. Sorry I went over. Thanks.
