
Thank you for the introduction. I appreciate that. Um Yeah, super excited to be here at B-Sides Kansas City. Um Uh last time I spoke here was about 3 years ago, so I'm super excited to be back. I really love the theme and all the the volunteers that helped put this all together, so it's uh super Yeah, super thankful to be here. As you said, I'm just Sam Wallace. I'm just a just a guy. I'm local to Kansas City. I'm not a vendor. I have nothing to sell you. Um I've been in this space for about 15 years now, uh largely on the security side of the software development, so um which is as you can
might imagine over the last like 3 years has led really nicely to like the AI security space. Um I've watched I've watched it all kind of unfold, all the developers go crazy. Um and I really wanted to talk about that in this talk, uh just given all all the madness we're seeing in the world. Um as a disclaimer, this is a Mythos-free talk. I won't be talking about Mythos at all. >> [clears throat and laughter] >> Uh as a disclaimer, everything I talk about is my own opinion, doesn't reflect my employer. I do have a day job. It's nothing to do with that. Um if you do see anything that looks like your production environment, that's purely
coincidental. It has nothing to nothing to do with this talk. Um so who who's this talk for? I would say um there's a there's a couple of AI talks after this one, and I think this talk is going to be a great like pre-req uh to kind of get some of the foundational concepts. So it's I mean, largely if you're at a business right now, most businesses are thinking about agents, they're deploying agents. Uh there's like third-party vendors they're working with to deploy these agents. Um and this talk is going to go into detail about that. Um and largely if you're like a student and you're learning about foundational stuff, this will kind of extend the
benefit of that. Uh And really, I mean, we all we all keep hearing about AI and force. I mean, if you if you go on LinkedIn, it's it's AI LinkedIn, and it has nothing to do with security more, it's only about AI. So, this this talk is for you. So, [clears throat] if we if we kind of think back on like every like generation of technology that comes out, you know, we we see that every every every technology is like forced upon us, and they kind of get that free pass initially. So, if you look at like the web applications that came out, we had all the cross-site scripting, we had the SQL injection type vulnerabilities,
super prevalent. Then we had the cloud. The cloud came out, it was here to save us. Everything got shifted to the cloud, all the cost savings that never actually happened. All the security issues that we're really still dealing with that. And then we had the the containers come out, right? And we had all of our code within the runtime is now in a container. Everything is running as root, all the dependencies, all the issues there, right? And now where you are today as you sit here is you're like in this AI agent world where we have these new AI agents we don't necessarily have all the security controls we'd like, we don't have all the monitoring we'd like.
However, the business is going to do what the business does, and they're going to ship it, right? So, that's where we are right now. And so, here's like a clear delineation between like what a chatbot is versus what an agent is. This is This is really the core difference here. The The big thing here is the consequences of getting an agent wrong is is much different. So, we look at like the chatbots, that's like you you you have this user interface, you do the chat in and out, right? It's It's very basic, nothing too crazy, there's no agents involved, just like pure inference. And the worst things that come out of this is really like, "Hey,
we got a new system prompt, and nobody really cares." Like Opus 4.7, system prompt, is on GitHub, nobody really cares. If you have a a chatbot out there like for your business, someone gets your system prompt stolen, nobody really cares, right? So, that's kind of like that world. >> [clears throat] >> The The second world here is what we're going to talk about, which is the agents. The agents have API access, they have full autonomy, they have network requests they're making. These agents are acting on behalf of users, and the risk here is is far far different than your your traditional chatbots.
As a as a real-life example of here of an agent kind of gone wrong, um the state of Utah, they had this great idea to have an agent sort of handle prescription refills. So, not like an not like a doctor AI, but like just like the prescription part. And some security researchers looked at this and they said, "Well, we were able to sort of manipulate the prompt in a way to like triple dosage. So, they were able to like feed it new updates, and it were it would trusted those updates from the user. They were able to triple the dosage. They were able to reclassify methamphetamine as a safe treatment. Again, with careful prompting, careful
manipulating the agent. And And then And lastly, they were able to manipulate like what the agent thought was was true, and sort of like get it so have misaligned vaccine like conspiracy kind of claims. Yeah. So, I'll I'll quote Aaron here directly, he said, "These targets are some of the easiest things I've ever broken in my entire career." So, thank you for that. So, I would say agent security, these are like the main three pillars. We're going to go through each of these. We have the guardrails, we have the identity, we have sandboxing. We're going to Let's go through some of these.
All right, so guardrails. So I know I know we've all heard of like jailbreaking and like prompt injection and like there's like this new world of like AI specific vulnerabilities that exist. There's also like a new control mechanism here called guardrails. So think of guardrails as as kind of like a web application firewall. Well, it works most of the time, but it's not perfect, right? It's not a silver bullet, doesn't block everything. The intention here is whenever input comes in before it hits inference, it goes through a policy-based engine and it has that, you know, thumbs up, thumbs down, or monitor this or monitor that kind of thing. So the it gets inspected on input
and then right before it goes back to the user, it gets it gets inspected again and get with your policy engine for like that output validation.
I think guardrails been largely I mean getting people to understand like this new like language model world is a little bit different. I think a lot of that is rooted in what we're we've been taught like in school, we've been taught at like previous jobs. Um regexes don't work. Hashes don't work. Signatures don't work. Um all these like rule engines that we really like from other tooling they just don't work. Um that's because these language models it's not about exactly what you're saying, it's about the intent of your message. There's this new class of vulnerabilities where you're sort of manipulating language models to do things. Um so you can't you can't really write a regex for that and that's why that this
new guardrails engine is super critical to kind of fill this gap.
And to delineate here, there's there's inference which is like that auto complete engine that we're all familiar with. This is just like the in and out of the model. And then guardrails itself proper. So for guardrails there's there's open source solutions out there like Um, you if you've ever played with like YML, they got a guardrail section and they have like a laundry list of like uh guardrails you can try out. Uh, [clears throat] but you can actually like run your inference in like Azure Open AI on their AI service as a AI service and then you can run your guardrails like out of AWS if you wanted, right? So, these aren't necessarily coupled. Um,
often they are coupled and they kind of sold together as an easy button like, "Hey, just add this to your Terraform and get free guardrails." Uh, but you can separate these entirely.
And this is as you kind of look at guardrails, you're going to find very quickly that they're going to force you to be a little bit more social. They're going to force you to talk to the other side of the business. A lot of what guardrails has to offer is that there's like the compliance and there's like the quality of the response. You have all of the normal like denied topics like, "Maybe I don't want my company agent to talk about like my competitor. Maybe I don't want my agent talking about bombs. Maybe I don't want my agent talking about whatever, right?" Like that's not necessarily a security thing. However, these uh guardrails configurations have that kind
of all built bundled up in one config. So, of course they have like I would I would call it guardrails has like a subset of security and it's primarily not security related. This kind of how I feel about it. And here's kind of a example of like the different like things that guardrails have. This is just from AWS has guardrails and again I'm not affiliated by AWS. This is just a popular one. Uh, they have guardrails if you're running like a agent core runtime. This is an option for you. So, as you can see I kind of labeled these like, "Hey, this one like content filter is more about safety, AI safety than it is security."
Or denied topics is maybe about your business. Um, of course we have a security one like maybe my agent shouldn't ever return like PII or PHI data. We must have fully outside of scope of agent. So this is this is really like business specific stuff and you're going to need uh you're going to need input from the other side of business to really implement this correctly.
Um so when it when it comes to agents, one of the critical parts of agent and I'm sure you've seen this like everywhere is identity. Um that's because these agents do have that autonomy, they do have the ability to um interface with like third-party APIs, databases, or whatever other tool calls that you're intending to implement. There's largely like two like camps, I'll say, two patterns. There's the top pattern, which is the service account pattern. This is like the most common pattern that I've seen. This is giving your AI root. Um your AI has like a like an administrative level key, has full access to this API, and then all your tool calls um are using that API key,
right? This is kind of like it it works really fast on your local laptop, it gets deployed, right? This is just going to be that frictionless experience that uh developers are expecting. The uh the bottom pattern here is like token delegation pattern. Again, a little bit harder to get right. Um the core difference here is instead of um giving your AI like a full admin access, what you're doing is you're actually using the user's um token and acting on their behalf. >> [clears throat] >> A good example of this is like if you're using like any like chat products, um and you want you click like, "Hey, I want to connect to my Google Drive."
Um like Interop doesn't have access to all of Google Drive. So when you give it access, it's going to be acting on your behalf using your permissions um and making and facilitating on your behalf.
Um even with the token exchange, there's like I want to say like two like there's like two ways to do that as well um um with this flow. So, the uh there's impersonation and there's like true on behalf of token exchange. So, for an impersonation, this is where uh you get the JWT or like the user's token and you're passing that directly onto that third-party API. >> [clears throat] >> Uh what what you're losing here by doing this method is that API has no idea it's coming from the agent, so you lose that uh audit log the audit audit trail there. Um so, you're probably dinged on the compliance side. Um and and then on
the token exchange like true on behalf of, what is happening here is the user goes to your identity provider, they get a JWT, the agent then exchanges that for a a new token. This This token is going to be the claims will be scoped down to just the tool calls expected. Um it's going to have a shorter lifetime, so it's not going to have a full lifespan that your the user JWT have. And then that API also gets the ability to see like this is a action um from this agent that on behalf of a the user to you of the full picture there. >> [clears throat] >> Um last slide on identity and kind of
where I see it heading is there's a new RFC called uh cross-app access uh largely led by Okta. Uh it's still in draft, so it's like you know, it's still pretty new. Um this may not be like the RFC that everyone's like going to abide by, but I think the the principle here will stay. In short, um imagine you're on the enterprise and you want to roll out like an agent solution like I want to roll out like Anthropic like enterprise to everyone. Uh well, everyone when they connect their third-party apps, they have to like consent consent consent, and everyone absolutely hates that, right? What cross-app access is intending to do is you have you have a
enterprise level settings where you like pre-consent for all your users, and you let them know what they have access to, and they don't have to go through that flow. So, I would say if you if you hear that, keep an eye on it. I think um it may not be exactly this, but that that that principle will uh will surface will surface.
Okay, so the the last topic is uh sandboxing. Um so, really what what sandboxing is the we still have like that Everyone's [clears throat] figuring out agents. Everyone's deploying agents. And what they want to do is they just want to deploy it as is, full file system access, full network access, full admin key, right? Like this is just This is just the easy way. This probably reminds you of like your cloud days. This probably reminds you of like when we first started doing web applications, containers. We just need to get it working. We need to prove the value, right? So, this is like what developers want to do. Um and then like on the right side, this
is like what we're going to be talking about the next few slides is um we're going to have like virtual file system. Um we're going to we're going to use least privilege account. All that good stuff, right? So, this is like a common theme of like new technology comes out, rapidly deploy it, and then kind of figure out security as we go, right?
>> [clears throat] >> Um so, when it um when it comes to sandboxing, I'm going to cover tool uh tool design first. This is like a super like critical super critical to the next few slides here. So, um you you can't really do sandboxing unless you design the tools in a way that you're able to. So, the the top here in the the red is a tool call. It says execute super query. It just takes any query as a string, and it's going to run it, right? This should immediately have like red flags going off in your brain. This is like single injection all over again. Like single injection as a service. It's not even a security
vulnerability. It's it's like intended design is to run any signal like code and just let the user do that, right? This is really good tool call for like infinite like infinite capability. And when it comes to tool call design, there's a few um I think I've got a link to the bottom of the last slide, but there's a there's a few like like white papers kind of out there and talking about tool call design. But the the short of it is that your your tool call design should be intentional. So we look at get customer orders. We take a string. This is like a predefined flow. It's purpose-built. It's scoped. And there's not like a lot of
flexibility here. And when I search products, and then look at expect it for It's really like a lot of these lessons we're having to like relearn from like 2005 kind of like again in a new age of way. It's It's going to remind you of like parameterized queries if you're familiar with that. >> [clears throat] >> Uh with sandbox, I think there's there's like two There's like two main patterns here that is pretty common. Um There's uh pattern one where the agent is living in the sandbox, and then there's pattern two where you're actually sandbox sandboxing the tool call itself. So I'll I'll cover these. The first one, pattern one, very very simple. Um your
agent lives in its like basically its own container. It's own little world. It's access to um It still has outbound network access, but it doesn't have any like any access to your network. It's fully containerized. It has its own virtual file system. Um the problem with pattern one largely is that you you do have to have the actual API key inside that sandbox environment. >> [clears throat] >> This is because for your agent to actually be useful and to make those tool calls, that API key must be accessible if your third parties are depending on that. And then on on pattern two, um so this is actually sandboxing your tool execution itself. So this is going to
have a concept of a host and then like a a sandboxed tool execution environment. So your API key is able to be separated living on your host and then not uh used directly in your like compute environment. Uh, pattern two doesn't really make sense. That's why I have another slide. I'm just going to go over two um to kind of kind of walk through it. Um, so like like for pattern two, um so if if I'm if I'm saying I can't give my tool execution an API key, how is that going to work? It it needs to reach out bound and it it needs to make those network requests. It needs to be useful to my user that's using
consuming my app. Uh, how you should think about this is you're going to have host side tools and you're going to have some tools that you're able to sandbox. So host side tools is going back to that uh the good that you need to you need to design your tool calls in a way where you have like the get customer ID and I have an example here that they're thoughtful, they're intentional. Uh, there's not a whole lot of abstract going on either. It's very structured. And then on the right side, your sandbox tool, these are very like flexible tool calls that really let you run like arbitrary code safely um that make your agent useful. So imagine
imagine you get a question inbound from a user and they want like they want some amount of data but they want parse it a certain way but they're making a deck for presentation, you might do a host side tool call where you get all the data and then they want to organize it a certain way, you pass the data to a sandbox tool. The sandbox tool doesn't have any network access, doesn't have API keys, it doesn't have anything. It's just an ephemeral like execution environment. Takes the data, um it's going to take the model's output, that Python code or whatever it's going to run, it's going to manipulate the data and then return it back. So
in the [clears throat] event that the um the user is able to manipulate the model to like generate some bad code. There's like there's no exfil opportunity here. There's [snorts] no it won't There's no ability to get out to the internet. Uh there's no network access here. So, um this allows you to like build better to provide a sheet to really know like what your agent is going to be doing. You have a thoughtful design behind it.
And uh lock largely um when it comes to the the agent world, um like what you're learning in school if you're a student or if you're um or like you're early on in your career or whatever, it doesn't really matter. Um don't necessarily need to start from scratch. A lot of these concepts are you're going to you're going to sort of remember and they're going to they're going to rhyme, right? Like we we talked about like how these new technologies come out. If this one feels the same to me as like going to the cloud or they're like forcing everyone to use containers. Um so, what you need to do is think about it that way.
Take these skills like your traditional application security skills and apply them to this like agentic world and it'll it'll make a lot more sense. So, when you think about like SQL injections, think about like prompt injections. It's it's similar enough. Um another one that resonates with a lot of people is like the web application firewalls like WAF. Very similar to a firewall, right? It's not the silver bullet. It protects a lot of stuff. Not perfect. Um and then the uh parameterized queries. So, like whenever you build your SQL queries, that's very similar to thinking about tool code design. So, this is like working with developers to get them to understand that your tool your tool code design um matters and and
being intentional about that allows you to sandbox your execution properly.
>> [clears throat] >> Uh so, here's here's like the five like the five key takeaways um I would say when it comes to like like working with your developers and then get like get getting them to think about like their agents a little bit different is really like um the the agent AI stuff is definitely different. Um it's it's a new threat model for sure. Um it's way different than chatbots. There's much higher level business risk. A lot of these agents are like they represent your company, they interface with your customers directly. So this is a certainly like high risk and you should be thinking about you should be thinking about identity, you should be
thinking about guardrails, um and sandboxing as well. For two, uh guardrails. How I think about it is it's kind of like a laugh, but um really it's going to force you to be a little bit more social with the other side of the business. It's probably you're probably going to start thinking about like like responsible AI and safe AI. Um in that kind of world, there is a subset of security and guardrails. And then um identity I I identity is like is is really easy to get wrong, so we talked about like the service account pattern um and then we talked about like if you can get your developers to avoid impersonation and do a true on behalf of
flow, which is really um critical for you. Um four, we talked about sandboxing. So this is like write your agents in a way where you're even able to sandbox your tool execution. So don't um I mean don't write tool calls in a way where um like every every tool like every every tool call shouldn't require an API key, it shouldn't require network access. You should be able to split the responsibility of tool calls that need that level of access and then tool calls where you're able to have them safe um executed environment where you can run random Python code from your model. And then uh five here, um this this is not really like defense-in-depth. I
don't think if you get any of these wrong, uh you just completely lose that protection. So there's like you you really need to think about like these um these are like individual pillars that don't necessarily like protect you if you do just one of them. You really got to do all three of them just right. Um or to put simply uh stop giving your hands a rest. So, yeah. That's
Happy to