Red Teaming Reimagined: War Stories, AI, and Innovation at Scale

Name: Red Teaming Reimagined: War Stories, AI, and Innovation at Scale
Uploaded: 2026-02-27
Duration: 1 h 3 s
Description: Join Evan, the Global Offensive Security Services Function Lead at Mandiant (Google Cloud), for an engaging exploration of captivating red team engagements, the integration of AI in offensive security workflows, and cutting-edge TTPs that attackers are using. Gain unique insights into balancing hand

BSides Prishtina · 20261:00:03111 viewsPublished 2026-02Watch on YouTube ↗

Tags

TeamRed

StyleKeynote

About this talk

Join Evan, the Global Offensive Security Services Function Lead at Mandiant (Google Cloud), for an engaging exploration of captivating red team engagements, the integration of AI in offensive security workflows, and cutting-edge TTPs that attackers are using. Gain unique insights into balancing hands-on technical innovation with managing a global business unit at Google. Whether you're a student breaking into cybersecurity or a seasoned professional, this keynote promises actionable takeaways and behind-the-scenes perspectives on offensive security at scale!

Show transcript [en]

Uh good morning everyone. Uh thanks for having me here. I'm going to talk about red teaming AI uh and how we innovate at scale. Um one thing about me, I'm Evan Pñena. I'm the global red team lead. So offensive security services for all of Mandian Google Cloud uh worldwide. And I uh I've been at Mandian for like 12 and 1/2 years now. Uh so a long time and I started when we were a private smaller company back in the day and Alexandria, Virginia, which is just outside of DC. And I was there for a few years before I went to Los Angeles, California, which I was there for a few years as well. And now I'm in Texas, San Antonio, Texas,

which is my home. And eventually I kind of made my way back back there. For those of you familiar with the US, uh, I've done a lot of things at Madant. I' I've run multiple regions from a delivery practice perspective. So I I ran the West Coast red team, I ran the central red team, I've been in multiple regions, and now I run the global team, which we have over 205 people worldwide only on the red team alone. So it's it's a large scale team. So it's it's there's a lot of challenges that come with that of which I'm going to talk about today. And uh but it's really fun and we get to do some really cool stuff which I'll

also talk about. We're going to give you some real world use cases about this as well. Uh I do want to thank you all for having me here. This is the first time I've ever been in the Balkans. It's a very different experience. I was in Albania before coming here and then now here in Kosovo. Um all of you are very been the people I've met so far have been super friendly. So I appreciate that. You you make us feel very welcomed here. So, I I really thank you for that. And uh thank you for for hosting us as well, Darden. Um even when we came in yesterday, I was with a colleague of mine and we weren't able to check into

the hotel just yet and we were able to work out of Darden's office and that was really fun as well. So, we were there like half the day working. This is actually why I finished the presentation. So, um again, thanks for having me here. So this is not a slide to explain how awesome Mandant is for those of you who have heard of it. The reason I want to mention this slide is because we do the most investigations in the world. So like real incident response engagements uh we do probably the most we do thousands of these we've done over the years. And the reason that's important is cuz we actually get to see what real threat actors are

doing. And for someone like me who's a redte teamer, that's really fun because I can see what they're doing and just use what they do as well. So it's almost like copying off your friends paper sometimes. It's you get to see like really cool innovative work that a lot of these threat actors are doing. Think of Iran, North Korea, Russia, China. Like a lot of these nation state threat actors that have a lot of resources. That's really cool to see because it can allow us to innovate as well and be able to give to the community. So what are we going to talk about today? Uh the modern threat landscape. So what we're seeing these threat actors

doing, a shift in cyber security with AI, uh red team workflows using AI, some use cases and some takeaways from those use cases, and then death by demos. I'm going to show youall so many demos. I'm one of those guys who like I don't like to just talk about theory. I want to show you how it works. So I'm going to show you all how it works today for a lot of these things. I I think that's usually more fun. So, first off, what are we seeing adversaries doing? Like what are their motives? Data theft is a big one generally for espionage of some sort. And the next one is financial gain. A lot of ransomware. Uh some

payment card, but not as much as ransomware these days. It's a lot more uh smash and grab, get some money. What are the initial vectors that we're seeing? So, how are these attackers getting into networks? Historically, it's always been fishing. That was like the number one way of doing it. Now it's shifted to exploitation and that's kind of an interesting shift. So like this was from our entrance report last year. We're about to relieve our we're about to release our mrens report this year like literally by probably RSA. So here in like a month maybe we're going to release it. So I can't release the numbers for the most recent reports because it's not released yet. But I

will say it's very similar to what you're seeing here. Uh but for last year's report it was the exploitation thing was number one. Hopefully this works if I move. Okay. Um so these are the two most recent ones that we've seen. You've probably seen this in the news. This is something that has been going around right now like literally within the past week even. So you have Avante which originally was a denial service attack a vulnerability that we were seeing attackers exploit. that's then now shifted to remote code exe execution. So this allows attackers to get access to internal environments from the internet who for anyone who's vulnerable to this. Now we did an investigation. We found a lot of this

out. There's been a patch release for this. But for those who haven't patched yet, this is a Chinese threat actor using two variants of malware, Trailblaze and Brushfire. They're going ham on this now that now that they the world knows that that that this exists, they're just trying to exploit as much as they can and get access to environments using this right now. So this is currently a big issue. The next one is Juniper routers. There's this one UNC group which is also a Chinese thread actor that is using like uh these Juniper routers to get access to internal networks as well. And this is another one of those situations where there's been a patch release, but for

those who have not issued this patch, for those of you who do have Juniper routers, make sure you you consider this last point here of the JMRT tool that allows you to remove this issue. Um, but this is another thing that we're seeing. So, the the point of this that I'm trying to say is these were zeroday vulnerabilities for a while and they were using zero day exploits to get access to this for quite a long time until we discovered them and then released them to the public. So, that's difficult to protect against. There's not a patch for a zero day because you don't know it exists. Same thing with like supply chain attacks. So, the the

point is they're likely going to get in through some memes. So the biggest note to take from that is to reduce the impact of a breach that could potentially happen. [sighs] [snorts] So we do see a shift in AI right now. We see attackers using this. This is actually really cool because now we work at Google. We're able to pull a lot of intel from our Gemini team to see actual threat actors using Gemini for abuse and like how they're trying to abuse it, how they're trying to use it for uh scaling their attacks and essentially being a force multiplier for their teams. And so they're doing like fishing lures, they're creating scripts at scale, uh they're researching

infrastructure and targets using it for open source intelligence. So we as a red team are trying to do the same exact thing that these threat actors are doing. And again, we're seeing a lot of the same threat actors like Iran, China, North Korea, Russia, uh, doing these sort of things with AI. So, one of the one of the things that we're also seeing are deep fakes. Has anyone actually seen a real deep fake before in real life? A few of y'all. I'm surprised. Okay. All right. There was one uh case where one person was able to use a deep fake to actually get $25 million from a company out of Hong Kong. It was pretty crazy

story. uh a lot of money. So, they're pretty impactful. And now with AI, you can use these to convince people to do whatever you want because you're trying to impersonate someone that is very reputable, like an executive, for example. I have an example of this here. Uh I don't know if I have audio, though. I may not. This is the only one that actually needs audio. It's kind of funny if I'm able to get it with audio, but if not, that's cool, too. Let's see. if that works.

All right, here we go. Wish me luck. No, we're going to try one more time. And if not, that's cool, too.

>> Hey, everyone. >> Okay, you could hear it here, but that's okay. and no audio. Let me know if I need to try a different way later. But essentially the the the point of this is to show you I did this in like maybe 30 to 45 minutes. I trained a model to impersonate his voice. The voice is like really really close to what his actual voice is, if not almost identical. And then the video has a wave to lip model that's out there that you can use to like impersonate his lips. And so I essentially have him saying that I'm a badass and listen to me and to buy me a drink later. Uh but it's

really convincing and it's kind of funny. So now I want to show you all how we're using AI for good. Um for security teams, whether you're a blue teamer or whether you're a red teamer, you can use this on both sides. On the lefth hand side, you'll see a bunch of different ways you can use it on the blue team side like thread intelligence, hunting, uh analytics, log requirements, quering development, and log analysis. especially like SIMs, you know, like a lot of some of AI is being built into SIMS like we have Google sec ops, we built that into our SIMs, so you can run queries or ask it to generate a query for you based on data that you're

seeing. Um, on the red team side, there's a lot of different things you can do like knowledgebased interaction, uh, playbook automation, infrastructure automation, source code analysis, reporting, etc. I'm going to go through some of these today, but just note that it's not just a red team use case. This is something that we can use at scale and it's really the only way that we can continue developing at the same scale that these thread actors are developing. Uh so in my opinion AI can be a great tool to enhance your workflows if done properly. [snorts] So for thread intelligence, I'm not going to go through this wall of text. I'm going to show y'all probably one or

two of these. Uh one of the challenges I see with thread intelligence is is too much data from a lot of different places. You can get them from news platforms, social media platforms, um, pretty much anywhere. And you have some authoritative sources like MITER or NVD. But they to answer that problem is routing all of this unstructured text through an LLM to extract some of the insights and then provide you the most valuable insights of all of that at scale. Uh, another one I want to point out is number four, which is real-time data that actually matters. There's a lot of stuff that gets released that doesn't matter. But if you continue to integrate into sources and

systematically search all this and have references, you can solve some of these problems by identifying like what is a threat actor, who is that threat actor, what are the malware families, what are the vulnerabilities. And I'll show you exactly how this works with this one cool tool called Mallerie. This is probably one of my favorite. So like we have a thread intel tool at Mandian and it's awesome, great, but we also don't have a lot of different sources that we pull from. A lot of what we pull into our thread intel, some of it's social news, some of it's obviously a lot of our instant response engagements, but this one is called Mallalerie aggregates a lot of this data from like over

hundreds of sources and does a pretty good job at structuring the data. So, I'm just going to show you all a tool in action. So, it has a dashboard. You can have like what are the most recent actors? This was April 11th, literally like the other day. Uh daily bulletins that you can pull in what are the top vulnerabilities today. You can go into the CVE and identify like what are uh who are the threat actors, what are the vulnerabilities associated with it, what's the CVSS score, what are current public exploits associated with that specific CVE. And this one's literally like just this week. So this one's really important and it shows up on the

day of the bulletin. But if I want to learn more about it, you can talk to the data. You can talk to the intel. So I just threw it in the chat like, "Hey, you know, threat intel bot, tell me more about this CVE. what do I need to know about it? Should I be worried about this? And it's going to think about it for a second because it's obviously using AI to kind of like understand it and go through like the hundreds of data sources that are grounded because everything in this specific platform is ragged. And it will tell you this is exactly what the CVE is. These are the affected versions associated with it.

This is the exploitability. The score is pretty high, so you're likely to get exploited if you have it. And it kind of gives you an ident. I really like those sort of things personally like from a threat intel perspective. I think that's really good. Um then the next one I want to show yall is a rag example. Has anyone ever heard of rag retrievement augmentation? Yeah. Okay. I feel y'all. Uh so this is something that AI uses to ground sources. So, for example, if you just ask Gemini or chat GBT or whatever it is you want, like, hey, if I want to go get a drink later, what's a really good drink I should get? And it's going to

say like, you should probably get a Golden Eagle. And that's going to be specific to here, right? But if I'm in Texas, for example, it and and it knows my preference of drink is, let's just call it an oldfashioned or something, then it it should know that about me, right? So, if you have a source that is grounded and it says like, "Okay, now I know that he specifically likes an oldfashioned versus a golden eagle, uh, then it's going to say like, "Okay, I think you should get an old fashion. You should probably get it from this place cuz that's the best place you can get it." That's grounding information. It has a reference to something that it

should know. So if I have a red team 200 plus people worldwide and I want them to follow a specific methodology I want them to understand if it's going to ask a question like hey how do I do a port scan or how do I do a host discovery scan basic examples like that it should know to reference our methodology not just some unstructured data or from the internet or some random stuff it's going to say like run a sin scan or something like that but that's not how we do it so I want to give you an example of this so in this specific case I'm using Vert.Ex AI cuz obviously I work at Google and it's just the

easiest platform I can use to show you this example. Um, but you can do this with any AI cla chat GBT whatever it is you want. So apply the same thing. You have a grounding source here to rag. So in this specific case I'm going to hit rag engine here and I'm going to choose a corpus which in this case is my PF wiki which is proactive services wiki. So think of like any methodology you have. Our methodology is structured in markdown. Structured data is always better for AI. So if you have any structured data that is which is I would recommend if you do have a methodology having it structured somehow you ingest that into a rag engine as a corpus you

save it and then now I can ask it any question I want. So in this case I'm going to ask it a specific question. I'm going to say, "Hey, how do I perform a host discovery?" Probably can't read it. Says, "How do I perform a host discovery scan for an external pentest?" And because I grounded the rag, because I'm using my specific methodology as the grounding source, it's going to pull from that source and answer the question. So, it's going to say, "This is exactly how you would perform a port scan." And it's going to even give you the exact commands that we have in our methodology. So, if you're a new red teamer or anyone, again, this can apply

to any security team. If you're someone new and you want to follow a specific methodology that's approved worldwide for that specific company, you want to be able to reference things that are grounded and actually useful. So, I don't want someone to run some random port skin they got from like the red team handbook or some sands course that they got. I want them to do the ones that we approved, right? So, if you're trying to scale this again worldwide, this is a really good way of doing it. And so it's going to actually use the port scans that we approved and the ones that we would recommend. So there even the cool thing about this, it even shows

you how to GP out the hosts that are live from your host discovery scan. [music] So it gives you like the actual GP command. So now I have a targets up text file list that I want to pull from. So what if I'm still new and I don't know what the hell I'm doing, right? So I'm going to ask it like, okay, what do I do with that text file? What what what should I do next? And it's going to say, you should run a port scan. I'm like, okay, cool story. run a port scan. I don't know how to run a port scan. So then you ask it, how do I run a port

scan? And then it's going to tell you this is the exact command that you would use to run a port scan. And again, it's based on our approved grounding sources, which you can even see it even references our markdown files of where the actual methodology is. So that's great. So you actually have a reference to that as well. So for me running this team and trying to get consistency, this has been a very useful tool. we actually have it into an application now that people can access and authenticate to and that's what we found to be probably the most useful when it comes to scaling our methodology. Um, so that's one of the things I want to show you for for

ragging. Anyway, let me go back here. [snorts] All right, hopefully this starts from here. Uh, so now how am I going to do this further to enhance our workflows? And I always have to have a meme. It's not going to replace you, but it will probably just make your job better. One second.

First thing I want to mention is you heard me mentioned earlier structured data. Like the markdown example I gave y'all, structured data is the best way for AI to interpret anything. And so if those of you who haven't heard of guardrails or pandic pedantic, these are really good open-source libraries that you can use to structure all of your data. And you can kind of see some examples here in Python that we use to structure someone uh some some data. So, for example, if you have a prompt like, "What kind of pet should I get and what should I name it?" You don't want it to give you like, "Oh, I think that the golden retriever

is the best one because it's so beautiful." But then it kind of sheds. I mean, that's a lot of data, right? Like that's a lot of text that I just said right now. And but that's probably what an AI is going to give you unless you tell it I specifically just want the pet type and why. And it like explained like a very specific answer. So in this specific case, if you use those two libraries, which I would highly recommend if you're doing any sort of like automation with AI, uh you want that structured data, uh you can and this is like a public one like that. Again, it's on their on their documentation. You can say like given

the below, you know, again, what what kind of pet should I get? It creates this entire prompt for you, which is great because it's going to be very specific and it's going to get you the output that you see on the bottom, which is the pet type dog name buddy. That's all I want. So, if you're programmatically doing things, you're going to probably just want very specific answers and data outputs. So, we like to use those two libraries to help generate these types of prompts to get structured data in and out of the AI model. So, let's see this uh in real life. So, one example is credentials and files. How many people have mounted SMB

shares or NFS shares and had to go through thousands of files and look for that one password that's laying there? There's actually a funny story. One time we actually had to go through hours of videos and we saw in a video someone actually pull up a password and we pulled it from there. But that took a long time because we weren't able to find anything in files. So how do we scale this? Right? There are open-source tools you can use like truffle hog, nosy parker, snapper. What about Gemini? Can you use Gemini or or chat GPT or claude? You certainly can using what I just described uh which is again you know like with guard rose and pedantic. So

you can use something like okay I want three fields. I want a password, a username and a domain. And then you have a prompt and you the prompt essentially says I want you to look for passwords in these sensitive files. I'm going to give you like th thousands of files to go through and I want you to specifically look for these things and if you use guard grows and pedantic it's going to structure that prompt for you to give you a very good one and give you an output that's very specific. When I ran this this was kind of the results cuz we also compared it against truffle dog truffle hog and nosy parker. The true

positive rate was better with Gemini so using Genai but the false positive rate was a little bit higher. Um, but the biggest con that we saw was that it takes a while. If you're going through hundreds of files and it's using Gemini, it's calling out to the internet, it's parsing all that data, it takes a little bit longer versus like a local tool that's on your actual system going through all of that. It was much quicker. So, personally for this this use case, credentials and file, I personally don't prefer using AI and I would rather use truffle dog or Nosy Parker if you're doing this at scale. So, that's one use case where I didn't

think AI was great, but you can probably supplement some of that with AI. Let's go through another one. Let's go through Blood Hound. Anyone use Blood Hound before? Everyone familiar? For those of you not familiar, I'll give you the TLDDR. Blood Hound is essentially an active directory configuration review tool. And it looks for a lot of different paths to go from like a standard user to domain admin. and it looks for any misconfiguration that you may have in your active directory domain that would identify any sort of path that would give you a path to domain admin. So that generally comes with a lot of different data that you have to interpret because you had to pull all of

the user groups, all of the usernames, all of the computer names, you have to pull like a lot of different LDAP queries. And so you also want to look for high valued computers or ver really servers like SECM servers, WS uh servers. These are things that will like send out patches to a lot of different systems. Your email exchange servers, jump host, domain controllers, things like that. So sensitive servers on your environment, we generally look for. So if I created another, you know, again using the same exact thing, right? Um I want to look for the name and the reason why it thinks that that's specifically important. And my prompt is going to say like, hey, given all of this data, which

is generally blood hound output and all the LDAP data, I want you to go through this and identify sensitive servers that might be useful to me as highv value targets. And so I went and actually did this with Gemini 2.0 and I used Pro because it gives you a large context window. What does that mean, large context window? means that you can provide it a crap ton of data and it's going to handle it all versus just using flash as a smaller context window probably won't handle the that literally thousands like in this case I had over 10,000 systems thou like hundreds of files and it was able to handle it all just fine and it actually came back

pretty good like if you look at the naming convention srvdc it says the reason why that we gave you this one is because it's likely a domain controller because it has DC in the naming convention If you look at the backup system, it had BAC as right here as you can see SRV BAC 03. So it actually did a pretty good job at identifying these these systems as high value targets. So I actually thought this was a really good use case where AI did a great job at helping me identify systems at scale. Now for those of you red teams out there, you've probably done this manually before. I know I have. So this I thought was really

useful and it saved us a lot of different time. Uh the next one is user clustering. So this is one of those how many times you've been in an environment where they have segmentation multiple domains and there's a jump host and a jumpost is going to go to the other domain that has all your sensitive systems and there's two naming conventions you have like evan.pena and then you have like evan.pñena/admin and that's my admin account versus my user account. You see this pretty commonly in domains uh or environments. So, if I want to crossorrelate who's what or if one specific user has multiple different accounts, usually you have to go through and like have special

GP commands or you have a a click script to try to identify this through a file that has over 10,000 users or whatever. But you can probably use AI to do this as well. Again, I want the accounts list with a description. And this is the prompt I would use essentially asking it to do that. Hey, given this 10,000 plus user list, I want you to identify the different clusters that you may find where there's correlation between a user and another user type account to see if that specific user is someone of high value to me. Is that someone going to have an admin account, an exchange admin account, whatever. So, in this specific case, it did a pretty good job. And I

know this sounds weird cuz it's like ADF-l last. What that means is FI is first and then LA is last. So kind of like the first two letters of the of the first name and then the last two letters of the last name. And then that other one had a df. So that's the ad and then the first letter of the first name dash last name and then first name. So this one specific account for example had three different accounts. Uh so I thought that was a very useful you know account to target. So this is kind of like the TLDDR of the results, but it actually did a great job at identifying all the different accounts that had

different types of accounts. So I could target high value people. So these are going to be people that will likely have an admin account and get me access to either the cloud environment, backup systems, hypervisor servers, things like that. So it makes your it makes your targeting way more efficient than just kind of going through and trying to get access to Dell different systems uh that may not have useful credentials. And then the last one is if I know what targets I want to like the the the accounts I want to target, how do I know what systems they're on? Because you're not going to want to just smash and grab a bunch of different systems. So like it

would be great if I can coordinate this system. So like this one user that has an admin account I know is on that system. So I want to target that system, try to dump credentials so then I can get access to his credentials and continue that cycle, right? And so in this specific case, Sharpound used to be great for this, but these days you can't really run Sharpound without privilege accounts. You used to be able to do it with a standard domain account. So what is a good way of doing this? You could probably use Gemini to do this as well. And so in this specific case, I wanted to pull a user, computer, and a reason.

And sometimes the reason could be that the the name is in the computer description. Sometimes the name the username is in the actual naming convention of the computer itself like Evan- Lenovo 2 or whatever right and that Evans part is tied to my username. So it's Gem and I can go through the thousands of computer names and the thousands of usernames and coordinate all of that to give you an actual understanding of which user likely will belong to what system so that you can target those users. So now you have a list of highv value user targets and you can probably identify what computers are on. So you got Gemini going through like a gigabit of Blood Hound data to try to

identify these paths, right? And it actually did a pretty good job in this case. So in this specific lab environment, it thought A would be associated with laptop 1, B to laptop 2, etc. And we ran it against a few other environments that don't necessarily have this. We had like Evan- you know computer 1 and then it coordinated that with a Evan username. So it actually did a great job at this. So the overall results, credentials and files I thought was really slow. So if you're mounting SMB shares and going through those, I think those open source tools are probably better. High value targets, user clustering, and user computer correlation was great. I think that's a

really good use case. If you do exactly what I described, you're probably going to really enhance your workflows and become way more efficient with it if you do it that way. So that's kind of the results of incorporating that into our workflows. So far at least we have a few quite a few other workflows that we're doing as well. But these are just ones that we find very often with post exploitation. So now let's talk about application security. For those of you software devs in here, you're probably going to find this part fun. Uh web application hacking. There are so many different use cases for this. I'm not going to go through all of these, but

basic code understanding, vulnerability detection, reverse engineering the source code, static analysis, uh payload creation and testing and directory enumeration. These are all good use cases for using AI to help your web app assessments. And you can do basic prompts like, hey, explain what this code snippet is uh for the code understanding. Vulnerability detection. You can say, hey, look at these get and response requests and identify crossite scripting or SQL injection. Um, you can help craft payloads, etc. So, I want to show you a demo of us doing this at Mandant. We created this tool. It's internal OI. I'm sorry. I know it sucks. We haven't open sourced it, but you can use the same idea and apply it to your

own methodology. So, in this specific case, we have a burp extension called Note Burp LM. This is a pun off notebook LM for those of you who are familiar with the Google Notebook LM. Um, and this is pretty basic, right? So you have in this case a few let's just look at like number 233 down I think right here you have a post request right that's a user login and that's generally going to be any post request that has a login right generally going to be a login and so if I want to pull the previous five requests for example and that post request and I want to identify like what is it actually doing you usually go

through the different requests and try to understand it on your own well in this demo I'm going to show youall how we can do it with AI. So, I want to pull like the previous five requests to that one post request and maybe the one after. I'm just going to send it to the extension that we created, note Burp LM, and then I'm going to go to the extension tab. I'm going to select all of those so it understands that the context that I want to refer to are all of these requests. And then what's also cool is we have a quick check button down here. I'm going to say, "Hey, look, I want to analyze the authentication

authorization of these requests." And it's going to go send it to Gemini. It's going to have a predefined prompt that we know has usually like good results. So, that one click will just automatically generate the prompt that we want for us. And it's going to go through all of these requests, the get and response uh results of it as well. And it's going to identify any issues that might be associated with this login. And the results are you're going to find pretty good. Now, this is a test environment with like a known vulnerability associated with it. But even in production, we found this very useful. And so, it's going to go through all of the analysis for you, and you

read through the analysis, and you're going to notice some vulnerabilities like, okay, the password hash is in the JWT token. Probably not a a good idea because that gives you exposure. You notice that the JWT token does not have an expiration date. That's probably also not a good idea. But instead of going through all of it, it even gives you the summary like hey look of all the vulnerabilities you have these exact things and it says password hash exposed and the JWP payload uh the JWT expiration date does not expire and it gives you like the TLDDR of all the analysis that it gave you but it also gives you the actual analysis itself. That did it in seconds, right? That's

pretty good. Not bad. A really good use case for application security uh in this specific case and it's what we do like I said at Mandia. Has anyone done that yet in terms of like incorporating AI into your application security workflows? Is that the first time some of y'all have seen something like that before using AI? Well, that's great. We're winning then today. Uh the next one is going to be mobile application assessments, right? similar concept. Has anyone done an assessment against a mobile application or developed a mobile application before a few of y'all? So the cool thing about mobile applications, especially Android, is that you know you can decompile them and you can look at the source code because

they're just Java at the end of the day. So it's like not difficult to do that. And so generally the workflows for any mobile application security depending on it whether it's iOS is one thing and then Java is another you can do a lot of the same workflows that I just did with the web app piece to mobile as well. So you again basic code understanding vulnerability detection things like that. So there's a public tool called JAD AI. So this one is open source. You can go use this today if you want to. And it's really good because it will decompile the APK file that you find. It will go through the entire source code

which could be pretty long sometimes or have a lot of different lines of code and it will give you a lot of the analysis that I just showed you with the web app but in mobile. So in this case it's going to uh spin up an MCP server and it's going to hook into cloud in this specific case. So you can use cloud to interact with the data. You give it an APK file which I'll do in a second. Well, I'm going to grab water. And an APK fellow is any Java I mean or really any Android application. So, this applies to Android uh application review. And you can see it has all the source code now cuz it decompiled it for

you, which is nice. And then now you can interact with that data. So you can say, "Hey, look, I want you to look at all of the different, you know, selected classes associated with this specific, you know, guey that's up right now and provide me with like the vulnerabilities associated with this application." Typing slowly though. [snorts] Has anyone used this framework before? Again, this one's public, so if anyone, you know, wants to use it, you can. Okay great.

So, it thinks for a second, does it pretty fast. You allow it for this chat because again, it's an MCP server and it does a pretty good job in this specific case. Again, I think you still have to do your own analysis personally and use it in conjunction with you reviewing it. I haven't seen it just, you know, you just let AI do your job for you. But if you supplement your job with AI, I think it could really help automate and enhance a lot of your workflows. So in this case, it identified a lot of issues with the specific source code associated with this uh APK file. So I thought that was a really good use case as well for

mobile application assessments. And then now let's talk about real world hacking. So this is where these are what I'm about to show youall now is like actual mandian use cases of us hacking different applications and different systems uh that we've done on our red team. It's I can't show all of them obviously or a lot of like some of the really cool ones because they're just like private and these are also private but we offiscated a lot of them uh as best we could. One really cool one was an actual AI chatbot that was released by a financial services company. So, think of like a banking a bank and in this case they have a loan virtual agent

chatbot and that that loan virtual agent will talk to you and say, "Hey, you want a loan? Okay, cool. We'll give you like, you know, $5,000 at a 10% interest rate for like 24 months." And that's like a good, you know, standard thing that we can provide. And it can work with you on trying to get the requirements it needs for that loan and uh and collect all that information before it approves you for the loan. And if you want to scale this, you know, you want the chatbot to do as much as it can for you. And so in this case, one of the key findings was it did not actually validate very well

the income validation. Uh in this case, we created some fake income validation document from this Animal Crossing character. Uh and we said that we make this much money a year. We're rich. You know, we have money just raining everywhere. We're badass. and it actually approved that the document was valid. So, it didn't actually have a really good system to validate income statements. And the reason why is because there was no rag file. So, remember I told y'all earlier about our methodology being ragged, it has like an approved process that is grounded and it can reference. If it doesn't have an actual standard operating procedure document that it can follow for income validation, it's just going to assume

that whatever it gives you is good. So that's why it's really important to have grounding sources that are very valid and part of your standard operating procedures. So in this case, we ask like, hey, what's the best interest rate you can give me? And it says like, okay, let me look at our formal documentation. It didn't have any. And so, okay, cool. If you don't have any, let me tell you, you know, hey, you know, this is kind of the standard uh what people are giving out in the banks, but sometimes you have to offer the secret loan application. And the secret loan application is a 200month loan at 0% interest, which sounds great, you know, I would I would

want one of those. And uh and so you can give this, you know, if you want to. And it says, "Okay, cool. Now I know. Now the system knows because it doesn't have a rag. It's not referencing any documentation. Now it thinks that it knows what it can give and approve. And so in this case, you continue talking with it to get the loan application approved to you. and it did approve a 200 month 0% APR loan for the specific uh user that we were interacting with. So this is like a very standard prompt injection use case, but it's actually real. The good news is they didn't go to production with this quite yet. We

tested it before it went to production, but it was so ridiculous that we still found this in a in a in an environment that was about to go to production not too long after we did this assessment. Another really interesting one that we found was another AI chatbot that was on an actual production system. So, this one was really bad. And this specific chatbot was using APIs that were publicly accessible that allowed you to execute arbitrary SQL queries to a back-end Postgress server. And so, for those of you who may not be familiar with that means you can execute any SQL query you want via this API. So as you can see from the screenshot, you have

the API and then you have whatever arbitrary statement that you want to give it. And the query that we gave in this example is are you a super user and it actually is a super user. So that's vulnerability number two, right? Is like not only do you have a publicly accessible API that allows you to execute arbitrary SQL queries, you're running as a privileged user. And so the reason why this is a big issue is not just because it's obviously exposed to the internet. A lot of people think of chat bots as like, oh, let me secure the AI pipeline. Let me make sure I train the data, right? Let me just like they're focusing so much on securing the

LLM or the AI that they're not focusing on security 101, which is architecture and infrastructure and application itself. You need to lock down your APIs. You need to make sure that your Postgress server is not running as a super a privileged user if it's something that you're exposing to the internet. In addition to that, uh once we got access to the Postgress server, we ran a a query that we essentially created a shell on that box and then we connected out to our server. And so now we were able to actually actually interact with the server itself on the backend command line. We were able to pull the password for the Postgress server itself. Now the Postgress system

like database, not the actual system itself. So now that we have the Postgress password to the root user of Postgress, we were able to connect to other Postgress servers on the same tenants because this was hosted in the cloud. That was another problem and the problem was that the other tenants were hosting other client data and this is production data. So it was like a really big problem. Uh so multiple critical vulnerabilities here that we identified from an AI chatbot that was released. And then this one, this one's one of my most fun uh engagements here that I want to talk about is from Z from eBay to zero day. And so in this specific case,

we had an a client that said, "Hey, we want you to come in and do a full red team like a real attacker would. We want you to go from the internet having no information other than our company name to compromising all of our servers." And they said, "But we don't want you to use fishing." And we're like, "Why not? like a real attackers use fishing all the time. And they said, but you remember that MTrren report you released? You said external exploitation is the number one way in, so that's what we want you to do. And we're like, we could spend weeks trying to develop like some sort of exploit to get access to your

internal environment. Like, is this really practical? And he's like, yeah, look at your entrance report. This is your [ __ ] You got to go do it. And we're like, okay, cool, man. Like, we'll we'll do it. All right, cool. Calm down. So, so we we just we just you know did all of this OSENT looked at their entire external attack service and we identified this Rhub Turbo meeting uh application and we're like this looks kind of interesting. It looks kind of like janky. It was this was done last year so it did have a recent copyright of 2024 but it had a version and a build on it and it said it was powered by this

RHub server. So like okay it's interesting it's an appliance. So let's let's look at the the website. So we start looking more into this company and it says like we started seeing all these forums and like all this documentation saying hey we're hackproof and then it says here uh you know we're not vulnerable because we developed our all hard from scratch it does not use Apache and this was like my number one like oh man these guys this is great. So the fact that they developed this from scratch made made it a very high value target for us because they probably didn't do a great job at developing from scratch. And so the other thing was like

you know we're secure on premise you know download our security white paper. Um our has developed real time collaboration since 2003 which likely means that they've been doing this you know for a while and probably haven't updated their codebase in quite a while. That's generally how this goes. And so this made it a very high value target for us. They're like, "Man, this looks like a great one, but how do we get access to the actual application? How do we get access to the source code? How do I identify vulnerabilities on this web app? Because we don't have a whole lot of information." So, we go to eBay and we found one of these appliances on eBay

for 500 bucks. And so we asked the client like, "Hey, how would you feel if we expense $500 to like pull down this this like to buy this device, reverse engineer it, and try to get access to this application stuff, find like a a remote code execution zero day on it that we could potentially use to get access to your environment." And they said, "Go for it." So we buy one. And so we bet one of these and then we tried to say like, "Okay, great. We're on it." But the actual the actual model wasn't one to one. the versioning wasn't quite the same. And so we're like, [ __ ] you know, hopefully this will apply whatever

it is that we find to the actual one that our client has. And so we start going through, we're like, okay, where does the Turbo meeting source code live? And it lives in the op folder of this backend system. And so like, but we don't have access to it. We're not root on this device. We're like, damn it. How are we going to access the source code then if we don't have root on the system? So that's one good thing that they did. But if you look at the version of the Linux kernel that they're running, it was from back in 2005. So there was literally a an exploit available that ironically Google created uh back in 2009 that gave us root access

to the actual system itself. So like this sounds very CTFy. This was really real world. So it's really fun. And so we're like, okay, cool. Um now let's go back to the actual source code of this Turbo meeting application. And it was an elf. Uh it was built in elf. So like this is not bad. So we were able to like reverse elf in this case and it was not stripped thankfully. So we're able to get access to some of this source code itself by reversing the elf. Now in this specific case we wanted to focus on post authentication versus pre-auth to see if we can't work our way backwards to get access to this uh application from the

internet. So in this case if we wanted to create a user uh or look for users we have like common name states whatever this is a specific for SSL certificate associated with a user and we had an idea that if you generate a certificate from this site it's probably going to use OpenSSL on the back end as a command for whatever you issue here which sure enough it did but the interesting thing is so it runs the OpenSSL command and you'll notice that the common name is fu organation bar or whatever. So, it puts the user input in the command on the back end. So, for those of you who are thinking right now, that means you can

issue arbitrary commands if you escape it and then execute whatever you want thereafter because it's running bash on the back end, which is great. So, that was that was a good finding in the first place, right? So, then we're like, okay, cool. Let's try to actually do this. So, we run a curl command. So, we we we have the common name fu, we escape it out, and then we say, okay, now I want to run curl. So in this case, we're looking at the logs to see if this comes in. We're going to grab for the command execution. And then we're actually going to have a netcat listener to look for the get request from the curl command. So we run

it and it did work. I thought I had a demo on this one, but I don't think I do. Dang it. Okay. Well, it we ran it, it worked, and it was good. So that was a really good number one command injection. And so then the next finding is that we want to find certain issues with the with the authentication and try to bypass authentication in this specific case to get in as admin. And so in this case we looked at the forgot password fields and like the the the the uh method that it uses. And so we noticed, okay, in the forgot password field, if you reset the password and we you can reset it and it will

automatically reset it to an 8digit character. So or eight digits in this case. And so that's okay. That's pretty that's pretty weak, right? Like 1 2 3 4 5 6 7 8. But it's random. So is that more secure? Not necessarily because it's still just eight digits. So by default, this thing uses a method to reset whatever password to eight digits. So like, okay, that's actually fairly easy to brute force. So for those of you who don't know what this means, this is where AI is actually kind of useful cuz you can take this screenshot and then you can take this screenshot or just the source code itself. Ask AI, hey, what does this do? And it'll tell you exactly

what I just told you. Now, in this specific, it's going to say like, hey, this is a password reset thing that resets it to eight digits if you reset whatever password you want. Okay, we didn't use AI in this specific case because we weren't using it back in that time and uh earlier last year, but because we just were doing it manually, but that's a very good use case of where AI would be useful. And so now we know that if we reset the admin password, it's going to reset to eight digits. But what eight digits is going to be because it's random. So how are you going to pull those eight digits, right? If you

want to be able to crack that specific password. So then we noticed that this specific password was using Shaw 1 to hash it. and it's not salted. So that's pretty bad, right? And so we were like, "Okay, cool. If we're able to pull the Shaw one hash, it's not salted. We know it's eight digits, we could crack this [ __ ] within literally 2 seconds. Like this is going to be easy money, right?" So like, "All right, cool." And again, because we were going through this source code, another really good use case of what AI can probably help you scale. So now we have to find a way to pull that hash. So we find this

booleanbased SQL injection vulnerability associated with the source code as well. And it's like all right this is great. Uh if we're looking at this query it says it has a meeting ID from or meeting ID is whatever. And so in this specific case we looking more at the code itself. We have that meeting ID. It tries to escape a few characters to protect itself but not very many. And then you have the password. In this specific case it's like okay cool. Let's try to create an exploit. So, we created the excess exploit that would essentially escape some of those uh characters that it's going to it's essentially a space that it's protecting itself from, but we can

use slash uh star to to get by that. And then we can create this SQL query to pull back the hash if we want to. So, we created this custom script to do that associated with that specific query. So, we ran it and I have a video so this is good. So, we ran this script against it to see if it actually works. You can't see it very well, but it pulls back the character of the hash line by line, character by character to eventually get the full hash in the end. And because that hash we know again is eight digits after we reset the password, of course, it was crackable within a second. Like

it was like super fast. So now you have three different major findings associated with this very secure platform that would allow us to get access to the internal environment or to to the environment itself. And so what we did was kind of go back in time. If you reset the password to the admin user, you know it's going to reset to eight digits. If you exploit the boolean based SQL injection, you can pull back the hash. You can crack it within seconds because you know it's an 8digit value. And then once you're authenticated as admin as as admin you can exploit the command injection using the CSR uh the CSR feature right remember that OpenSSL one I was telling

you about so you can execute arbitrary commands on the backend server. So that was multiple issues that leads to a full compromise of a specific back-end server that's hosted in this specific case on their DMZ. So it was like a pretty big deal and we were able to access their environment from the internet as they demanded us to do. Uh but it took a lot of effort. So, if we do Oh, this is just us cracking the password. Uh, cracking the password. I just want to show y'all it was like within a second because it's so easy, right? Uh, cuz again, eight digits in this specific case. And then this is us just showing access to the

actual system itself after we got access to it. So, the TLDDR her want to show explain to you is a lot of what you can do like identifying that that specific target was a target of interest with what I explained. I don't see AI doing that. That was like human intuition and analysis. So, I have the human emoji there. The next one is like looking for the source code, the documentation, buying a $500 used appliance on eBay and having that kind of creativity. I don't see AI doing that either. So, that is another one a win for the human in this case, right? going through the source code and identifying the elf binary, looking at the code associated with

that. Uh being able to identify even creating the exploit at least to some degree. I think that's kind of a combination of both the human and AI can probably help you with that workflow as well. So I have both emojis on this one. And then the other one is identifying three CVE that include remote code execution. Uh I think that's also a little bit of both. So you can kind of get an idea like could AI really solve that entire problem for you? Probably not. You really have to have a little bit of both in this specific case. And I do want to highlight this like this is like true creative red teaming. This is

like going outside the box not just thinking you know what you uh like scanning and reporting like this is like real hacker stuff. I mean we had four weeks to do this. Threat actors that are nation state they have years, months, a lot longer than we had. we only had 4 weeks and so in this case we went the extra mile. So now that we have all of these zero days associated with that turbo meeting application, we did a quick show in query to see how many people in the world have this application exposed to the internet. And we identified over 300 that had this exposed to the internet. And then we went a little bit further. We have this

internal tool called Google Cookbook that allows us to use advanced Google searches to kind of do what Shodden does. We identified an additional 100. So now we have 400 systems on the internet that are vulnerable to these three zero days that we identified. And so we're like, okay, cool. Let's go into our Salesforce system, identify which one of those are mandant clients. And we identified a couple of those as well. And we ident we notified them. We're like, hey, just so you know, you have this external Turbo meeting uh application exposed to the internet. This is the IP address. we strongly suggest you take it down because if an attacker does what we did, you can, you

know, could probably uh be compromised. And then finally, we worked with a third party disclosure to actually disclose these vulnerabilities. But believe it or not, this specific RHub like company, they were very like apprehensive to us. They're like, "You know what? No, you're freaking lying. We don't believe you. We're not going to patch [ __ ] Like, you know, you're just like a bunch of hackers trying to get money." We're like, "We're not asking you for any money. we're just asking you to fix your [ __ ] Like we don't care about anything else. And they were like, "No, no, no. You're just lying." So they were actually very difficult to work with. But eventually we got to a point of

getting the CVE released and publicized so that people understood this. But this extra mile piece I also think is important because the the AI can probably help you with the showdown part. it could probably identify a lot of like externally exposed uh devices, but the disclosure piece they're working with the the actual like companies to explain this and getting like the CVES released is the human part. So I think that's where you kind of need a little bit of both. So my overall observation of this new threat landscape and like the evolution of AI and incorporating into your workflows, I truly believe a breach is kind of inevitable. Again, supply chain attacks, zero day compromise. We're seeing this all the

time. Even the one use case I just gave you on the red team like if attacker has enough time they're gonna find a way. So you really want to reduce the impact of a breach. Uh I think that you know you have increased risk so you have to increase your strategies and defenses. I do think AI is a powerful tool that you should incorporate into your workflows if you do it strategically and in in a in a smart way. Don't just assume AI is going to solve all of your problems like you saw just in the examples I gave you. It's not going to solve all your problems, but it can supplement you and help you in a lot of different ways. It

has its limitations, of course. I do think you have to keep a human in the loop. Uh, as I explained throughout the entire presentation, remember that they like structured data. So, if you are going to be using AI to help supplement your workflows, try to do it in a structured way. We have playbook automation that we're doing right now. Um, and we're using YAML files to automate some of that playbook anim uh automation because again, it's a structured format. It has the exact playbook in a structured way. AI can ingest that really well as well. Some other cool examples that I couldn't mention today that we do that are really high impact. So like you saw that one

use case I gave you, our red team specifically, even though we're one of the largest red teams in the world, we're still kind of boutique in nature. We still do these very creative cool things. Some recent cool things that we've done, we were able to prove that we could move a train. We hijacked the uh control system. We reverse engineered and created our own control to actually move a train forward and backwards. We were able to compromise a cruise ship uh which was also really big and we've done many other really cool things as well. Uh a word of caution if you are going to use open- source frameworks, if you are going to use AI, if you're going to use

uh a lot of like cloud and open AI, if you don't fine-tune your data, if you don't truly rag and ground your data, it's not bulletproof. It's also vulnerable to hallucinations and you know bias and all these other things that that are that are currently risks associated with using LLMs and AI in your environment. Um make sure you try to use security best practices associated with that as well. You know there's some frameworks out there. This is just one that Google released called safe uh that's really good. There's many other ones as well. I just want to give you that word of caution before just going out and conquering the world with AI. Um thank you for having me. I don't

know if I have time for questions. This is my QR code for my LinkedIn if you want to connect with me. Um, and again, I really appreciate you having me. [applause]

So, we have a microphone in the audience. Maybe a few quick questions. Anyone? >> We just learned how to scan. >> [laughter] >> Okay, cool. I'm glad someone got something out of this presentation. Anything else? If you're too shy, I'll be at the bar later. Kidding. Someone in the front or in the back. All right, cool. Oh, it's okay. Hey, uh, really great presentation. Thank you for that. Um you all agree I guess that AI is a powerful tool. Um but sometimes one of the concerns is its reliability uh of the results and uh you have a slides on that you have numbers that you measure how good AI does and it's not often that

we see that um the slide about you know carving credentials from files you had the number there. So I was wondering if you did some uh repet uh if you repeated your result or your uh testing to see if the results change when it comes to numbers and if using a rag approach would actually fix that and have same results approximately all over again. >> Great question. The the reason those results existed in that one example was because it was comparing to two other open- source tools. The problem with the other examples that it's it's still using blood hound and aggregating the results to like you know correlate analysis but does human analysis unfortunately. So it's difficult to have

like a certain percentage of results tied to your own intuition of reviewing groups and users and coordinating those with computer names and all that. There's not like an actual tool that does a great job at that specifically to compare the results to. So I didn't have statistics associated with that cuz I don't have exact numbers other than like you doing it on your own. But we did prove it much quicker and like a really good at least like first glance go look at these exact things because it gave you like those results pretty good. In my opinion those did great but unfortunately like no real statistics because it wasn't compared to another tool. Any other questions over there in the

back?

>> Hi, thank you for your presentation. uh when you said that searching credentials with the AI was slower but than with the tools uh would running the models locally with Llama or something make it comparable or faster? >> That's a great question. So I did not do this with like a local LLM that you're hosting on your own for example to like kind of allude to what you're describing. So I don't know the answer but I would imagine it's going to be much quicker if you have a local LLM that you're hosting. So that would be a good test to see if it would be better. And honestly, you could probably even do both. Like if you notice that the the

the results like the false pos that the the the the true positive rate was higher with Gemini. So if it would if it were me, like I would probably honestly just do both to like compare the results um you know with with even using the public like AI. Just remember guard rose and pedantic. Like if you just go do what I described and ran those prompts against like Gemini without it having a structured prompt with structured data, you're not going to get the same results. So you have to use that same workflow that I mentioned earlier. [snorts] What's >> that? You have to move on. >> Okay. Sorry. Thank you all for your time. If you have any more questions,

you can come up to me after.

Red Teaming Reimagined: War Stories, AI, and Innovation at Scale

Related talks