← All talks

Securing Space: The Next Frontier for Security Engineers

BSides Seattle · 202630:3616 viewsPublished 2026-03Watch on YouTube ↗
Speakers
Tags
StyleTalk
About this talk
Bsides Seattle February 27-27, 2026 lecture: Presenter(s): Ankush Gupta
Show transcript [en]

Hi everyone. Uh good evening. So just to give my introduction first, I am Ankusha. Uh I'm enterprise architect for one of the largest telecom organization here in United States. So just as part of disclaimer, so I'm not just naming the my organization name but yeah. So uh I am a senior member of ITLE E and Sigma XI and I am a researcher for past couple years on cyber security AI and cognitive science as well. In cognitive science we focus on uh the human behavior and uh how to read the human behavior and convert into machine instructions. So I have associated with the one research organization where we do some research on these areas while my expertise areas primarily focus on cyber security and artificial intelligence.

Okay. So today uh I'm going to speak about red teaming AI systems for security validations. So just to give you a high level of introduction. So as part of my day-to-day work in my organization, so we are focusing a lot more on AI and uh ML. So on AI side there are lot of data exposure where we need to keep our eye on the data, how data is refining, how data modeling is happening, how we are building a product, how we are designing a product. If I ask a question, so how many of us really focus on security during the product design? I'm not talking about the product architecture. Product architecture is definitely a second phase. But once we start working

on a product design, as far as our business requirements and the orchestration is concerned, probably very few. So security is very very very much required at the beginning of the product inception. So I can give you one example one real example. I'm sure that most of you already heard about it. So in 2023 so there was a big data breach for one telecom organization and for one bad actor or for one incident transaction. What was the cost for that? It was 325 million USD. That was the cost for one incident and one transaction. And there were lot of news on on everywhere on Yahoo or on Google. It was a very big incident. The customer PII data was stolen.

that includes some digits of SSN, social security numbers and also the credit card details as well and it was a portion of the customer data not the whole data because probably that enterprise was having more than 100 millions of customers and similarly one incident happened uh during the COVID time in 2021 early 2021 if I remember correctly And that in one incident cost was over $800 million. So now we can understand the urgency of this cyber security and that should happen a proactive approach not the reactive approach. So moving forward so today's agenda is so I'm going to talk about uh the red teaming. So what is red teaming? How do we form a red teaming team and what are

their function? What are their job? Why we have a red team in the organization? Spec specifically very critical transactions that involves the day-to-day customer transactions. It could be financial organization, it could be uh telecom enterprise or it could be retail organization as well. So one we can see in 1 second how many transactions happens on Amazon.com. So there are lot of transaction in in fractions of seconds happen in those large organization and then definitely the vulnerabilities. So we I'm going to speak about little bit on AI vulnerabilities and I can give you a few examples in my organization. So in 3 days of time a team of cyber security experts we were needed to fix 30,000 vulnerabilities in 3 days of time and

organization was so so crucial that we want to employ as many as possible but we don't want to get delay more than 3 days 30,000 vulnerabilities in 3 days and it happened I was part of that team I am going to discuss about some uh uh rank teeing implications with respect to AI and then I focus on red teaming process methodology and some of the real world examples and case studies. So in one of the case studies at Microsoft which I am going to showcase go ahead please continue. Yeah. So one of the case studies here in Seattle um from Microsoft. So I'm going to present a small 2 to 3 minutes video as well that case studies focusing more on AI and then implication as far

as red teaming is concerned. So red teaming what is the function what how that analysis was done what were so I'm going to present that case study also here and then finally defense strategies and I would be happy to take to focus on key takeaways and then conclude my presentation.

Okay. So, introduction and uh the importance of AI security. So, there are some traditional security testing. So testing is also one of the important aspects on the AI because in AI there are various models in AI and now everywhere we can see AI, generative AI, we can see robotics, we can see agents, we can we we hear um generative AI. So there are lot of data exposure. So lot of data centers are consuming trillions of data every day. So there are a lot of data exposure and uh hence therefore there is a high level of risk on this on the data that's the reason we really need to focus on cyber security. So uh I'm going to focus on some of the testing aspects on

uh security cyber security and uh red teaming some of the adver adversarial testing which reveal AI aspects for future implications for the products. It could be finan financial product. It could be telecom product. It could be e-commerce or it could be manufacturing. It could be anything. It could be any domain specific product that really does not matter. Then uh the very much common framework in red teaming which is compass RD. So compass RD is a unified teaming framework which I am going to discuss about little bit on this framework in my next slides. And finally AI is now embedded everywhere in every aspect of security mission critical systems the business orchestrated uh products everywhere. So I'm going to discuss about those things as well.

So before that I would like to present my case study uh one case study on Microsoft which was done recently on uh uh red teaming.

He is. You're better at holding the microphone next to your mouth than most people. Good for you. This is not working. Why? My mouse is uh Sorry about that.

So excuse me. So somehow my another screen is not coming up. Probably it's focusing on only one screen. Can you please come? Oh yeah, I think uh I just wanted to showcase found it now that I'm Oh yeah, I think I got it. Working on AI. This year we'll see even more capable AI models and applications. The foundation models that uh we come to love are becoming multimodal and the foundation model providers will continue to release models at a backyard cakes. Uh and simultaneously a few hundred companies are building tools for developing and deploying AI agents. So generative AI is really rapidly changing our world but it also introduces new security myths. Recently Microsoft conducted a study where they tested over a 100 of their Gen AI products and they

highlighted the urgent need for rigorous security testing. Their study emphasizes the importance of structured red teaming combining alterated tools with human expertise. So let's explore some of the studies findings and why red teaming is pushup for securing AI technologies. First let me describe some of the uh details of the study. So first Microsoft tested uh different AI systems uh ranging from chat bonds to image generators. They also used a mix of security and responsible AI tests looking for both technical flaws and harmful cults. The team employed a hybrid approach. They use automated tools like pirates and expertdriven creative methods. The goal was to evaluate how effectively they could mitigate figure gaps and maintain safe points. Their study revealed a

few key findings that I think are crucial for anyone building AI applications today. First, context is king. Red teing must be tailored to the specific use case of the AI system. Next, uh, if surprisingly enough, simple exploits like prop engineering and be just as effective as some of the more passive paths. They also found that many vulnerabilities stem from system integration issues, not just the AI model itself. So, this is not unlike many other software systems. And when you're doing testing, automation is essential, but human creativity is needed for new wants to be in specific flaws. Responsible AI hearts are very hard to quantify, they found, and require delicate frameworks for measurement. And then most important of all,

security is a continuous process that requires frequent retesting and updates. Let's turn the key findings in the Microsoft study and turn them into practical recommendations. First, you should shift your approach to AI security and focus on real world implications of AI misuse. Concentrate on worst case outcomes and tailor tests to a specific uh domain to your specific domain, user base and use case. Don't just focus on the model. Uh really think holistically in terms of your entire system. Scrutinize authentication endpoints, plugins, and external data flows. Dance automation with expertdriven evaluations especially for complex risks. Adopt an iterative break fix mindset. Continuously testing and updating your AI uh application in the underlying systems that uh uh come with it. Also know that security is just one of several risks that you will need to

manage. That's why I really believe that companies need a unified AI alignment platform to manage all this including legal compliance reputational cyber security because stirring tools are siloed and cannot ensure AI models behave properly. You really should investigate and desk in as AI the line. Finally, no matter what you do, you should assume the worst case that your AI applications will at some point fail or get breached and you need to have a plan for that. So you must prepare for an inevitable AI incident and had a robust response plan that includes containment strategies because uh uh incidents stem from the technologies inherent uh uncertaint uncertainties and not just from malicious act. So in full saying this Microsoft study provides a great foundation for

gen AI security but it also has limitations right. So uh responsible AI harms are subjected and difficult to standardize. the findings may need to be adapted to different industries and contexts cuz remember uh his tests were performed mainly I think exclusively on Microsoft AI products and models and also the this place is booming tax. So the ever evolving nature of AI requires continuous learning and adaptation and we need to refine our automated tools to expand testing to do AI systems including agents multimodal models and f foster open collaboration. Thanks for listening. If you enjoyed this, please subscribe to our newsletter which I'm finding and reuploads substam. Okay. So we have seen multiple uh uh LLM models and uh various aspects of AI and

what are the risk associated with them and what is the impact on on that. So now moving forward to red teaming. So what is what do we understand with red teaming? Hello. Yeah. So red teaming uh in red teaming we create enterprises create one virtual or simulate one environment where we hire ethical hackers. They actually work on hello they work uh on the realtime applications production applications and they try to explore or find the opportunity where that one might be. So where uh bad actors or there is a threat possibility so they try to find those rooms and then address those issues before the time. So in red teaming so there are multiple teams within the organization where they come and then they try to uh address those challenges

before the hand. So red teaming uh differs in multiple ways. So we have vulnerabilities assessment in automated fashion and surface level and some of the non-weaknesses. Apart from that we have multiple level of testing that include contract testing that include penetration testing and some of the testings we follow throughout the product life cycle. Even most of us know very well that in some of the products from Microsoft like CI/CD then we have all the scanning all the security scanning security gatekeeping activities within CI/CD GitHub. As soon as our product u move from non-production to production environment at every step we scan that code whether do we have a secrets whether do we have some credentials whether do we have

PII data for the customer and we capture those information before going moving forward to upper environment or even in production environment. So those things we normally do in GitLab, GitHub and then um GitLab. So apart from that we have red teaming for AI focus on the misalignment prompt injection tool misuse and uh abuses in multiple workflows within the organiz organization. So during the product development when we design a product and start building the start building the product during the all the orchestration business layers and business logic and we move our product development towards the testing then we always try to have a focus on the cyber security as far as all the business workflows are concerned all the business

mission critical transactions are concerned we have rack poisoning in LLM MS and rag models. We have retrieval manipulations and some guard rails with respect to safety uh circumstances. Okay. So this is the very well industry uh common framework compass RD methodology. So that is very well known in uh red teaming and in compass RT framework. So we do three checkpoints on high level. So we have risk based assessment or escoping we can say. So once we start doing product development so then we try to do assessment for the whole system how our product product development life cycle will go starting from the development and then architecture and then testing and then moving to non-production and elevation to production

environment. So there is always a mandatory need for the risk based assessment or scoring. So we do all assessment for all the business logic layer and we have a checklist where we want to keep all the checkpoints in numbers. What is the risk assessment? What is the risk associated and how we are going to mitigate those risk. So that is the first part of this compass RT framework. This framework always let enterprise know how we are doing the assessment with respect to security once our product is ready. So we need to think before the hand we need to have a very proactive approach only then we can follow this risk based scoping as first pillar of compass RT

framework. Second hybrid adversarial testing. So this is a crucial step for the testing not after the product development but during the product development. So we have many tools available in market which we use for the unit testing or the product development life cycle during the product development life cycle. So we have many tools which we can test on a piece by piece or a portion to portion basis where we need to do risk assessment and then risk testing for that and once our product is ready then we move and then we do a consolidation or the final testing for uh for the product what is ready to deploy in production and finally we have continuous

validation. So continuous validation at every step at every aspect. So we do continuous validation that includes but not limited to these factors which I did mention here regression test postmarket monitoring that monitoring includes the dashboards we normally create the dashboards we do all these statistics analysis how my product is doing in production what is the ROI for that production for my business and apart from that we have do all the toggling comparisons for the defense mechanism so we have multiple defense mechanism M for especially for the products which are very very specific to data. So for example we have some products which are having multiple lot of transactions within fractions of seconds as I was

um um I was mentioning that e-commerce products uh on Amazon. So Amazon is having lot of transactions for example payment right. So we have stripes even my one of the uh colleagues uh that I was talking to him about the cyber security. So they have millions of transactions in a second for the payment for the stripe. So, so that payment validation payment gateway validation is very very crucial and for those type of transactions if there is a loss of let's say for one incident and once incident that is stay for 15 20 minutes in the system then we can understand the loss for the revenue for the company. So it could be millions of dollars in 15 minutes of time because

the payment is stuck the product sale is stuck everywhere. So they cannot sustain they cannot survive without security. So that is the reason there is a huge impact on the organization with respect to security. So these are the three main pillars risk based assessment hybrid adversarial testing and then continuous validation as far as this compass RD methodology or framework is concerned for red teaming. So moving forward so some of the results which come across AI. So there are different AI system types where we have results uh as far as our framework is concerned on the red teaming. So there are various results conversational LLMs. So we have multiple models, multiple LLMs with respect to rag. There are multiple models on the rag. So if I remember

correctly, so we were developing one rag model and our own rag model. So that was erhitated with uh with the existing standard standard model for the AI and we had we had uh uh if I am wrong I'm not wrong then we had to spend consistent uh 3 weeks of time with respect to the coding aspect for that. So we were using uh multiple tools AI generated tools like copilot chat GPD but normally we are using copilot. So using copilot so we had spent 3 weeks consistently day and night and then we were able to write 3 millions lines of code and 60 to 70% code was generated by Microsoft copilot but we definitely need to had had to um refine that code and uh and uh mix with our code as per

our business requirement for that model. Second is rack systems. So we have corpus poisoning, retrieval manipulation or retrieval augment augmentation and apart from that we have security which are heavily dependent on data integrated and uh and models data models. Third we have agentic and then tool enabling models where we have unauthorized tool inoc invocation workflow abuse and octa and um so we had uh octa as a single sign on practice in our organization for multiple applications I think more than 150 domains and more than 3,000 applications and then we migrated over to um Microsoft Andra and uh that requirement was completed within 3 months of So finally we are finding that is the highest risk profile is small prompt changes and large

real world consequences. So as why as I was mentioning some of the challenges and the risk associated with them. So those large real world consequences are really really needed to address. So continuation of result across AI system types. We have key observations here and then compass RT framework which we utilized and the outcome for uh this framework. So key observations includes different architectures where we have different risk profiles and risk associated with the different architectures which we have addressed during this framework compass RT framework in red teaming and red teaming must be unique across all the heterogeneous systems and applications business applications and compass framework. So we have provided unified reporting and measurable and impactful metrics. So we have multiple dashboard like graphana like Splunk. I'm sure that all of

you know very well. So we have created multiple dashboards and we wanted to monitor how these products functioning in production. What is the ROI associated with them? What is the business outcome for that company has spent $150 million for this product? Then what is the outcome for that product? And uh we have a lot of um um the best practices uh as part of this product development and then data modeling for all the AI products. So some of the governance and then challenges for future and then conclusion. So some of the governance integration which I did mention here and then key challenges we had with respect to data privacy and adversarial attacks evolving rapidly during the product development integration complexity talent gap which includes

AI and cyber security. The future trends would like to focus little bit more on future trends. So we have explainable AI for transparent risk decisions, quantum enhanced security analysis, fully autonomous self-healing security systems, collaborative AIdriven threats, intelligence and red teaming in CI/CD pipeline in GitLab as I was mentioning earlier. So we had best practices in uh in gitlab for red teaming and then finally we have expanded all the coverage for multi- aent systems. So there are some key takeaways and some key learnings for us which I would like to conclude here in my presentation today. So proactive risk assessment which is a which is a necessity which is a mandate not an option not a choice anymore otherwise organization will not

survive. Organization will survive if security is taken care at the first step of the product development that is product inception, requirement grade gathering, talking to business or product design even before the architecture before we start you know carving the requirements into product architecture otherwise organization will not survive security uh having lot of framework. So we have implemented zero trust framework and that is a very well common framework and zero trust there is no trust at all everything should be very very safe secret all the wallets should be in place all the credentials PII data customer information everything should be very very very safe otherwise it will be a huge problem because from legal point of view also if there is a secret

if there is a bad actor who is stealing my data my SSN number then I can sue the organ organization I can file hundred millions of dollars for the or organization. So this is very very important that we need to understand all the aspects of security from the in inspection of the product and uh compassity we have lot of operational AI red teaming at scale once we scale up the team and we have resiliency. So resilency is only once we consider all the security best practices and take into consideration and we have lowering attack success rate and strengthening our governance policies within the organization. AI security must be continuous. It should not be a one-time practice or one-time exercise for us. It could be a developer,

it could be a manager or it could be high leadership that includes vice president or CIO, CTO or anybody. It's a mandate for everybody. So, thank you so much for your time and uh I am happy if there is any question. Uh feel free to reach out to me if you have any question. Thank you.