
BSidesSF 2026 - AI as an Accountable Entity: Governing Risk When Machines Make... (Pavithra Pradip)

BSidesSF26 | 567 views | Published 2026-05
About this talk
AI as an Accountable Entity: Governing Risk When Machines Make Decisions
Pavithra Pradip

AI now makes decisions that impact customers and security. So who’s accountable when it fails? Learn how to treat AI as a risk-bearing asset, define ownership, design ethical controls, and audit models for fairness, drift, and explainability to build provable trust.

https://bsidessf2026.sched.com/event/e0bd834b74e1e4f8a47d4ce0c793f256

So, I'd like to introduce our speaker: Pavithra Pradip. She is speaking on AI as an accountable entity. She is a staff technical compliance manager at a company that is very large and might have to do with April 15th. Yeah, there we go. Okay. So, what she's really known for is bringing business, engineering, and security together. So, give a hand for Pavithra. Real quick, did you want to do Q&A as you're doing it or at the end? At the end. At the end, and if you just let me get you a mic so the folks at home can hear you, that would be awesome. Okay, well, thank you so much for that

lovely intro and for all the folks in the room. Yeah, I know I was nervous to even start, but the setup kind of got rid of the nervousness, so I'm good to go. Okay, so surprise, surprise, I am talking about AI. A lot of you have heard a lot of different talks related to AI over the last year or so, and I always thought, "Okay, we're talking about how do we use AI? How do we make it more user-friendly for us?" But we're not talking about how do we govern? How do we govern the decisions that it's actually making? And that's what we're here to talk about today. I know this is right

after lunch, so I'm trying my best to keep it interesting and make sure you guys don't snooze off, but if you do, I'm not going to blame you. I also thought I'd sing. How about that? Right? Okay, so let's do 5 6 7 8. No, I'm just kidding. I know. Trust me, I do not want to blow your ears off. Yeah, you'd probably run for that exit and this theater would probably be completely empty before I even start. So, I will save your souls from hearing my voice, but we will keep going on, and let's see what we've got. Quick disclaimer: none of the topics that I'm talking about today or any of the information is

related to the company that I'm currently working for or any of the companies that I've been affiliated with in the past. It's completely my own views, my own opinions, and most of the concepts are already publicly available now. It's a very popular topic that's coming up. So, you will be able to find it online as well. Okay, like I said, I'll try to not put you guys to sleep. So, this is not a compliance talk. It's more on reliability, observability, and ownership for decision systems. So, this talk is all about what happens when AI systems stop being tools and actually start making decisions. So, traditionally, software is typically designed around explicit rules that define its behavior. It's usually predictable

and traceable. But with AI systems, it's very similar to ML models: they just learn patterns from data. They make decisions based on behavior, which kind of makes them harder to predict cuz they're constantly changing. There are environmental changes, there are model retrainings. So, it's hard to keep up, to predict, or even to explain what's happening behind it. So, that means that behavior isn't fixed, right? That's the concept. And it can change over time. So, the system might technically still be working. There might be outcomes coming out, but they might be different outcomes over time. And that's the challenge we need to address. It's not correcting the code itself, but making sure we can trust that code over

time. So, I'm going to use a simple example throughout the entire deck, just something to help follow along. So, you're applying for a loan at a bank, and the bank has decided to use an AI system to either approve or reject the loan. It's completely autonomous. It's making this decision in milliseconds, and you have decided to apply for a loan through this bank. What happens if it gets it wrong? Right? Say you have applied for a very simple loan, it's not that big an amount, you have collateral, you've applied for larger loans at this bank and they were approved, but this small

loan is not getting approved and you know it's wrong. Who is accountable? Are you going to go blame the bank, or are you going to blame the engineers who built the model, or is it the GRC team who did not test the model? And that's the concept I want to help address today. So, when people hear GRC, governance, risk, and compliance, they think of audits and paperwork and bureaucracy, and trust me, I know that cuz I'm part of the GRC team. But engineers actually do this in practice today. They just use different terminology. Governance is about ownership. It's about making sure you have guardrails, making sure you know who is responsible for a system if the rules break.

Risk is all about knowing your failures. It's making sure the blast radius is not too big and that you're able to contain it. And compliance is all about making sure that these systems operate within the guardrails that you have set in governance. So, all of this comes from either laws, our security policies, or regulations that are already in place today. So, in engineering, GRC shows up through practices like design reviews or incident response or post-incident analysis, and you're already managing this risk through practices like reliability engineering today. GRC just builds on that foundation and helps expand it into areas like compliance, accountability, and privacy. So, the goal of my presentation is to

not add bureaucracy, or at least I think it doesn't add bureaucracy, but to extend those engineering principles that already exist today to help make sure that these AI systems that now make decisions are also accountable for them. So, traditionally, oversight has always been about humans making decisions. It's about processes that are stable and in place. Everything is documented. It's all predictable behavior. You can actually audit it. It's all explainable in human-written format, right? So, going to our example of the loan system, it would be a loan officer reviewing your application. A person or a supervisor would look at it, maybe, you know, if there are any edge cases

they're reviewing, but there's a human review behind it. Everything is explainable. The intent is documented. And the auditors can actually review why the decision was made, either approve or reject. The term I keep using on this slide is human. That's the key point. It's always been about human accountability throughout traditional software. But AI breaks this completely, right? AI is fundamentally different from how traditional software works. These systems are more probabilistic. They learn, they retrain, and they change over time. So, it's important to note that these decisions are made without any human review nowadays. And that puts them way closer to distributed decision engines than simple code. So, AI now makes decisions faster than

governance was ever intended to even supervise. So, it's kind of unfair to think that GRC today can just automatically apply to AI systems as well. Real quick about the AI systems, they rarely fail loudly, right? It's always about subtle failure. They just drift over time. There is bias, there is unfairness, and there are bad outcomes over time, but you don't notice it. It's all happening gradually. And a lot of teams who actually work with these systems would know that something is wrong, but there is no way to know what. There's not one single point of failure or something to pinpoint and say, "Okay, this is the reason why, you know, it's

going bad or like the decisions are not exactly the way we want them to be." So, this is just failure without any alarms. So, here is the first mindset shift I want to make. Now that we know how AI systems work, we need to understand that an AI system is not just a feature anymore. It's more a system that carries risk. It's like a service or a data pipeline. So, once you see it as a risk-bearing asset, governance becomes more of a design problem than a feature problem. Step one in this process, right? And this is very similar to other new initiatives that come up in organizations. Know what you have. Inventory your systems. Know what AI

systems you actually work with. See what the purpose of each is. Are they autonomous? What kind of data do they work with? Do they have real-world impact? Kind of do a service catalog, almost, for all of the AI systems that you have, so you know what you have in your environment. Now that you know everything that you have in your environment, does it actually mean that you need oversight and governance on all of these systems? That's a lot of work. Like I said, I'm not trying to add more bureaucracy in this work; it's more around understanding what's actually important to have governance over. And the reason I'm saying that, right? So,

if you have like a chatbot, right? That's AI. It's just taking information from point A to point B. It's not really making decisions on its own, but it's still something that is making your life easier. You don't really need a lot of governance. There's not a lot of autonomy along with it either. But when it comes to the example of the AI loan, you probably want oversight cuz it could have high impact, and that's something that I'm going to touch on in this slide. So, when you want to determine which AI service or system needs oversight, you want to look at at least three factors. And these are the three that you see on

the slide deck, but you can also have other factors depending on which organization, which industry you're in. I want to stress that you should have at least these three to start with. First is impact. Does it influence important business or customer outcomes? Does it impact your customers? That's something you want to really look at, and even your business goals, right? Autonomy is all about how much freedom it has. Is it making decisions on its own, or is there human review? Is it semi-autonomous, or is it more of an advisory role? And then you want to look at sensitivity. It's about the data input: how sensitive is it, or what exposure does

it have. So, is it PII? Is it HIPAA? Or is it data that is, you know, completely tax related? All of that information: you want to know what data it's actually working with. So, you want to consider sensitivity. And like I said, you could have other pillars. You could have like regulatory exposure. Maybe the AI system you're building will fall under a current regulation or a future regulation that's coming up. The formulas that you see on screen are just basic options of what you can consider to make that score, that materiality score of whether you need oversight. It depends on what organization you are. It's just basic

formulas that I've put up there. You don't need a PhD in machine learning for this; it's just to understand which AI system needs oversight. So, always ask yourself three questions, right? How important is it? How independent is it? And how sensitive is it? And if the answer to any of these questions is, say, high, or like, you know, it's very important, then you probably want to have governance and oversight of this AI system. So, going back to our example, I used a very basic summation formula for this. Impact to the customers: with an AI loan system, you have high impact to the customer. It directly affects the customer.
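As a minimal sketch of the summation scoring described for the loan example, here is one way it could look in Python. The slide formulas are not in the transcript, so the class name, the `needs_oversight` rule, and the threshold of 7 are assumptions for illustration; the 1 to 3 scale and the three factors come from the talk. Note that three ratings of 1 to 3 sum to a total between 3 and 9.

```python
from dataclasses import dataclass

RATING_MIN, RATING_MAX = 1, 3  # scale used in the talk: 1 = low, 3 = high


@dataclass(frozen=True)
class MaterialityRating:
    """Ratings for one AI system on the three factors named in the talk."""
    impact: int
    autonomy: int
    sensitivity: int

    def __post_init__(self) -> None:
        # Reject ratings outside the 1-3 scale.
        for name in ("impact", "autonomy", "sensitivity"):
            value = getattr(self, name)
            if not RATING_MIN <= value <= RATING_MAX:
                raise ValueError(f"{name} must be between 1 and 3, got {value}")

    def score(self) -> int:
        """Basic summation of the three ratings (total ranges from 3 to 9)."""
        return self.impact + self.autonomy + self.sensitivity

    def needs_oversight(self, threshold: int = 7) -> bool:
        """True if any factor is rated high or the total reaches the threshold.

        The threshold of 7 is an illustrative assumption, not from the talk.
        """
        highest = max(self.impact, self.autonomy, self.sensitivity)
        return highest == RATING_MAX or self.score() >= threshold


# Loan system ratings follow the talk; the chatbot ratings are hypothetical.
loan_system = MaterialityRating(impact=3, autonomy=3, sensitivity=3)
chatbot = MaterialityRating(impact=1, autonomy=1, sensitivity=1)

print(loan_system.score(), loan_system.needs_oversight())
print(chatbot.score(), chatbot.needs_oversight())
```

An organization with its own risk matrix would likely swap in its own factors, weights, and cutoff; the point is only that the decision rule is simple enough to write down and test.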

On a scale of one to three, three being high, I'm giving it a high three. Is it autonomous? Like I said, there's no human review. It's making these decisions within milliseconds. Again, I would probably put it at a high three. And then finally, we have sensitivity. For a loan application, you usually use a lot of personal data. It's a lot of regulated data that you really want to make sure you are protecting. So, again, a high three as well. Basic summation formula, a scale of one to nine, nine being the highest. You definitely want to have governance over this loan AI system. Okay. Now that I've told you about oversight and which systems actually need to

have oversight, let's actually look at how we address accountability, which is kind of the main part of the deck today. Accountability is a huge factor, and it's still something that we're trying to answer today. So, the core idea is that AI is only accountable when you can answer three questions, right? What it did, why it did it, and who owns it, and can you prove it? That's technically four questions, but I'll help explain that in a second, now that I said it out loud. But if you remove any of these pillars, any of these in the triangle, accountability just becomes impossible. So, let's start with ownership. The

ownership is the who. Who is answerable? And I'm not saying humans are no longer accountable. Humans are definitely still accountable. It's just that AI never owns its own actions. It's still humans. But ownership is not just one person or one team anymore. It's kind of redistributed. There's the model owner, who is responsible for the model integrity and retraining, right? We then have the business owner, who's responsible for making sure its outcomes are ethical. And then you have GRC, the control owners, who are responsible for oversight and compliance. So that's ownership. Now let's look at explainability. That helps answer the what and the why. So what is the system doing and why is it doing it, right? What is the

decision? It's the same as any other decision that a human makes. What are you doing and why are you making this decision? It's about interpretability of that decision. And that's what we need to help answer in AI systems as well. And if explainability is missing, the AI system just becomes a black box, which you cannot justify to your auditors or regulators by saying, "This is why it did it." Without that you're not going to be able to answer those questions, or even your customers if they ask why it made a certain decision. And then comes auditability. It's the proof. How do you prove that it's actually working the way it's supposed to and it's actually behaving

the way it's supposed to? And I'm not just talking about, you know, having logs of all of the outcomes, just logs spitting out all of the outcomes, but more about structured evidence over time. It's about everything from the start. Who are the different teams involved? Who are the accountable folks? What is the explanation? What is the retraining data? Everything structured so that it's easier for internal or external auditors who come to audit the firm or the AI system to evaluate that these controls are actually effective today. So just to recap, AI doesn't remove accountability, it just redistributes it. So, what is actually answered with these

three pieces of the triangle, right? So, ownership answers who is on call when decisions actually go wrong. What is the escalation path? That's all about ownership. Explainability is how do we debug outcomes? And auditability is just can we reconstruct history based on everything that has happened? If we give the exact same data, the same parameters, the same assumptions, can we reproduce the same decision path over time? The triangle that you saw is just the minimum viable reliability model for AI governance, for AI systems, sorry. And it's all about observability, ownership, and history applied to AI decisions. So, this is for the GRC folks in the room, so it's going to get a little bit

boring for the others. So, controls again are not about bureaucracy, right? They're just boundaries that we set in organizations. They define how a system is allowed to behave, and they turn values like fairness and stability into testable conditions. More like guardrail engineering, almost. So, the governance controls set the roles and the policies. This is all about the list of dos and don'ts, right? So, the way I like to think about it is governance is more like the steering wheel of a car. It makes sure you're on the right path before you even hit traffic. The risk controls are all about ensuring there's fairness and stability

throughout the entire decision-making process of the system. And this is more like the brakes, right? If any conditions change, it makes sure you're not steering away from the path that you're on. And assurance controls are more through audits. It's performance testing, right? And it's all about like having a dashboard, making sure that your steering wheel and brakes are actually working the way they're intended to work. I'm going to tie it back to the example of the loan AI system, right? So, if we had governance controls for that system, they would require like business owner sign-off before it's deployed. They would require the model to not get deployed without an ethics review behind it. And maybe the policy states that you

need to recertify the system every year. So, make sure there's testing and everything done every year on the system. The risk controls would look more like quarterly bias testing or fairness monitoring in place to make sure there's not too much drift. But even if there is, you're notified in some way or form. And there are explainability logs for every decision. And assurance controls for the loan AI example would look more like semi-annual audits by external or internal auditors, just reviewing whether it's being fair and whether it's able to reproduce the same outcomes given the same parameters. So, more performance testing as well. Okay, I'm almost done. Just hold on tight.
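To make two of these controls concrete, here is a hedged sketch of a bias check and a reproducibility check for the loan example. The talk does not name a fairness metric, so the difference in approval rates between groups is used here as one simple choice for illustration. The function names, the `MAX_APPROVAL_GAP` tolerance, the toy model, and the sample data are all hypothetical.

```python
from collections import defaultdict

MAX_APPROVAL_GAP = 0.2  # hypothetical tolerance; a real value is a policy decision


def approval_rate_gap(decisions):
    """Return the largest difference in approval rate between groups.

    `decisions` is an iterable of (group_label, approved) pairs.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for group, approved in decisions:
        totals[group] += 1
        approvals[group] += 1 if approved else 0
    if not totals:
        return 0.0
    rates = [approvals[group] / totals[group] for group in totals]
    return max(rates) - min(rates)


def replay_mismatches(model, decision_log):
    """Re-run logged inputs and return the entries whose outcome changed."""
    return [
        (inputs, recorded)
        for inputs, recorded in decision_log
        if model(inputs) != recorded
    ]


def toy_loan_model(application):
    """Stand-in for a real model: approve if income covers 3x the loan."""
    return application["income"] >= 3 * application["loan"]


# Risk control: periodic bias test over recent decisions (made-up sample data).
decisions = [("A", True), ("A", True), ("A", True), ("A", False),
             ("B", True), ("B", False), ("B", False), ("B", False)]
gap = approval_rate_gap(decisions)
print(f"approval gap: {gap:.2f}, within tolerance: {gap <= MAX_APPROVAL_GAP}")

# Assurance control: same inputs should reproduce the recorded outcome.
decision_log = [({"income": 90, "loan": 20}, True),
                ({"income": 30, "loan": 20}, True)]  # second entry no longer reproduces
print("mismatches:", replay_mismatches(toy_loan_model, decision_log))
```

A replay check like this only works if the decision log captures the full inputs and the model version, which is the kind of structured evidence auditability depends on.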

So, now that we know where the accountability is, where the oversight is, and whether we need oversight, now comes the sound check, right? Now we need to audit these systems. And this is for the auditors in the room, but for the non-auditors in the room, replace audit with post-incident analysis. That's kind of what you're doing. You're just replaying decisions. You're comparing behavior over time, validating assumptions, and making sure that it's actually, you know, giving the same outcome that it did before. So, you would look for drift. You would look for bias. You would run fairness tests and look for any unexplained changes, basically, in

a system. And that's kind of what the auditors would test for. So, to do that, you can conduct walkthroughs, maybe. So, for the AI system, right, they would probably do walkthroughs and design reviews to see if it's, you know, actually producing the output that it's supposed to produce. They would do performance tests, verify whether there are, you know, demographic changes, like parts of the model design that have changed because of certain demographics or new features that were involved, and any fairness tests as well. And a lot of you already know this, but audit is not just a one-time thing. It never was. We still do it, you know, more frequently nowadays, but with AI systems, it's actually a little bit

more continuous assurance at this point. Cuz without that, you cannot test for trust over time. So, you do want to make sure there's always continuous assurance. So, the whole deck was all about, you know, these three shifts, right? From completely blind adoption of AI systems to more risk-based adoption, with risk-based oversight around these as well. From having a black box, not knowing who's accountable or why it's actually doing it, to having an accountable system behind it. And then, like I said, for audit, from static reviews to more continuous assurance. So, it's still the same disciplines, just new domains. So, if a model can decide if someone

gets a loan, I think it should probably get maybe the same or maybe a little bit more governance than a human loan officer. I think most of you would agree to that. See, some heads are shaking. AI doesn't remove accountability, like I said. It just redistributes it, and it is our job to decide where that belongs. Where does accountability actually belong? So, I'm not saying you need a new team; you don't need a new framework to get this AI governance. It's more around using what you already have. So, inventory your systems, assign owners like you would for any other service, score the materiality to see whether you need oversight, add any drift and explainability metrics, and finally

treat them like your deploys, right? Any retraining that you do, treat it like you're actually deploying a new feature to a product. That would automatically make sure that you are testing and making sure it's compliant, making sure it doesn't, you know, give the wrong outcomes to our customers and so on. So AI is becoming a decision maker faster than we know it, whether we like it or not. So it's up to us to make sure that those decisions continue to remain accountable, fair and explainable to anybody who comes and questions them. So we began by asking what happens when machines start making decisions faster than governance could ever keep up. But

I hope that today you learned something to adapt your frameworks to keep accountability in human hands. Thank you so much for sitting through this talk. Now it's time for Q&A. Anyone have any questions? Please raise your hand. Okay. I'm going to bring you the mic.
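The first two closing steps, inventorying systems and assigning owners, can be sketched as a small catalog. This is a minimal illustration, not a format from the talk: the record fields mirror the inventory questions (purpose, autonomy, data, impact) and the three owner roles described earlier (model, business, and control owner), while the class names, system names, and team names are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Autonomy(Enum):
    ADVISORY = "advisory"
    SEMI_AUTONOMOUS = "semi-autonomous"
    AUTONOMOUS = "autonomous"


@dataclass
class AISystemRecord:
    """One catalog entry per AI system, similar to a service catalog entry."""
    name: str
    purpose: str
    autonomy: Autonomy
    data_categories: list = field(default_factory=list)
    customer_impact: bool = False
    model_owner: str = ""     # responsible for model integrity and retraining
    business_owner: str = ""  # responsible for outcomes
    control_owner: str = ""   # GRC: responsible for oversight and compliance

    def missing_owners(self):
        """Return the owner roles that have not been assigned yet."""
        roles = {
            "model_owner": self.model_owner,
            "business_owner": self.business_owner,
            "control_owner": self.control_owner,
        }
        return [role for role, owner in roles.items() if not owner.strip()]


# Hypothetical catalog entries for illustration only.
catalog = [
    AISystemRecord(
        name="loan-decisioning",
        purpose="Approve or reject loan applications",
        autonomy=Autonomy.AUTONOMOUS,
        data_categories=["PII", "financial"],
        customer_impact=True,
        model_owner="ml-platform-team",
        business_owner="lending-product",
        control_owner="grc-team",
    ),
    AISystemRecord(
        name="support-chatbot",
        purpose="Route customer questions",
        autonomy=Autonomy.ADVISORY,
        model_owner="support-tools-team",
    ),
]

for record in catalog:
    gaps = record.missing_owners()
    if gaps:
        print(f"{record.name}: unassigned roles: {', '.join(gaps)}")
```

Keeping the owner roles as required fields of the record makes an unowned system visible as a gap in the catalog rather than something discovered during an incident.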

Hi, Felipe Velez, I'm a lawyer. So be aware. First of all, is that a methodology that you created, or what are you following? It's not something that I created. It's just something that has always been around with other frameworks as well, just putting together all of the things that we do in the industry already, just making sure we can leverage that for AI systems as well. Okay, so it's based on a stakeholder uh Sure. Okay. Now, so based on that, how do you assign ownership? Especially, what will be the criteria to assign ownership on this system? That's a really good question.

So ownership, like I said, is redistributed, right? It's not like one person or a team that is handling it. The way I think about it, at least, is, right, when an AI system is being built, you go through multiple phases. You start from design, you go through testing. All of that is being done. So, there's always a team or somebody owning it in each of the phases, right? Like, each phase of the product getting deployed, you're testing something. So, the ownership would fall on the team that's working in that phase. So, it's not just one team anymore, it's distributed amongst different teams within the same organization. Hope that helped to answer. Okay. Thank

you. Got it. Sounds good. Fantastic. Give me 1 second to get up there.

Also, I saw a couple more hands. In case we don't get to you, please feel free to ask me on the fourth floor. I'll be up there after the presentation. Sure. Yeah, hi. So, I don't really work in the GRC-related stuff, but from what I could tell, it sounded like we're adding a lot of overhead just to use AI. Yeah. So, doesn't that like defeat the purpose? Yeah. So, good question. I know it does sound like that. It usually does come off that way, right? With any new service or any new system, you feel like, oh, this is a whole new team, new work, and new... Like I said, it's not bureaucratic, but it does sound

like that, right? Like, there's a lot more paperwork. But what I do want to reiterate is that everything that was presented already exists in most organizations under different terms. They're probably not the same as impact, autonomy, and sensitivity. They probably have their own risk matrix. Just making sure you adapt those to AI systems is all we're saying. Like, just make sure you shift your mindset from what your static software tests would be to more dynamic testing like what AI systems would need. All right. Thank you. I think there's one more. >> All righty. Do we have time for one more?

Fantastic. Sit down in the front. Thanks for going up and down. Hey, that's what I'm here for. Anything for you. All the way down there.

So, a question on explainability metrics. How do you add explainability metrics from the output of an LLM? Because you can go ahead and maybe log its chain of thought, but that's not a metric. I do want to say that's a good question. I'm not really familiar with LLMs as much, so I will do my research on that more, but when it comes to actual metrics around explainability, there's a lot right now on how do we add metrics; the GRC team is mostly involved in how do we add metrics within these, right? So, like you said, you could test for... Let me try to get an example without mentioning anything, but if you're

explaining that, okay, it's looking at these criteria and testing for fairness, for example, that's how the decision is made. That's how you're explaining it. The GRC controls would look at: is there a drift in that path? Is there a drift in the fairness test or any of the checks, or is it adding more checks without our knowledge, right? It could have certain criteria that you've already trained it with, but like I said, it's learning. It's always learning, so it might add its own criteria, so test that to ensure that the new criteria added are actually fair and ethically okay rather than, you know, letting it go like, "Oh, it's just

adding new criteria. It knows what it's doing. Let's just, you know, allow it to explain its own thing." Rather take the approach of being more, I want to say more, I'm losing the word, but making sure you're actually looking at these systems beforehand rather than just assuming that everything it's doing is right. Hope that helped answer. Sure, thank you. Thank you guys. Thank you all. Thank you. Thank you everyone.
