
Okay, now we're going to have a talk about, you know, where your AI came from, with Larissa. Welcome to the talks block. Hello, everyone. Good afternoon. Nice to meet you, I'm Larissa. So, today I'm going to share with you some research I did, research that was really about seeking knowledge on the best ways to understand the origins of the AI we've been using and also how to protect that AI. There's also a great need to protect what we're not seeing but is in use, whether within a company or even in our daily lives, in our routines with what we use AI for. Well, I'll have to keep coming over here to advance the slides, but anyway, it'll work out. It's done now.
Well, a brief introduction about me. I have a degree in Information Systems from USP, here in São Paulo, and a postgraduate degree in Hacking Operations from FIAP. I'm very passionate about CTFs, and more recently about AI CTFs. If you're curious, the AI Village, which is on the 3rd floor, on floor E3, has some AI CTFs; we hold some events there specifically about AI CTFs, about AI in particular. I'm usually involved in organizing these CTFs focused on AI, and, as I said before, I'm part of the AI Village community. For anyone interested in the subject, who wants to understand more about AI security and really wants to delve a little deeper into this world, I highly recommend joining the Village, because we have several channels there for exchanging very relevant information about AI security.

Well, the first question I wanted to bring to your attention for reflection is: do you know how many models you use in your daily life, or how many models exist in your company? That's a somewhat complex question. I think no one will ever have a specific answer, but when you stop to think about it, it opens up a whole new world of questions. Nowadays, we don't even have just one specific area that uses AI. The entire company can use AI and have separate models, each area using it in its own way and exploring it however it sees fit.
And besides knowing which models exist, do you know how many of them are tested, validated, or subjected to any kind of analysis, whether by a security team or even by the user themselves, before clicking, downloading, and using the service, before logging in and entering any information? And looking a bit more at open-source models, which nowadays you can download to your machine and run locally: do you know how many of these models go through any kind of scan or vulnerability check before you run them on your machine or on anyone else's machine? When we start thinking about these questions, the feeling is literally one of chaos. Nobody knows what they're doing, nobody wants to stop and understand what they're doing. And generally, there's no way to bring order to this mess in an organized way, because there's no control. It's very difficult to control any kind of AI use, or any AI tool that enters the environment nowadays. That's because even if you manage your suppliers, a large portion of the suppliers you already have in your supply chain are implementing AI in some way. So the due diligence process, that validation you did with a supplier in the past, has most likely changed in some aspects, because they've also started using AI. And considering private models as well, you'll hardly be able to understand what's happening in that model. Understanding what goes on behind the scenes of ChatGPT, for example, is impossible. In most cases, we trust that they won't disclose our data. We try not to share sensitive data, or we trust that our collaborators, the people who use the tools, won't share sensitive data, and we continue using them indefinitely.

Okay, but why does this matter? When we look at image generation models, or really any model, whether it's image generation, text generation, literally any model, the impression we get while using it is that it's magic. You use natural language, explain what you want, and it gives you back what you thought you wanted. Sometimes it's not even that close, but it gives you something different back, with the information you asked for.
And as much as it may seem like magic, this isn't magic; this is software, and like any other software, it can be attacked, it has vulnerabilities, and it has risks during use. Well, there are a few cases I wanted to bring up to illustrate this point, and they're very much focused on this idea of the supply chain; when we look at companies, it shows how this is closer than we think to the reality of attacks affecting people in their daily lives. Well, this first article is about a Python package. We'll see it in more detail later on, but it generated several articles and a lot of alarm, because one of the Python packages used for model generation was contaminated, and any model that used this package, up to version X, was consequently contaminated by malicious code. And when we look at the models being uploaded to Hugging Face and Kaggle and other platforms, many people don't even know what's running behind the scenes of the model they're downloading. It's likely that the model a person downloaded has a vulnerable version of that dependency, with malware running in the background collecting information. In this specific case, it wasn't something so critical, but there is research showing that you can have a backdoor there, you can have information being sent to other systems, and it ends up going unnoticed in everyday use due to the indiscriminate use of the model.
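To make that mitigation concrete, here's a minimal sketch, with a hypothetical package name and version range, of checking installed dependencies against a known-compromised release before relying on models built with them:

```python
# Minimal sketch with hypothetical names: flag an installed package whose
# version falls inside a known-compromised range before using models built on it.
from importlib.metadata import version, PackageNotFoundError
from packaging.version import Version

KNOWN_BAD = {"some-ml-package": ("0.0.0", "1.4.2")}  # hypothetical compromised range

for pkg, (low, high) in KNOWN_BAD.items():
    try:
        installed = Version(version(pkg))
    except PackageNotFoundError:
        continue  # not installed, nothing to flag
    if Version(low) <= installed <= Version(high):
        print(f"WARNING: {pkg} {installed} is in a known-compromised range")
```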
Another example, a really cool research study that I highly recommend reading (the link will be in the slides I'll be sharing), talks about how researchers were able to take a Hugging Face model and identify a silent backdoor inside it that was running and collecting information about how that model was being used. For those who don't know, I haven't included an explanation here, but Hugging Face is like a GitHub for models. You can go there and download any model you want. In practice, when you download that model, it may or may not already be pre-trained, but it usually will be. You just install it, run it on your machine, and start interacting with your own AI in your environment. So, as I said, you download it to your environment and run it on your machine. It will generally have access to everything on your machine and everything you set up for it. Therefore, if it has a silent backdoor, and depending on the access it has, including, as the research also highlights, command and control, that is, remote access with remote command of the machine, you can have a considerably dangerous attack vector running on several machines in the environment. But anyway, like I said, why does this really matter?
In addition to the articles, one more item caught my attention while I was researching: the 2025 update of the OWASP Top 10 for LLMs and GenAI. It used to be just for LLMs; now it's LLMs and GenAI in the same Top 10. And item three in that Top 10 is precisely Supply Chain. This wasn't so different between AI models and the traditional supply chain before, but even in the OWASP article explaining the Top 10, it brings up some very relevant points that are, in a way, specific to models. That's why it's so important for us to start talking about this and start looking at these subjects, which tend to become more and more common with the growth in the use of AI.

Okay, but what is the AI supply chain? When you stop to think about it, it's a very abstract world overall. As I said, you generally won't have access to what a commercial model uses and how it operates behind the scenes. You won't get much information on how they run, even if OpenAI, DeepSeek, etc., release papers explaining how the model is built. The infrastructure in practice, how it runs, how it's maintained, will never be public knowledge; it will always be a trade secret. And even for pre-trained models, even if there is documentation, you're downloading a somewhat compressed, executable file, so to speak, which doesn't give you complete information on everything that was done during its development.
So, it's a potentially malicious file, and that also has to be considered throughout the entire chain of everything we're going to develop for AI. I've brought here an image that's quite complex; we're not going to use it as the example. The reference is right here on the slide itself; when I share it, you'll be able to access the research that describes this entire process. This specific research attempted to dissect everything that currently exists regarding supply chains focused on LLMs. You'll notice several recognizable logos, because there are several traditional tools that are part of the supply chain, but there are also several processes and people involved who aren't always involved in the same way as they are with common software. So, we have contributors who, in the case of AI, serve, for example, to retrain the model, depending on how your model is implemented. You have consumers in the middle of the development chain, you have end users; in short, it's a very messy picture and, at least for didactic purposes, it wasn't very clear to me. Because of that, I summarized it into a few processes that make more sense to me when we look at the supply chain in a more comprehensive way and from a security perspective. Well, just to be clear, this is something I put together, so there are many more steps than that. It's really just a didactic summary of what some of the main processes would be. Each of those steps in the supply chain, in that development flow, will end up having its own risks, its own challenges, and the idea behind the image is to try to convey a bit of that.

Well, the first step, when we think about developing an AI application or having an AI application, would be the dataset, that is, what data we are going to use to train that model. And if the model is already pre-trained, whatever data was used, what data do we need to supplement it with? In short, data is the basis for any AI model.
When we think about the development phase, when we look at this stage, we're going to have risks of poisoning, label flipping, and the injection of biases, which end up happening at this stage. If you don't handle data properly, you already have a first risk in your supply chain. After the dataset, we move on to the training stage. So we select the algorithms, start applying the training, and then we encounter a series of other risks. We can have backdoors, we can have training in an insecure environment, which can lead to changes that the developers themselves may not be aware of or understand.
There's also the issue of blind trust: blindly trusting our data and our training process, which also generates an additional risk. We then move on to storage, which is a focal point we'll explore further in the presentation. This involves taking this pre-trained model, compressing it, or taking this executable file, so to speak, and making it available to other people or putting it into production. And here we can have several other risks: file tampering, this model being exposed in an inappropriate way (thinking of it as a commercial model), or even a lack of integrity validation, a lack of hashing, which are fairly standard practices for source code and other artifacts, but for models they are still new, or people don't worry about them as much. Then we have distribution, which is really about where and how the model will be shared; we have deployment, which is precisely about executing this model in uncontrolled environments, interacting with people over whom we no longer have practical control; and finally, updating, which is the continuous process of retraining, ensuring that the model is running, implementing new functionality, and so on, as happens in any other system. I won't go through every point here because there's a lot of detail, but I'll summarize the main risks of each step before we move on to a more practical example.
Well, when we talk about the dataset, this first stage, we have data poisoning, which is precisely inserting malicious data into the training dataset, which is then passed on to the training stage. There are already tools available today that validate the pre-training dataset to search for biased or malicious data. They're not yet commonly used, but there are already ways to perform this validation, and it's important to actually do it. There's label flipping, which is what happens when we blindly trust the labels we're assigning to the data during the dataset creation and training process, and some of those labels are switched.
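A toy sketch of what label flipping looks like in code; the dataset and flip rate here are made up, but it shows how small and silent the change can be:

```python
# Toy illustration of label flipping: silently invert a fraction of binary
# labels in a training set; nothing about the data itself looks tampered with.
import random

def flip_labels(labels, fraction=0.1, seed=42):
    rng = random.Random(seed)
    flipped = list(labels)
    for i in rng.sample(range(len(flipped)), k=int(len(flipped) * fraction)):
        flipped[i] = 1 - flipped[i]  # 0 <-> 1
    return flipped

labels = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]
print(flip_labels(labels, fraction=0.2))  # two labels are now quietly wrong
```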
Flipped labels can generate bias and various impacts, both social and security-related. We also have bias injection, which is when we manipulate the training, for example, to exclude a portion of the training data and focus only on the other portions. This causes the model to ignore a target audience and carry that bias going forward. Moving on to the training phase, we begin to consider shared or public environments that may be compromised. During training, a malicious actor could enter and alter some stage of the training, modifying a weight or a critical criterion, which will generate a different result. We also have backdoors in the weights.
That's a very interesting case that we're going to explore through a really cool article, where I'll break down how this behaves in an attack: hiding a backdoor in the weights of an AI model. When that model is compiled into its final, executable form, a person downloads it and loads it onto their machine, and when it loads, the model will automatically execute this malware and infect the machine. This is something research has already proven is possible, and there are models on Hugging Face today that do it. So that attack vector already exists. We also have training by a third party without validation, which is the blind trust issue: if that model is being trained by someone you don't trust, without visibility into how they're doing it, it can result in inappropriate training of that model. For storage, we have the same security issues as with other types of storage in general: storage in an insecure location, without authentication, without validation of who accesses it, and without access logs. We also have models saved in vulnerable formats, which complements the previous point and creates a bigger problem in the model development process. Formats such as pickle (.pkl), PyTorch (.pt), and joblib are already known to be vulnerable and allow code execution.
And here the storage point cuts both ways: storing your model in a format like that is a problem, just as consuming models in such a format can also be a problem. In distribution, we also have common access issues that we'd think no longer existed on platforms like GitHub or Hugging Face, in addition to a lack of automatic validation and a lack of auditing of models that have already been shared and are effectively running in production. The last two steps are deployment and updating. Deployment risks include granting overly broad permissions, where you don't control what your user has access to and allow them to do whatever they want within that model. There's also the issue of controlling and filtering the types of connections entering the model, what's interacting with it, to prevent it from receiving an inappropriate connection or an inappropriate write operation that could affect the system during deployment. In the update phase, we also look at retraining issues, mainly attacks that occur when retraining data is used without auditing, which can also infect the model. And we have other abuses, such as a lack of rate limiting, which can cause the model to become unavailable, or a change during the redeployment and update process that introduces a new vulnerability that goes unvalidated, letting a malicious artifact persist within the model itself.
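Since rate limiting came up as one of these abuse controls, here's a minimal token-bucket sketch of the idea, placed in front of inference calls; the rates are arbitrary and the inference call is hypothetical:

```python
# Minimal token-bucket sketch of the rate limiting mentioned above: each
# inference request spends one token; sustained abuse gets throttled.
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate, self.capacity = rate, capacity   # tokens/second, burst size
        self.tokens, self.last = capacity, time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)  # ~5 requests/s, bursts up to 10
# if bucket.allow(): run_inference(request)   # hypothetical inference call
```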
Okay, moving on from the more conceptual part, I wanted to give an example of how this can actually happen in practice. This article from HiddenLayer is a lengthy article, and a heavy read, because it really involves bit manipulation to hide the payload. I'm not going to go into the minute details here, because otherwise I'd spend hours explaining how it works, but it's a real case, one that may already be happening and that generally doesn't get much visibility when we look at models in production. In the HiddenLayer case, they sought ways to hide a tool that could actually be executed on a machine during the execution of a model, but without affecting the normal functioning of that model. To do that, they used formats like .pkl, which is Python's pickle, joblib, and .pt, which is PyTorch's, precisely the formats used to load these serialized models. Upon execution, when the user imported the model, it would automatically execute ransomware within the user's environment. And it's striking because the execution they show actually happens in seconds. The person doesn't even have time to realize they imported a malicious model and ran it; within seconds the machine is already 100% infected and the person loses access to everything.
Okay, oops, that worked now. How did the technique they used work? Their first step was to exploit malicious serialization; in other words, they built a traditional model and went through a training process, or even took a pre-made model and edited that already-functioning model. Their trick for making sure the alteration wasn't noticeable was precisely in the model's weights, which are the values at the nodes defining the probability factors that the model uses in its calculations to arrive at the final answer. Each weight is a float with several decimal places (if I remember correctly, six in this case), and they chose random nodes that had extra space left over in the memory allocation during the model serialization process, using the last couple of decimal places of those weights to hide the code and files of the ransomware. And there you have it. By doing that, the change to the model was very small, meaning the model could keep running, the user would get the expected answers, and, best of all, the alteration is very hard to detect.
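This isn't HiddenLayer's exact encoding, which works on the serialized bytes; the following is just a toy numpy sketch of the underlying idea, that payload bits tucked into the least significant bits of float32 weights barely move the values:

```python
# Toy sketch of hiding payload bytes in the low-order bits of float32 weights.
# Not HiddenLayer's actual encoding; it only shows why the change is imperceptible.
import numpy as np

def embed(weights: np.ndarray, payload: bytes) -> np.ndarray:
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    out = weights.astype(np.float32).copy()
    raw = out.view(np.uint32).ravel()            # reinterpret floats as raw bits
    assert bits.size <= raw.size, "payload too large for this tensor"
    raw[:bits.size] = (raw[:bits.size] & ~np.uint32(1)) | bits  # overwrite bit 0
    return out

w = np.random.randn(8, 8).astype(np.float32)
w2 = embed(w, b"evil")       # 32 bits of payload
print(np.abs(w - w2).max())  # on the order of 1e-7 or less: invisible in practice
```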
Well, continuing: this change they made to the model's weights, in the calculation at the nodes, was a very good idea, because in practice it doesn't interfere with how the model runs. It resulted in a difference of mere thousandths in the calculation of weights and probabilities, which wasn't relevant at all to the final operation of the model. No matter how much a person used the model, they would never notice anything in the final result; there was only a minimal change to what was actually happening underneath. This also helps evade some validation approaches, model validation tools that look at outputs and check for deviations from expected parameters: malicious-model validation tools don't actually catch this, because the way the alteration is made is truly imperceptible. After performing this change, hiding the payload in those bits, they arrive at the second stage, which is to publish the model in a common repository. In this case, they used Hugging Face for the upload. They didn't actually put it into production, for ethical reasons. However, a very curious aspect of Hugging Face that makes this attack possible is that Hugging Face does not block the upload of malicious models. It can even detect whether a model is malicious or not, but the only thing it will do is display a warning saying that the model is probably malicious, explaining which file of that model actually contains the malicious item.
But even with the warning, you can go ahead and download it. In fact, there are models that are already known to be malicious and continue to have high download numbers, showing that the protection there is really just for show; the person might not even see it, especially if they download the model via the command line, where the warning is even easier to miss. After downloading the model, the user proceeds to execute it, which is when they load it using pickle.load or torch.load. These are functions that allow code execution during the deserialization of the model, and thus the machine is automatically infected by the ransomware.
Yes, the main vulnerability in all of this is precisely the Python serialization machinery these formats use, which has insecure methods, such as pickle's __reduce__, that allow code to be executed during loading. And any model with this kind of malicious code insertion tends to have its execution complete without being detected, at least by model-scanning tools. It can be detected by EDR, and there is research looking at ways to bypass EDR detection as well, but these methods greatly facilitate infecting the model without the user even knowing there is malicious code inside it before it gets executed on the machine. Okay, so to summarize the steps: they basically publish a malicious .pkl file, which is the trained model, available for execution on any machine. The developer downloads this artifact and uses torch.load or pickle.load to load it, to bring that model up on the local machine. And automatically, through these functions, we arrive at the execution of the malicious code that makes the ransomware active in the environment. In a way, it's not very different from any infection we might see in our daily lives: any ransomware infection will follow a similar path, where the person executes some malicious file that generates the infection.
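Here's a harmless sketch of the mechanism just described: pickle calls an object's __reduce__ during loading, and whatever callable it returns gets executed as part of the load itself. A real attack would place ransomware where this prints a string:

```python
# Harmless demonstration of why loading a pickle equals running code:
# __reduce__ tells pickle to call os.system during deserialization.
import os
import pickle

class Payload:
    def __reduce__(self):
        # a real attack would launch ransomware here instead of echo
        return (os.system, ("echo code executed during pickle.load",))

blob = pickle.dumps(Payload())
pickle.loads(blob)  # the echo runs here, before you ever "use" the object
```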
It's also quite interesting to note that the infection mechanism is similar: they are using malicious code and language functions that enable this type of attack. The image quality here is poor, but in the slides there's a link to the complete article; this is a table with all the vulnerable formats that exist today that allow code to be embedded at serialization time and executed during the deserialization process. A very important point is that the top three, pickle, PyTorch's .pt, and a third whose name escapes me right now, are the three most-used formats for LLMs and within Hugging Face. That is, the standard formats we use to store models are precisely formats that are already vulnerable and allow this type of attack. And most models, at least open-source ones, that you run on your machine or anywhere else carry the risk of being contaminated with something malicious.
In the results they present, the researchers also highlight a very interesting point: the risks of adversarial attacks on AI, including data poisoning, backdoors, and similar issues, reinforce the importance of rigorous validation of training data and models to prevent malicious attack vectors. This is something we usually do with most libraries, code, and artifacts we include in our development process, but it's not being done with models. That creates a significant opening for sophisticated attacks we had previously learned to prevent to reappear in companies that ignore this aspect of development, believing there's something magic about models and forgetting that, being software, they also need validation and protection during development and implementation. So, like I said, it's a process like any other, and I tried to outline some things we could do to protect this process and prevent this attack from happening if we had been running that malicious open-source model in our environment.

The first point: if a developer were to download a model with a malicious .pkl, any kind of validation of the origin or reputation of that model would already have blocked the execution of that improper artifact at the first stage. Left alone, when the developer tries to execute it, they'd have no visibility and would actually become infected. A simple step back, like validating the origin, validating the reputation, or even scanning that model for vulnerabilities (and there are already scanners for AI models), would have raised an alert showing that the model carries risk and that it's not a good idea to run it indiscriminately on your machine.
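A minimal sketch of that first step, using the huggingface_hub client; the download threshold is an arbitrary assumption, and a real check should also look at the publisher, scan results, and file formats:

```python
# Minimal origin/reputation gate before anyone runs an unknown model:
# the threshold here is an arbitrary assumption, not a real policy.
from huggingface_hub import model_info

def looks_reputable(repo_id: str, min_downloads: int = 10_000) -> bool:
    info = model_info(repo_id)
    return (info.downloads or 0) >= min_downloads

print(looks_reputable("bert-base-uncased"))  # long-standing repo: True
```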
Well, the second step: when we look at the developer downloading the file, there are ways to catch it, and in this research they even include a really cool YARA rule specifically for detecting malicious models, which could help us detect this download, this file, on a collaborator's machine or in some insecure environment. Some EDRs are already able to detect malicious items or libraries with certain IOCs that are already documented. What's great about the YARA rule, though, is that it captures much more of the behavior and the use of certain specific functions, which in this case ends up being more efficient than simply looking at it generically and blindly trusting EDR detection.
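In the same spirit as those YARA rules, a small behavior-oriented sketch: instead of matching file hashes, walk the pickle opcode stream and flag imports of dangerous modules. A real scanner covers many more cases than this:

```python
# Behavior-style check, in the spirit of the YARA rules mentioned above:
# walk a pickle's opcodes and flag references to dangerous modules.
import pickletools

DANGEROUS = {"os", "posix", "nt", "subprocess", "builtins"}

def scan_pickle(data: bytes):
    hits, strings = [], []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in ("SHORT_BINUNICODE", "BINUNICODE", "UNICODE"):
            strings.append(arg)                      # remember recent strings
        elif opcode.name == "GLOBAL":                # protocol <= 3: "module name"
            if arg.split(" ")[0] in DANGEROUS:
                hits.append((pos, arg))
        elif opcode.name == "STACK_GLOBAL":          # protocol >= 4: module, name on stack
            if len(strings) >= 2 and strings[-2] in DANGEROUS:
                hits.append((pos, f"{strings[-2]}.{strings[-1]}"))
    return hits

# e.g. scan_pickle(open("model.pkl", "rb").read()) flags the os.system demo above
```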
After that, when we look at the developer loading the file with torch.load, it would be very interesting if the execution of these unknown models weren't done on the collaborator's machine, on your regular machine. Ideally, when you come across a model you're unfamiliar with, an open-source model, even one from Hugging Face, without any warnings or known provenance, you should treat it as an unknown file, potentially even a malicious one. So the first time you run this model, it's worth creating a sandbox environment to analyze its behavior. That would have completely prevented the machine from becoming infected. Using a sandbox environment when you're trying out a new model, especially when it doesn't have a strong reputation yet, is useful for preventing this kind of attack as well. And finally, we have the execution of the malicious code itself, which is when, at this level, with the person running the model, we have to trust that the EDR or whatever security tool you have in your environment will block the execution of the malicious commands. In other words, there are several steps you can take beforehand to prevent this malicious model from ever reaching the point of executing ransomware in the environment. And none of these steps are new; in a way, they are common practices, just with a slightly more AI-applied approach, removing the mystique surrounding AI.
In other words, you're actually including AI in your company's security process without treating it as something invincible that's there to solve all problems and brings no new problems to the environment. As I said, another case that could generate a problem as big as this one is the PyTorch compromise. I can't recall the exact affected versions off the top of my head, but it involved builds from late December 2022, with the alert published on December 31, 2022: a dependency of PyTorch, a library used for developing AI and LLMs, was compromised with malware, meaning it had malicious activity inside it, and people were downloading it onto their machines, causing information leaks, often without even knowing they were using it. And there are cases of projects that still use this type of dependency, and it also shows that we already have common tools that scan dependencies and can map these vulnerable AI components in the environment. I've also included some additional CVE examples here, in case you want to explore a bit more of what already exists regarding vulnerabilities in the AI world. There are some very interesting ones, and for those who have never searched for what's in use in their environment, it's a good starting point for finding vulnerable AI-related dependencies.
Well, when we look at these risks, the same themes keep repeating: training without complete validation; governance issues in the use and development of models, that is, indiscriminate use and development without validation or oversight from a security team; unverified downloads, meaning that nowadays we blindly trust platforms that host models, like Hugging Face and Kaggle, when they are not inherently safe and should also come with a series of precautions and usage controls; execution without isolation, trusting that the software is actually good and reliable when many times it may not be; and a lack of oversight of installations and of the day-to-day model development process.

Well, here we start to look at prevention methods for the supply chain, because, as I said in the example we used as a case study, the protection methods are, in a way, traditional; they don't change much from the norm. The main thing is to focus on including AI in these existing protection models. When we look at datasets, we have to validate the origin of the data and sanitize the dataset based on the data we're using, in other words, ensure that the data selected for training the model is actually appropriate for training that model. We also have versioning of that data: ensuring there are backup copies and controls in place, so that if the dataset is modified, you have the versions and what was changed, because that can also influence the final response of a model.
In the training phase, we use isolated training environments to ensure that nothing interferes during the training process and to prevent the inclusion of malicious software during the development and training of the model. We use logs, keeping information that ties together the code and the model development. And there are also anomaly detection tools that let you search for anomalies in code developed specifically for AI models. For the storage phase, we use digital signatures for the artifacts, which isn't new, but is useful for ensuring that your model hasn't been altered during storage. We have access controls in place to clearly establish who is accessing, editing, and otherwise managing the model. And we use hashes and automatic verification in CI/CD, also common practice for code in general, to ensure that no improper changes slipped in during the pipeline that could cause problems for the model itself.
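A minimal sketch of that CI/CD hash check; the file name and pinned digest are hypothetical placeholders:

```python
# Minimal sketch of an artifact integrity gate for CI/CD: compare a model
# file against a pinned SHA-256 before promoting it (digest is hypothetical).
import hashlib

PINNED_SHA256 = "0123abcd..."  # hypothetical known-good digest, kept out-of-band

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# if sha256_of("model-v3.pt") != PINNED_SHA256: fail the pipeline
```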
We also have a preference for secure storage formats, which is where safetensors comes in. It's a format developed by Hugging Face specifically for secure model storage, preventing the execution of code during deserialization. We can't say it's 100% safe; since it's a newer format, there may still be vulnerabilities and ways to exploit it, but it's already better than the formats we know have unresolved code-execution-on-deserialization problems. In distribution, we use repositories with access control, validating who is downloading or using the model; we use YARA rules specifically to map vulnerabilities in downloaded files; we have validation in CI/CD and in a sandbox before the final deployment; and we have SBOM verification, which we'll touch on a little more at the end, as a mechanism to validate whether that model actually has a secure, verifiable provenance.
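Going back to safetensors for a second, here's a minimal sketch of the swap: save_file and load_file work with raw tensor data plus JSON metadata, so loading has no deserialization hooks the way pickle does:

```python
# Minimal sketch of preferring safetensors over pickle-based formats:
# loading parses raw tensors plus JSON metadata, with no code execution path.
import torch
from safetensors.torch import save_file, load_file

tensors = {"layer1.weight": torch.randn(4, 4), "layer1.bias": torch.zeros(4)}
save_file(tensors, "model.safetensors")    # instead of torch.save (pickle-based)
restored = load_file("model.safetensors")  # pure data parse on load
print(restored["layer1.weight"].shape)
```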
Well, for deployment we have execution in containers, which helps keep the deployment in a segregated, limited environment with reduced access. We monitor the execution and deployment of this code. We also have timeout and resource-usage policies for inference, to ensure that the entire process of running the model, of model inference, follows clear rules in accordance with what was planned. And finally, in the update phase, we have automatic auditing of retraining data and monitoring of the model's performance over time; performance changes can indicate an infection or a malicious actor misusing the model. It's also important to have rollback capability during the update phase: if a problem arises, you need to be able to revert, just as we do for other systems. For example, if I identify something malicious in my model, I need to be able to roll back so that at least the previous model continues running. Ideally, we should have logs of these updates and, if possible, a human review, not just an automated one, of what has changed in these models, to prevent any unintended updates from slipping through. But as we said at the beginning, if you don't know where a model came from, you shouldn't trust it, and you will never be sure what that model is capable of doing if you weren't the one who created it.
Therefore, we have several other techniques we can use to prevent these patterns. So, where can we start searching for vulnerabilities and monitoring what's happening in the market, what research is emerging about new vulnerabilities in the world of AI? I've cited some sources here. NIST's CVE and NVD databases are good sources for AI, though AI issues are mixed in with everything else. There are blogs from companies that also focus heavily on AI security research, such as JFrog and Hugging Face, which have very comprehensive blogs, usually with really nice proof-of-concept examples of the vulnerabilities that arise in models, which helps a lot in understanding them and ensuring you actually protect the model correctly. We also have tools like GitHub Security Advisories, which can already capture these new CVEs and, in a way, map whether something is vulnerable or not. I've also included a GitHub repository focused on model supply chain; it's a repository with various tools and articles, and it's always being updated with information on how to protect your model development process. And finally, there's PyTorch itself: the PyTorch blog always tries to stay updated on vulnerabilities, though it's much more focused on PyTorch itself than on the community as a whole.
Well, apart from that, we already have some open-source tools that perform validations on models, and they are very interesting to implement in this development chain to ensure we detect malicious code within the model development process as early as possible. There are Garak and CyberSecEval; their links are all here for you to access. One that I'm really curious to test more thoroughly, but which has been giving some very interesting results, is Giskard, which is essentially a kind of SAST for AI models, and from what we've seen it has been quite interesting to put into the CI/CD model development process. Many of them already integrate with traditional CI/CD tools, which further facilitates implementation in existing development processes. There are also commercial tools, including others I haven't listed here. HiddenLayer itself has a very good tool for detecting backdoors in models, and so does Protect AI, which I think is a great reference. And one more thing: I also wanted to highlight a really cool platform called huntr, which is a bug bounty platform for AI models. So, if one day you're using a model, or searching on Hugging Face or some other platform, and you find that a model has a vulnerability that isn't already known, you can submit it to huntr, which also has a reward process through the companies and developers that participate in its bug bounty program.
But ultimately, I'm not going to go through all the items listed there, partly due to time constraints, but I wanted to emphasize that this is on the slide. I ended up doing this research looking for a basic checklist of what should be implemented to protect the supply chain process as a whole during development, and I created a checklist with several practical steps of what should or should not be considered in this supply chain validation process. So, we have auditing and validation: what should be audited, what tools are suggested for each stage, what you can look for, for example insecure serialization formats, what formats you can create blocking rules for, or at least educate people not to use, or to be more careful with. Safe execution: what would the safe execution process be? Running in containers or a sandbox, limiting execution time and memory, and using specific container hardening options to contain security issues when testing a model. We also have monitoring and detection: what could you monitor, what are some interesting rules and tools you can use to detect the kind of improper action that comes from this type of attack? And we have security testing.
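For the safe-execution item just above, here's a minimal Unix-only sketch: load the untrusted artifact in a child process with CPU and memory caps. This bounds the blast radius but is no substitute for a real sandbox or container:

```python
# Minimal Unix-only sketch of the "limit execution time and memory" item:
# load an untrusted model in a capped child process, not your main session.
import multiprocessing
import pickle
import resource
import sys

def load_untrusted(path: str) -> None:
    resource.setrlimit(resource.RLIMIT_CPU, (10, 10))           # 10 s of CPU
    resource.setrlimit(resource.RLIMIT_AS, (2 << 30, 2 << 30))  # 2 GiB address space
    with open(path, "rb") as f:
        pickle.load(f)  # still dangerous: the caps only bound the damage

if __name__ == "__main__":
    p = multiprocessing.Process(target=load_untrusted, args=(sys.argv[1],))
    p.start()
    p.join(timeout=30)          # wall-clock limit on top of the CPU limit
    if p.is_alive():
        p.terminate()
```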
Here I'll also reinforce: whenever a company develops and distributes models, it's very important to have model testing teams. There's the traditional red team, and nowadays there are also red teams for AI models, which are very interesting for ensuring the model behaves as expected. We also have governance and traceability items: what are the basic governance and traceability controls you should implement around a model? And finally, vulnerability monitoring, using those tools and sources I brought earlier, plus some things you can use for alerting and, ultimately, for periodically auditing your environment. One very special tip that OWASP itself gives in the Top 10 for LLMs and GenAI is to also apply the controls from the classic Top 10 item A06, Vulnerable and Outdated Components, as part of this control. And we also have some rules for continuous revalidation: steps on how to conduct audits and how to ensure the model is actually running as expected. I know the slide is a little small, but I'll share it on my LinkedIn and also on the AI Village WhatsApp, and you can copy the complete checklist from there.

Well, finally, I've included some additional supporting materials here. There's the article I mentioned that explores the supply chain flow for LLMs in much more detail. It's very focused on LLMs, so it won't cover other types of models and AI software, but it's a very interesting read.
Plus, there are some blogs I highly recommend reading, like HiddenLayer, PyTorch, JFrog, and the OWASP Top 10 for LLMs and GenAI itself. That was it. I hope you enjoyed it and that the content was clear overall. And if anyone has any questions, I'm also available to answer them.