BSidesSF 2026 - The great SAST dissonance: how to please every... (Claudio Merloni, Romain Gaucher)

Name: BSidesSF 2026 - The great SAST dissonance: how to please every... (Claudio Merloni, Romain Gaucher)
Uploaded: 2026-05-12
Duration: 50 min 37 s
Description: The great SAST dissonance: how to please every audience, at scale Claudio Merloni, Romain Gaucher SAST tools hit a sour note with modern apps with a dissonant coverage that leaves stretches of code unheard: a dangerous sense of security. An AI conductor can fine-tune the orchestration for each app


Mentioned in this talk

Tools used

Semgrep

About this talk

The great SAST dissonance: how to please every audience, at scale. Claudio Merloni, Romain Gaucher. SAST tools hit a sour note with modern apps, with a dissonant coverage that leaves stretches of code unheard: a dangerous sense of security. An AI conductor can fine-tune the orchestration for each application, letting human experts focus and produce the right mix of coverage and findings. https://bsidessf2026.sched.com/event/06bdf168c53ba6685aab4c1619ba8320


All right, so we are going to get started. Thank you so much for attending this talk. My name is Toma Taylor and I will be your room facilitator and make sure that you have the best talk experience in this theater. This is theater number 10. Today we will be listening to a talk on the great SAST dissonance: how to please every audience, at scale. Romain is here, but the talk will be delivered by Claudio Merloni. Claudio is a staff security researcher at Semgrep with over 15 years of experience in application security and software engineering, based in Paris. He focuses on scaling

security through static analysis (SAST), developing advanced detection rules and promoting paved-road initiatives that prioritize secure-by-default development. So again, a round of applause for Claudio, and take it away, Claudio. Thank you. >> Thanks so much for the introduction, which I did not expect. Well, welcome everybody, brave people willing to learn or hear about SAST in 2026. Hopefully I will tell you something new today. And hopefully I won't move too much; my usual Italian soul makes me move around a little bit. Yeah, there's a little bit of Larsen. So, welcome. The talk is titled "The great SAST dissonance" because I needed a catchy title that

fit into the BSides theme, but also because, if you think about SAST, it's a kind of technology that tries to solve a very complex problem. If you're trying to build a general-purpose solution, you're pulling it in many directions, a little bit like an orchestra where a number of instruments are not playing exactly in tune, or not playing the same kind of music. The agenda is roughly this: I won't bore you with a 20-minute introduction about SAST, I promise. I'm just going to touch on a couple of concepts because it serves to set the context of

the talk. My main focus is going to be around coverage, a very abused word, so I'll explain a little bit more what I mean by coverage. But I want to focus especially on what I've been working on: trying to scale the way we can expand coverage in a way that actually makes sense, especially from my perspective, for a general-purpose tool. Hopefully everybody can get something out of it and apply it to their own situation. Who am I? You already got a little bit of an introduction. Romain Gaucher, head of security research at Semgrep, is sitting there and will take all of your questions. I'm just going to

deliver my content. And I'm Claudio Merloni. I've been working in security for a very long time, since before it was called cybersecurity. I've been doing a bunch of things but got stuck into SAST for some reason, and I really like it. It's a very cool problem to solve because you kind of never solve it. So first, why the problem is hard and what the problem is. SAST: you probably know what it is if you're here. You've heard about it, you've been talking to many vendors, and you've probably been using this kind of solution. The idea is pretty simple: we want a tool that

takes the source code automatically, kind of a black box that does something with the source code, and identifies, in a number of possible ways, security issues without executing the source. That's the whole idea behind the term SAST: static. SAST can take a number of different shapes. grep, if you want, is the basic form of SAST. It's answering a very simple question: does this piece of text exist somewhere in the documents or source code or whatever I'm feeding into the tool? It's still sort of SAST. You can go up by level, and it can become more complex.
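The difference between the text-matching level and the next level up can be sketched in a few lines of Python. This is only an illustration, not how any particular SAST tool works: the `SOURCE` snippet and the `grep_hits` and `ast_hits` helpers are made up for the example, and only the standard-library `ast` module is used.

```python
import ast

# Hypothetical snippet to scan; only the last line is a real call.
SOURCE = '''
import os

def run(cmd):
    # os.system is dangerous  (comment only, not a call)
    note = "never call os.system(cmd) here"
    os.system(cmd)
'''

def grep_hits(source, needle):
    """Text level: line numbers where the needle appears anywhere."""
    return [n for n, line in enumerate(source.splitlines(), start=1)
            if needle in line]

def ast_hits(source, module, func):
    """Syntax level: line numbers of actual calls to module.func(...)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == func
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == module):
            hits.append(node.lineno)
    return hits

print(grep_hits(SOURCE, "os.system"))     # comment, string and call all match
print(ast_hits(SOURCE, "os", "system"))   # only the real call matches
```

The text search flags the comment and the string literal as well as the call, while the syntax-aware search reports only the call. Neither knows whether `cmd` is attacker-controlled, which is what data-flow analysis adds.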

You can start parsing the source code, trying to get some structure out of it, typically the syntax of the code. That's what you could do, for example, with hgraph. I think one of the most popular approaches (maybe I'm a little bit biased because I've been working in the field for a very long time) is tracking data flow, or identifying potential data flows within the source code from point A to point B. That's kind of the next step, and it's what I'm going to focus on the most during this presentation. And of course there are many other ways you could push

SAST today. Well, I don't have to tell you: AI is taking a large place there. You can mix traditional static analysis and AI/LLMs. And of course there are other techniques where you want to prove, for example, the absence of problems. That's where you go into the formal proof area. Okay, what's the fundamental problem? Well, it's extremely complex to analyze source code automatically. And even if we don't think about the actual program analysis, think about, for example, your own company: how many languages are your engineers using? There are so many languages to support that it makes the

problem immediately pretty complex. On top of that, the tool must understand something about the source code to be able to say, well, this is a place where a vulnerability could happen. And you have to translate that for every library and framework that your code might use. So you see how these aspects compound one on the other and make the problem extremely complex. And then, okay, how is the application deployed? What's the actual attack surface? I would say it's a never-ending quest for finding vulnerabilities. There's a kind of side problem. As a user, for example, typically while

you're evaluating tools, or you want to start trying SAST tools, what are you going to do? You're probably going to run it on something. And I'm pretty critical in general about benchmarks, because they are going to measure one aspect of the capabilities of the tool, and they can be misleading. I don't necessarily need to mention the examples here, it doesn't really matter, but something like the OWASP Java benchmark will focus on very specific abilities of the tool: the ability to understand certain syntactic and semantic patterns. And the different

test cases are variations on these patterns in many cases. So it's going to be very good for evaluating a specific type of ability, but not whether the tool will actually be able to find issues in your own code. And you could say the same thing for, I don't know, here I have DVNA, but think about WebGoat. WebGoat was not made to measure the capability of a SAST tool to find vulnerabilities. It was meant as a learning tool, basically a didactic tool. So it's not a good representation of what enterprise code will look like. For example, if you take this example on the right-hand side,

DVNA, the code patterns are extremely simple. Chances are your engineers are going to write much more complex code, so you cannot really tell if the tool will be able to find anything. And this is the other dimension, which is the one I'm interested in the most, especially in the context of the talk. If you look at some other benchmarks, think about Juice Shop for example here, but again it's a replaceable placeholder name, the benchmark will use certain libraries: some kind of logging, some kind of web framework, and a pretty small number, I would say, of libraries.

Well, now take one of your repositories. What are the chances that you're going to use exactly that set of libraries? There's going to be some overlap, probably. For example, the logging library may be so popular that you're also going to use it. But then what are the chances that another repository in your company will use the same libraries? So if you are trying to measure the ability of the tool to understand sources and sinks (I will explain these concepts a bit more), well, the benchmark will measure a very specific ability again: the ability of the tool to know about this set of libraries, not all the possible

libraries. So if you think about this for a specific repo, okay, maybe it's fine, but if you think about it at scale, across many languages, across all your enterprise repositories, well, it's very different. So let's talk about coverage. That's what I'm most interested in. What is coverage? I'm not going to poll you, even if Clint would probably like me to poll you guys, but everybody probably has a slightly different definition of coverage, and we think about it in a slightly different way. It's natural, I would say, because coverage is a very nuanced concept. If

we look only at a static analysis tool, for example, you can break down coverage into many different aspects. The general idea, I would say, is the probability of the tool being able to tell you that there is a vulnerability somewhere in your code. The better the coverage, the higher the probability that the tool will find something where you're like, "oh yeah, it found something I want to fix." But again, it's a very complex concept. Coverage might mean: is the tool able to understand all the languages, or some of the languages, my code uses? Is the tool able to understand the semantics of the code? Is the tool

able to understand, if you think about some complex web frameworks, that the framework is injecting instances of certain classes because it's built that way? That's something that's not necessarily apparent if you look at the source code. And that's, I would say, the more program-analysis side of things. If you look at the other side, in this case the right-hand side, the tool also has to know about security. The tool is going to be as good as the information it's given. If the tool knows about SQL injection, it will report SQL injection issues. If it doesn't know about them, it will not. If the tool

knows about the libraries you're using, it will be able to infer some security properties of the code that uses these libraries. But if it doesn't know about these libraries, well, again, nothing. And then let's take a single library. I have an example later on. If you take a framework like Django, for example, you're not going to have one potential place where a vulnerability could happen. You might have hundreds, because it's a very complex framework. And you could say the same for a lot of other libraries. So this is actually the piece I'm most interested in, because to me it's the hardest piece to make scale: there are hundreds of

thousands of libraries out there in every ecosystem. But it's also one that's quite possible to automate, and that's what I want to show you. Now let's think again about the libraries. I have a SAST tool, and I want to understand if the tool is any good. Well, I'd want to know that the tool covers some of the most popular libraries. I'm going to use the term "popular" a little loosely; what I really mean, at least in the data I'm going to show you, is the libraries most depended upon. I think it's an interesting way of understanding or

measuring whether a library is interesting to model for a SAST tool. And okay, things like requests and PyYAML are very commonly used libraries. The SAST tool needs to know about them because they're probably used all over the place, so there's no doubt about that. But what if we start looking at a library that's down past the top 200, so it's not very popular? Is it interesting at all? Should the tool know about it? I came up with a few examples, trying to find ones that aren't too obscure, so maybe you know about them, but something like hvac.

Have you ever heard about this hvac library? No. Is it important? Well, it really depends. If the code uses these libraries, they are important. hvac is a library used to access the secrets management features of HashiCorp Vault. I bet there could be security issues related to the use of the library. I have the simplest snippet I could find. There could potentially be some CSRF, sorry, SSRF, because you see there's a URL that could be used when you use the library, and there's potentially a

hardcoded secret. So if the SAST tool doesn't know about them, well, it's very simple: it will not be able to find any of these. So it doesn't really matter if the library is very popular. If you use it, you want the tool to know about it, and it's pretty much the same for the other example I had. That's what I call the long tail: all these libraries which are not necessarily used by all the applications, but you want to have a feel for what the tool knows about them. I took an example from an actual enterprise application, Python based, and this is a list of

the direct dependencies of the application. It's exactly an example of what I was saying: some of these libraries are pretty popular, like requests, the seventh most depended-upon library. But if we go down through this list, one of my favorites is PyGithub. It's number 736, so not so popular, but this application is using it, so should the tool know about it? Probably. And there are about 20 different libraries on PyPI to talk to GitHub, so should the tool know about this one, or should it know about the other 19 too? It's pretty much the same for all the others:

you have, depending on the library, different security concerns, but the idea is still the same. So I looked at our customers and the dependencies they use. I looked at about 80,000 repositories. I focused mostly on Python, Java, and JavaScript applications, just because these ecosystems are basically the biggest and most popular, and this is the number of dependencies, like 7,000, I could find being used by all these applications. So how many of these libraries do I, as a SAST vendor for example, want to cover? Should I start with, I don't know, the top 100? How far is the top 100

going to take me in terms of, I would say, accuracy or goodness of the coverage? Well, the three different colors are the three different ecosystems. If you look, for example, at the top 1,000: if we had rules, or good, complete coverage, for all of the top 1,000 most depended-upon libraries, we would cover 16% of the repositories, meaning only 16% of the repositories I'm looking at depend only on the top 1,000. So all of the other repositories are, to a certain extent, left uncovered. And even if we had coverage for the top 20,000, well, for example, only 46% of the repositories are entirely covered. We are still missing something.
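The "entirely covered" measurement described here can be sketched as a small function. This is a minimal illustration on made-up data, not the analysis behind the slides: the library ranking, the repository names, and the `fully_covered_fraction` helper are all hypothetical.

```python
def fully_covered_fraction(repo_deps, ranked_libs, top_n):
    """Fraction of repositories whose dependencies ALL fall inside the
    top_n most depended-upon libraries (i.e. "entirely covered")."""
    covered = set(ranked_libs[:top_n])
    repos = list(repo_deps.values())
    if not repos:
        return 0.0
    full = sum(1 for deps in repos if set(deps) <= covered)
    return full / len(repos)

# Hypothetical ranking, most depended-upon first.
ranked = ["requests", "pyyaml", "django", "hvac", "pygithub"]

# Hypothetical repositories and their direct dependencies.
repos = {
    "billing":  {"requests"},
    "frontend": {"requests", "pyyaml"},
    "secrets":  {"requests", "hvac"},
    "ci-bot":   {"django", "pygithub"},
}

for n in (2, 4, 5):
    print(n, fully_covered_fraction(repos, ranked, n))
```

A single long-tail dependency is enough to keep a repository out of the "entirely covered" bucket, which is why the curve climbs so slowly as the number of supported libraries grows.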

You have the other example for npm, which is even more striking because npm is an even bigger ecosystem. If we supported the top 1,000 libraries in npm, only a really tiny fraction of repositories would have perfect, so to say, coverage. Now what does perfect coverage mean? What does complete coverage mean? I'm taking a library; I was talking about Django. Okay, I want to have all the possible rules. And forgive me, I'm using the term "rules" because in my Semgrep world that's the term we use, but you could replace it with whatever your tool of choice uses. In Django, for example, there are

more than 900 modules (so packages), 2,000 classes, and more than 9,000 callables (so functions and methods). And if you think about it, a function or a method is basically what will execute a SQL query or make an HTTP call to some other service. It's the thing you want the tool to be able to understand. That's a massive amount. If I have to write rules for all of these, it's going to take me ages, literally months probably. And there's another aspect that's quite typical in Django. Maybe it's more specific to Python than other languages, but in Python you can re-export symbols in a library. So the same, for example,

django.views.generic.base.View might be used by your engineers in your code under different names, like django.views.generic.View or others. The SAST tool doesn't know about that; you have to give it this additional knowledge, so you kind of have a multiplication of the data you have to provide. But there's another aspect: are these 9,000 functions interesting? Are they all going to lead to vulnerabilities? No, there's tons of stuff that's completely uninteresting. Not everything is going to make SQL queries, for example, and so on and so forth. I'm going to show you in practice what we built, but

that's one of the areas where AI can really multiply the impact of a security research team, or your impact potentially, because it could go through all of these symbols, as I call them, and do a lot of the work instead of the security researcher. There's another dimension I want to briefly talk about, which I think is also very interesting. We talked about how many libraries to cover, but looking at the same sample again, there's little overlap between repositories. What I was computing here is how similar the sets of dependencies of two repositories are across the same enterprise, so all the repositories of

company A, or across these three companies I have more data about, or across the whole set of repositories that we're looking at. What's striking is that, on average, 10% of the dependencies overlap. It means you could have perfect coverage for one repository, but the other one is using other libraries. So there's a small number of libraries used by any one repository, but they're all very different across repositories, which means we actually have to have coverage for a lot of stuff. This is how I was summing up all the data. I'm going to move to the interesting piece. I

mean, I find the data very interesting because it's kind of surprising, but how do we actually deal with all these libraries? How can you, for example, build better coverage if you're using a tool that's not exactly good for your specific repositories, or if you have a lot of custom code, things that no tool will cover out of the box? We have our own perspective: we are a SAST vendor, so we are trying to make our life easier and make ourselves able to scale more. The area where we spend a lot of time is going through these

libraries, understanding what they do, looking at documentation, going through all these functions, classifying them, and so on and so forth. This is, as I was saying, something that's very useful to automate, keeping the human (you've heard it, I think, 10 or 100 times in the past couple of days) in the loop, later, after AI did a lot of the work. Let me go a little bit faster here. We built this pipeline, which is based on static analysis. We built a set of tools, or components, that parse the source code and extract information that we can feed into AI to classify

(I will give you more details in a bit) all the potentially interesting things in these libraries; we don't know yet if they're interesting. AI is going to generate annotations that we stick on the code of these libraries, and we provide an interface that security researchers can use to sift through all these annotations, triage them and say, "Okay, this is good. I want a rule for this." And then there's another piece which will automatically generate the rules we need. I'm talking about static analysis, but I think the same concepts could be applied to other areas where you have to classify or

triage a lot of information and then produce rules or a knowledge base for a tool. The first step is prioritizing. If we have 100,000 libraries, where do we start? We could start from the most common, and that's probably where we would start initially. But then how do we decide how to go through the list? That's the first place where we use AI. We feed into the model a number of pieces of information. In our case we can say, for example, how many of our customers use this library, because we want to please as many of them as possible very quickly. But that's the

same for every customer: they have many different repos, so which is the most interesting? Is there an internal classification, for example, that we can feed into the system to say this one is more important than another? We also use AI to classify the repositories because we don't know what our customers are building; we don't see what they're building. But by looking at their dependencies we can say something. We can know if something is a web app, for example, so we know that we might want to look for certain types of vulnerabilities there. And we can look into the code of the dependencies we want to write rules for and see if

they are using, I don't know, a database, so there's potentially some SQL injection issue and we want to write rules for SQL injection for that library, and so on and so forth. Romain built a fancy UI around that after having the initial idea for the project, and it's a really great time saver. We can very quickly see a list of libraries. Highlighted in green is the score that the system gave to these libraries. It tells us how many customers or repositories are using them and what kinds of risks are related to them. So here we can very quickly

see, okay, you're going to tell me SQLAlchemy, I could have pulled up something more obscure, but SQL injection sinks, for example. So it's very easy for us to say, oh, we want to run a campaign to write rules for this specific problem because we don't have good coverage. It's very easy to use this interface and click on a button there, the last one on the right, to make more magic things happen, and I will tell you in a second what the magic things are. These are the magic things. The next step is the other step where we heavily use AI. Again, as I was saying, there's a lot

of static analysis under the hood. I'm skipping some of these details because I talk a lot and I don't have a two-hour time slot today, but we can talk about them after the talk. What is AI going to do here? We prioritized the library, so we can take the library, download the code, and transform it. We are going to give this part of the system the list of functions, for example, and the function arguments that the library has, together with documentation and code samples, and then we have a series of agents

specialized in different tasks. There's an agent that's going to look at the code, look at the documentation, and so on, and say: oh, this function argument is a file path, for example, or it's used in a certain way. Then there's a security review agent that's going to look: is this a source of data, for example, or is this a sink? It tries to characterize, from a security perspective, what this function is doing. And then for every specific vulnerability we are interested in, there's another agent that's going to add another layer and say: oh, this is a file path, it

seems like it could be manipulated, or: I see in the source code of this function that there's some sanitization, or something happens that makes it very unlikely for a vulnerability to happen there. As I was saying, we supply a number of code examples and things like that. And all of these agents talk to each other. The last one, the independent review, takes all of this information and asks: are these agents consistent with each other? Do we think that this is really an interesting security property to write a rule for? And the next step is, well, that's

where the human comes in. That's more or less the first step where the human actually does something; before this, we would have had to do all of the previous steps manually. Here the human receives (I will show you the UI, it's going to be extremely straightforward) an interface where we can go and say: the machine produced a number of annotations, it gives me this context, and we just go and triage, just as you would triage the findings of any tool. Once we have this triage done, we can package these annotations and feed them into the next step. And this is what it looks like.

At the top you see, for example, for this B tree library, 124 annotations generated. An annotation means the system said that 124 function arguments, for example, are worth looking at or not (there's a confidence rating associated) and might be a target, let's say, for a Semgrep rule. There are 696 symbols: as you see here, modules, classes, functions, and so on. And here we click on one of these items. For example, here I clicked on a function call. There's this upload file function. Then the system tells me, at the very right of the screen, you can see there's a file name selected there

with a little star. It means we have an annotation on this file name, and we have a sink for path manipulation, ranked as certain, and then we have all this context where we can see the different decisions produced by the different agents. So the security researcher doesn't have to do a lot of the manual work. You just have to go through this information in a single place, a web page literally, and decide: is this worth writing a rule for? You literally click on the blue bubble at the right, there's a "create PR" button, you create a PR, and then

something else happens automatically. It's really a time saver. The rule synthesis: this is the part I prefer, to be honest. I find it very interesting, I think because that's where a lot of the speed-up happens and where a lot of the errors disappear. The rule synthesis is basically... actually, let me show you straight with the screenshot. Once we say, okay, the AI produced a valid annotation, we literally create a file. This is what we call a stub, which is basically the code of this library where, if you see, we removed the function body, and there's literally a type annotation

added to statement that says Semgrep taint sink, kind SQLi. What's super interesting about that is that here we gain type checking. For example, you cannot mistype SQLi, because we have a rule compiler that will parse this just like a compiler and will make sure that this annotation is written properly. If you're writing a rule manually for Semgrep, or whatever tool you're using, you're going to make typos, and you're not going to know until the rule is used, which is probably too late. And if you're writing hundreds of rules and you want to update them, you have to go and fix them one by one. Here the compiler will do that for us.
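The idea of an annotated stub that a compiler validates can be sketched as follows. This is not Semgrep's actual stub format or rule compiler, which the talk only shows on a slide: the `TaintSink["..."]` annotation syntax, the `KNOWN_KINDS` set, and the `compile_stub` function are all invented for illustration, and the sketch assumes Python 3.9 or later (where a subscript's `slice` is the expression node itself).

```python
import ast

# Hypothetical set of sink kinds the "compiler" accepts.
KNOWN_KINDS = {"sqli", "path-traversal", "ssrf"}

# Hypothetical stub: library functions with bodies removed and a
# made-up TaintSink["<kind>"] annotation on the dangerous argument.
STUB = '''
def execute(self, statement: TaintSink["sqli"], params=None): ...
def scalar(self, statement: TaintSink["sqli"]): ...
'''

def compile_stub(stub_source):
    """Parse a stub and emit one rule record per annotated argument,
    rejecting any sink kind that is not in KNOWN_KINDS."""
    rules = []
    for node in ast.walk(ast.parse(stub_source)):
        if not isinstance(node, ast.FunctionDef):
            continue
        for arg in node.args.args:
            ann = arg.annotation
            if not (isinstance(ann, ast.Subscript)
                    and isinstance(ann.value, ast.Name)
                    and ann.value.id == "TaintSink"):
                continue  # unannotated argument: nothing to generate
            kind = ann.slice.value if isinstance(ann.slice, ast.Constant) else None
            if kind not in KNOWN_KINDS:
                raise ValueError(
                    f"{node.name}({arg.arg}): unknown sink kind {kind!r}")
            rules.append({"function": node.name,
                          "argument": arg.arg,
                          "kind": kind})
    return rules

for rule in compile_stub(STUB):
    print(rule)
```

A typo such as `TaintSink["sqil"]` fails at compile time with a `ValueError` instead of silently producing a rule that never matches, and regenerating every rule after a format change means rerunning one function rather than editing each rule by hand.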

It will regenerate all the rules automatically. And some of the rules look extremely complex. These are things that we would usually write manually; because we have this compiler, this is all automatically generated. The compiler will compact the patterns, for example, if it sees that we can simplify them or bring some pieces together. So you see, for example, the regex execute, scalar, scalars: that's an artifact generated by the compiler for three different annotations, execute, scalar, and scalars. So instead of having three different rules or three different patterns, we have a single one generated automatically. If the compiler improves, or if Semgrep in

our case improves so that we can write better rules or different rules, we don't have to go and rewrite the rules. We just change the compiler and everything happens automatically. And there's a number of things which are, again, very Semgrep-specific in this case, but every SAST tool has its own requirements and specificities. Here you see at the end you have "requires" and what looks like types, because every label starts with "type". For dynamic languages, Semgrep doesn't necessarily do very good type inference, so we sometimes have to add tags in the rules to say, oh, this function argument has a specific

type, or this sink is a sink only if this method is called on a certain object of a certain type, and all of this is also generated by the compiler. And as I was saying earlier with the example of Django, if the same type is exported 10 times, we would have to write this thing 10 times, slightly differently, and, well, copy-paste errors: there's one typo in one of these 10 instances, and we have to go and fix it everywhere. So as you scale to hundreds or thousands of libraries, it becomes really time-consuming. So this is a huge time saver. So I would say the two

big gains to me are, one, focus, because security researchers can focus on their specific domain of expertise, which is evaluating if something could lead to a vulnerability, without doing a lot of the groundwork of learning about a specific library, how it works, and how it's used, because we now have a system that can produce that. I think AI is really good at that: providing this context and summarizing it in a very easy-to-digest way. The way we did it, the way I've shown you, is one example, but I think you could apply it or create variations of the same thing to solve very different problems.

The other thing is scale. Before that, as I was saying, and I mean it's still the case in many cases, you have a SAST tool, or maybe some other type of tool that requires rules, where you have to go and do all this work manually and figure out where you need to do it. Typically, to write some of the rules, well, it could take days, I would say. And I skipped an interesting detail; I will go back to it very quickly as I talk about this now. It takes days, maybe a couple of days, to write a rule for a library, because you have to research the library, you have to

look at how it's used, maybe you've never heard of it, you have to write the rule, you have to test it, maybe in different ways, you have to write the unit tests, then you have to maybe benchmark it on a number of repos. So it's kind of time-consuming if it's not automated. With this system, in a couple of weeks, basically one person was able to 10x the number of rules we have for Python, literally going from something like 700 to 7,000, which is fundamentally impossible to do by hand. It's a gigantic task. And adding support for, basically doubling, the number of

libraries supported in just two weeks. I think for the previous number of libraries, the support was introduced over six months or something like that, so a huge, huge scaling factor. And it's not just a matter of bragging, like, oh, now we have 7,000 rules. They're actually finding new vulnerabilities. So it kind of validated for us the fact that this long tail of packages might not be interesting to every one of you, but there is going to be a repository where not having that one rule means missing one vulnerability. So, what we did learn, very quickly, because I barely have time for questions,

but anyway, the questions are for you. Well, one thing is, okay, I've been skipping some of the details and simplifying some things just to make the presentation fit in the time. It's not always so trivial. Sinks, what we call sinks, the places where the malicious data, for example, or the user-controlled data could end up, are very often function calls. So there it's very easy to say, I'm going to annotate this piece of code, and it's very local. If we wanted to extend the system, it's not necessarily so trivial, because, well, if you think about sources of data in a web framework,

maybe the source is a function argument or a method argument, let's say it this way, the argument of a method of a class. So you're already kind of looking somewhere else, at something that's been annotated in a certain way, or maybe that's declared in a certain piece of code. So the annotation is not necessarily local, and extending this might require some actual work. And there's also a very fundamental problem: the more rules you produce, the harder it gets to use them. So you need a system that can also scale with the amount of content you're going to produce. And I think this applies to

any tool. So, to conclude, and I think I've reiterated this a lot: the first thing, which I just said, is that going beyond the top 100 or top 200 is kind of fundamental, otherwise you are bound to miss vulnerabilities. And it's even more true if you look at your own proprietary or internal libraries, which the tool will not know about out of the box. And the other thing, which is really key to me, is that you need to find this balance, at least to solve this problem, between AI and static analysis. For example, using both makes the system very fast,

very predictable, very consistent: you always get the same type of rule, for example, the same type of pattern, and you can apply it very easily to every language. But AI has this ability to bring more context that otherwise a human would have to provide. Yeah, I think that's all from me. Do you have any questions? >> Thank you so much, Claudio, and a round of applause. Thank you. All right, so you can go to bsidessf.org/qna (Q, the letter N, and the letter A) to submit your questions, and we will go over a few questions. >> Yeah, there's a question right there. >> I'm sorry.

>> There's a question right there. >> Okay, that's fine. >> How do you know when you're done, and can you ever be done? >> I don't think you can be done, because the libraries, I'm kind of sick of saying libraries, but what you're looking at today is not going to be what you're looking at tomorrow. There are new, you name it, blocking libraries, database libraries every day. I was talking about the GitHub libraries because I was kind of surprised by how many variations of the same library exist, and it's going to be the same. To me, that's why coverage is a very interesting problem, because it must be automated and you

cannot stop. >> security >> Well, I would say also for you, to find more vulnerabilities, or more relevant ones. Yeah, thanks for the question. >> I'm sorry. Yes. So, we have... Go ahead. >> I'm just curious: how do you know the AI is doing a good job? I mean, it seems like the big problem with automation usually is that you get this feel-good sense that you have a lot of coverage. But how are you validating that you're not either generating lots of false alarms for those of us maybe using the tool, or just getting a good feeling that it's got good coverage?

>> Yeah. I could answer in a thousand different ways. Sticking close, let's say, to the topic of the presentation, just not to diverge too much, there's a couple of steps. The first is the human in the loop. It's very important, because when the AI, in this case, and I should stop saying AI every sentence, well, we have these annotations, but we can thumbs-up and thumbs-down them and say, okay, this is valid information we are providing. We can feed that back into the system so that we get better annotations. If we get better information out of this step of the system, the output for you will

likely be better. There's another step I actually skipped a little bit for the sake of time. The next step after this pipeline is running all these rules we generate, and that's what we do, on thousands of repositories and seeing what happens on actual source code, because, I mean, the analysis isn't perfect. It could be that we generated some weird, I don't know, rule that doesn't really apply well to real source code, and so on. >> I have to say this is the best talk I've seen here. I like this. So good job. >> Thanks so much. >> Okay. >> One more question, if you don't mind. So someone online submitted this

question: If I secure the core framework used by long-tail AI apps, can I claim complete coverage for those apps, or does your dissonance theory suggest new risks? Clear asking this question. >> Yeah. Yeah. I don't believe you. We're going to talk after. >> Hi, thank you so much. Yeah, I said what he said. This is one of the best talks I've seen. Thank you so much. So this question is... >> I cry very easily. So... >> Actually, yeah, I found this framework very applicable to my own work, because in my company I have a goal. I'm a TM and a data scientist

in security. I have a goal of securing 100% of our AI launches. So how can we achieve this goal at scale with limited human resources, right? And so my question is: your analyses, such as the similarity analysis and the dependency analysis, are so fascinating, and I'm wondering about figuring out the similarity between those different launches, and also the shared AI framework packages that different teams are using. If we focus on securing those shared things, do we still need personalized coverage at scale? >> The question is tricky. Again, I don't want to be evasive in the answers, but, for example, the data I was showing was

again, for the sake of time and for many reasons, kind of a simplified view. If you are a single company, for example, well, you're probably not using all of the existing libraries in PyPI. You're going to use a subset of those. If your core business is very AI-related or, I don't know, fintech-related, your dependencies might also be rather specific. So your target is not necessarily the totality, I would say, of the ecosystem; it might be very specific. That's also something I didn't focus on much, but something we kind of realized as a consequence of looking at the data is that every company or every repository is different,

and if we covered, for each repository, its own dependencies, we might be looking at a lot fewer than the 100,000 or whatever. Then, to still claim completeness, there are at least these two factors. If you are using a very obscure library, but there is a dangerous function in there and you didn't look at it, you still have a potential problem. Yeah, and I'm kind of forgetting the second point I wanted to make, but it will come back. >> That could be... it sounds like this is maybe a one-on-one conversation that you can have after this talk. It's a very interesting question. Thank you so much for raising this question.

And thank you again, Claudio. A round of applause again for this excellent talk. Thank you.
