
Do Scanners Suck? I Have The Receipts - Thomas Ballin

BSides Leeds · 28:29 · 17 views · Published 2025-08
Transcript [en]

So, yeah, thanks for that introduction. A very quick further introduction, a little bit about me: my background is in penetration testing. I was a pentester for about 10 years before this, and more recently I opened up a startup; if you want to know about them, go find us out in the lobby somewhere. As will become a little bit relevant later on, many people might know me as somebody who likes to stir a little bit of controversy on LinkedIn. You'll notice there's another name on the board as well. That's Boo Smith, or Bethany, who did most of the hard work on this.

Unfortunately, she's just got a new job and is enjoying her summer holiday having just left university, so she wasn't able to make it, and I'm going to be taking all the credit. First of all, I want to preface this conversation by explaining a little bit about what the intentions behind the AppSec detection framework are and what they aren't, because I don't want to come across as trying to bash any particular vendor, or in fact endorse any particular vendor. All I'm really trying to do here is demystify what we've currently got in the dynamic and static application security testing space by providing some

actionable, quantifiable information about how well particular scanners perform and how they compare against each other. Pretty much all of the scanners we've got in the framework so far are things I've used for many years, things I absolutely love using and that have real value. I just wanted to understand what their limitations actually are and to be able to say with some level of authority what they can and can't do. This research really came about after I was trying to recruit a pentester a few years ago. I built a platform with a number of different potential vulnerabilities that candidates could explore

and identify. At the end of it they do a bit of a show-me-tell-me exercise where they talk me through what they found and how they found it, and I can ask them a few questions. During that there were some interesting candidates who, shall we say, used scanners a little bit more than they necessarily needed to, and started coming up with some very odd suggestions as to what the vulnerabilities were. They'd get to a cross-site scripting one and start talking to me about things like HTTP headers or cookie vulnerabilities. A lot of the words they were

using sounded very familiar to me, and I quickly figured out that what some of these people were doing was using scanners to perform a lot of the testing on their behalf. Now, I don't really have an issue with people doing that. In the modern day we really should be adopting tooling to automate as much of the job as we possibly can. But the thing that really stuck out to me was that a lot of these people weren't identifying vulnerabilities that I thought were fundamental to application security testing, and in a lot of cases they were vulnerabilities I thought were quite easy to

find. Admittedly, I built the vulnerabilities into the system in the first place, so I was a little bit biased. But what I thought I might do at that point was take some of the tools being used in these tech tests, run them against what I'd built, and see how well they actually performed and what their limitations were. So I started doing that, and I started posting the results on LinkedIn, and it got a little bit of traction. People seemed to be interested in it, and then some of the vendors I was talking about got particularly interested in it and sent me some politely worded letters that effectively

said words like "cease" and "desist". For a while that was interesting, but it died a bit of a death at that point, and it just sat there in the background as a research piece that would probably never see the light of day again, until Lancaster University reached out and said, "We've got this opportunity for students to do research projects with people such as yourselves. Is there anything you can think of that we could bring some students in to work on and help deliver?" And I thought, they've got a bigger legal team than I have, so maybe we can look at getting

something like this up and off the ground again. As I mentioned, we had Boo Smith, who has done about 18 or 19 weeks with us so far, but we're going to carry that research on as well. This is going to be a collaboration between myself and the university for another 12 weeks, and then hopefully it will keep going beyond that. Effectively, what we agreed was that we were going to design something a little bit more professional, a little bit more precise, than running a bunch of scanners against my recruitment rig. It seemed like there were a couple of options here. There

was an easy mode, where we take a bunch of existing vulnerable web apps, Juice Shop, DVWA, things like that, run scanning technologies against each of them, and publish the results, effectively what I'd done in the LinkedIn controversy. Or there was a harder option: build something more extensible, something matched to an existing framework or taxonomy of vulnerabilities, something with complete enough coverage that you could do a proper gap analysis of what your tools were and weren't capable of finding. And, like I say, I wasn't doing the work, so obviously I chose to go with hard mode. So the framework is mapped against

two different taxonomies. I use the word taxonomy reluctantly for the first one, because OWASP themselves are fairly clear that the OWASP Top 10 is not a complete list of all the vulnerability categories out there. But we chose it because it is probably the most familiar, most recognized way of categorizing vulnerabilities anywhere on the internet. The other one was CWE, the Common Weakness Enumeration. I put "not CVE" in capitals there because that's the first confusion everybody has: while a CVE is about a vulnerability in a specific technology stack, a CWE talks more generally about a class of vulnerability. So a reflected cross-site scripting attack

might be a CWE, whereas reflected cross-site scripting in jQuery is a CVE. One of the interesting things about CWE as a choice is that it already has some information in it about the detection methods for different weaknesses, which I'd hoped would solve the problem for us. But when you dig into it, the information there is a bit limited. It's not very transparent about what it means when it says you can detect this through SAST or through manual testing, and in fact half of the CWEs don't have any detection methods mapped to them at all. But effectively, where we've started is

by taking the 180 CWEs that are known to map to OWASP Top 10 categories and trying to address those first; from there we'll extend it out to the thousands and thousands of CWEs that exist in the MITRE database. So how does ASDF actually work? Effectively, it's a bunch of containerized micro-applications. If the CWE is reflected cross-site scripting, then all that app is is a deployed page with a single input field and the minimum amount of code possible to actually demonstrate reflected cross-site scripting. Same with file upload, same with cross-site request forgery, same with every vulnerability you might be trying to build a test for.
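To make that concrete, a minimal sketch of what one of these micro-applications might look like is below, written in Python with Flask purely as an illustration; the route, parameter name and port are assumptions for the example, not the actual ASDF code.

```python
# Illustrative sketch of a reflected XSS micro-application (not the actual
# ASDF code): a single input field whose value is echoed back into the page
# unencoded, the smallest footprint that still demonstrates CWE-79.
from flask import Flask, request

app = Flask(__name__)

@app.route("/")
def search():
    # Deliberately vulnerable: user input is concatenated straight into HTML.
    term = request.args.get("q", "")
    return (
        "<html><body>"
        '<form method="GET"><input name="q"><input type="submit"></form>'
        f"<p>You searched for: {term}</p>"
        "</body></html>"
    )

if __name__ == "__main__":
    # Each test runs in its own container and a scanner is pointed at its URL.
    app.run(host="0.0.0.0", port=8080)
```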

And there's also, effectively, a management interface which is responsible for deploying these. We've got, I think, 52 specific tests built so far, which cover rather more than that, more than 100 CWEs, because naturally, if you're doing a CWE for cross-site scripting then you're also doing one for reflected cross-site scripting; you get more granular CWEs. But effectively this is a framework to deploy all of those, point your various different tools at them, run the tools, and then record the results.

Did the tool find, on that particular URL, the CWEs that were expected to be present? If so, mark it as yes; if not, mark it as no. What you end up with at the end is some fairly transparent results, so you can say with some level of confidence that the tool performs well with these particular classes of vulnerabilities and doesn't perform well with others. Now, we are working on a limited data set, so take everything that you see with a pinch of salt (I'll say that to hopefully avoid any more cease and desists). But what we found so far has been really, really insightful.
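As a rough illustration of what that record keeping amounts to, the sketch below stores one yes/no observation per tool, per test URL, per expected CWE, and rolls it up into a per-tool detection rate. The field names and example data are my own simplification, not the project's actual schema.

```python
# Simplified illustration of ASDF-style result recording: one yes/no
# observation per (tool, test URL, expected CWE), rolled up per tool.
# Field names and data are hypothetical, not the project's real schema.
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Observation:
    tool: str        # e.g. "zap-vanilla" or "semgrep-default"
    test_url: str    # the deployed micro-application that was scanned
    cwe: str         # the CWE the test demonstrates, e.g. "CWE-79"
    detected: bool   # did the tool report this CWE on this URL?

def detection_rates(observations: list[Observation]) -> dict[str, float]:
    """Fraction of expected CWEs each tool actually flagged."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for obs in observations:
        totals[obs.tool] += 1
        hits[obs.tool] += int(obs.detected)
    return {tool: hits[tool] / totals[tool] for tool in totals}

# Example with made-up results:
results = [
    Observation("zap-vanilla", "http://asdf.local/xss-reflected", "CWE-79", True),
    Observation("zap-vanilla", "http://asdf.local/log-neutralization", "CWE-117", False),
    Observation("semgrep-default", "http://asdf.local/log-neutralization", "CWE-117", False),
]
print(detection_rates(results))  # {'zap-vanilla': 0.5, 'semgrep-default': 0.0}
```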

So you can see on the left-hand side a breakdown, effectively a gap analysis, of all the different tools, and the configurations for those tools, that we've got so far. We've basically been working through vanilla configurations of some of the most popular tools on the market. Nuclei, which is a fan favorite amongst bug bounty hunters, doesn't seem to have done particularly well across most of the OWASP Top 10 categories. Burp Suite, probably the most well-known commercial off-the-shelf DAST scanner, I think outperformed the other DAST scanners in the process. And at the bottom we've got Semgrep, which is a SAST tool. But you can see that,

right across the DAST tools, there are entire categories of the OWASP Top 10 that these tools haven't so far been able to identify. Now, the argument from the vendors might be that you need to tune it, you need to configure it specifically for your environment; if you get the right setup, the right policies and so on in place, then it will perform better. So I've actually reached out to them, and I've already got a lot of these vendors saying they're more than happy to tune their tools on my behalf, and then we can run them against the same

framework again to see, if you had somebody who is an expert in the tool, how well it performs in that instance. It's also an opportunity for people inside a business to tune things the way they already have them and then see how well they perform there. We just chose a vanilla configuration for the moment because, to be honest with you, when I speak to AppSec engineers, or even developers, who in a lot of cases are the people responsible for running these tools inside their pipelines, they're not spending hours or days researching exactly how to configure these things.

They're hoping for a point-and-click solution that finds as many vulnerabilities as possible. The graph on the right-hand side is potentially a little bit more alarming in terms of the results. Effectively, what it's saying is that none of the scanners individually identifies more than about 25% of the vulnerabilities that have been introduced. And in a lot of cases these aren't complex, sophisticated vulnerabilities; these are what I, as a pentester, would typically call the low-hanging fruit. These results surprised me quite a lot, because I'd gone into this looking at the overall Top 10 and the CWEs and saying,

tools can probably find about 75% of these; and in fact it has completely flipped that conception on its head. One of the other interesting things about presenting the data as it is on the left-hand side is that I know a lot of people try to layer their security tooling to make sure they're getting that coverage, not relying on a single technology: maybe they run SAST on their pull requests and then DAST during their deployments. But what we can see from the right-hand side, as opposed to the left, is that even if you've got something like Nuclei plus Burp Suite plus Semgrep, there are still potentially dozens and dozens of

vulnerabilities that simply cannot be detected through automated means. I think the graph here demonstrates that even further: if you layered all of the tools we have so far, how big is the coverage across the whole OWASP Top 10? You can see there's some brilliant coverage on certain classes of vulnerability, but then there are other ones; there's a massive amount of that gray area, which represents blind spots across the Top 10.
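To make the layering point concrete, combined coverage is just the union of the CWEs each tool detects, and the blind spots are whatever is left over. The sketch below uses made-up tool names and sets purely for illustration.

```python
# Layered coverage illustration: stacking tools gives the union of what each
# one detects; the gray area on the chart is everything left outside it.
# The sets below are invented for the example, not real measurements.
detected_by_tool = {
    "nuclei": {"CWE-79"},
    "burp": {"CWE-79", "CWE-89"},
    "semgrep": {"CWE-89", "CWE-117"},
}
all_tested_cwes = {"CWE-79", "CWE-89", "CWE-117", "CWE-223", "CWE-352"}

layered = set().union(*detected_by_tool.values())
blind_spots = all_tested_cwes - layered

print(f"layered coverage: {len(layered)}/{len(all_tested_cwes)}")   # 3/5
print("still undetected:", sorted(blind_spots))                     # CWE-223, CWE-352
```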

This is another view that's a little bit more granular in detail, and in fact I can bring the application up. So if I go out to here: effectively, the idea is that you can come in once you've run your scanners against the various test sets, or, if the configurations already exist in here, you can say, for example, "I use Semgrep and I use ZAP", and what it will then tell you is how well each of them performs. You can see that, for improper output neutralization for logs, as it stands at the moment, as far as we've been able to tell from the basic test (and in fact there is only one test currently built for this particular CWE), neither ZAP nor

Semgrep is able to find that class of vulnerability at all. Same with omission of security-relevant information. This gives you information that you can then feed into your pentest or your procurement process, or whatever it might be, that helps you decide where you should be focusing right now to build out your detection capabilities, and where you can already rely on tools to do a lot of that heavy lifting for you. So, I've been really interested in the various applications for this, in what the actual purpose of this research is. From my perspective, it's just interesting to understand and

demystify a lot of what these tools can do. But I like to think there's a lot of application for this, and a lot of reasons why I'm hoping to encourage other people to get involved and contribute. There's a lot of opportunity to understand the coverage gaps inside your business. If you're a penetration tester, or you're responsible for application security inside an organization, it's a good opportunity to evaluate tools. At the moment we're dealing with vendor obfuscation of what their tools are actually capable of doing, but this allows you to compare apples to apples. This is

a way to say, actually, hang on, I ran your tool against this framework, and in doing so I can say confidently that it doesn't perform as well as X or Y in the market. That feeds into the commercial applications, which is what I'm really hoping to do: for the vendors who've said they're more than happy to run their tools, in their configurations, against my framework, I'd like to be able to go to them and have a conversation and say, look, here are all of the vulnerabilities that you can't currently reliably find. What are you doing about that? What can we do? Can we

collaborate? Can we work together towards actually improving your detection capabilities? And that's widened by the fact that these tests can be run against particular technology stacks, against particular implementations of a vulnerability. So I can start to be much more granular and say, look, you can find cross-site scripting in ASP applications but you cannot find cross-site scripting in PHP applications, or something like that. It really opens up the potential for these vendors to not just say "we find more vulnerabilities than everybody else" but to actually be incentivized to find a wider breadth of vulnerabilities as

well. What I'm also hoping it does, more than anything, is just tell me what we can reliably use tools for. Like I said at the start, I'm a massive advocate of using security tools and scanners to speed up your process. I will love the day when I'm interviewing somebody, they run a scanner, and it finds every single vulnerability just because they know exactly how to use that tool and how to get the most out of it. Then I can just have a conversation off the back of it to make sure they're able to offer the consultancy on top of that, that they're able to offer the

advice and the guidance you need to support developers through remediation. But in order to do that, we need to understand the tools and their limitations, and where I still need that person to be an expert in finding the vulnerabilities. I'm also really looking forward to testing this against some of the more modern AI-powered scanners. I'm fascinated to know how something like that performs against a framework like this, so if anybody afterwards knows about any of those, I really invite you to let me know; I'd love to get this

framework going against that. The other thing worth talking about, in terms of where this framework is at the moment, is that there is still a significant amount of work left to do. I built a few utilities into it that, if anybody is interested in contributing, will basically tell you which of the CWEs are not covered yet, so that you've got an idea of which tests are worth building and prioritizing first. And you can see from this that there's a pretty large number of CWEs still outstanding: there are about 50 tests at the moment, and roughly 88 CWEs that those 50 tests actually demonstrate, but there are 102 CWEs mapped to the OWASP Top 10 still to do.
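That gap-listing utility boils down to a set difference between the CWEs mapped to the OWASP Top 10 and the CWEs that already have at least one test. A minimal sketch of the idea (with placeholder IDs, not the real ASDF data) might look like this:

```python
# Sketch of a "what's left to build" check: OWASP-mapped CWEs minus the CWEs
# that already have at least one test. IDs here are placeholders.
owasp_mapped_cwes = {"CWE-79", "CWE-89", "CWE-117", "CWE-223", "CWE-352", "CWE-434"}

tests = {
    "xss-reflected": {"CWE-79"},
    "sqli-basic": {"CWE-89"},
}

covered = set().union(*tests.values()) if tests else set()
outstanding = sorted(owasp_mapped_cwes - covered)
print(f"{len(covered)} CWEs covered, {len(outstanding)} still to do:", outstanding)
```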

That's hopefully what I'll get Lancaster to do, but it's also what I invite anybody here who's interested in contributing to do. The other thing I invite people to do is run your existing tool sets against this and then commit the results back in. This is an open-source project, something where anybody from within the community can say: actually, I like this tool and I'd like to show off its capabilities, or I use this tool day-to-day and I'd like to understand it and drive it towards being a more effective version of itself. You

can effectively take even the tools we've already got, run your own particular configurations and policies against them, commit those back in, and then what we've all got available at that point is a framework for understanding and driving these tools towards greater success. As I say, it's an open-source framework. Anybody is absolutely free to come in and download it; you can run it with a couple of commands to get the web interface and have a play with the results we've already got. The main thing I'm really hoping for, like I say, is contribution

towards building those tests and running those scans. But I invite everybody to use it and give some feedback, and hopefully we'll be able to build something that actually drives AppSec tooling forward into the future. So, I think I've got a bit of time left; I speedran that a little bit, and I'm conscious I'm competing with lunch at the moment, but are there any questions?

Sure.


>> So, in terms of how ...
>> Yeah, yeah. So, in terms of how this research has shifted my mindset and how we actually approach penetration testing: I think one of the things that was always a reality, at least in my approach, is that I never really trusted tools particularly much. I assumed they could find a bunch of things and it was great if they did, but I always found myself validating those results manually to complement them. I hoped a tool could find a bunch of stuff before I even got going, so I don't have to write up a naughty finding for HTTP security headers or something like that. But

it certainly opened my eyes that we're not getting a lot closer, or at least it doesn't appear we're getting a lot closer, to a world where we can rely on these tools without a manual tester on top of them. Certainly not for certain types of changes, at least. There are particular changes an application development team might make where we can now reliably use these tools to scan, and doing that in the pipeline is absolutely an option. But it has highlighted that, for the majority of vulnerabilities, there is still a place for penetration testing to step in, and there's still a need for testers to understand

whether risks have been introduced or not. Now, we don't do pentesting ourselves, so scoping isn't really something we handle, although I used to be a pentester for many years. But in terms of your second question, about how it's influenced the scoping process, I guess what's really useful is being able to help customers, or whoever is responsible for application security, understand why there is a need to go beyond scanning technologies. Things like SAST are a brilliant tool to wield to help developers understand vulnerabilities and catch them sooner, because the sooner you find a

vulnerability, the cheaper it typically is to fix. But it's also an opportunity to say to them, look, you're missing a trick here; there are still lots of things you need to start thinking about how you're going to address, whether you're going to bring in an internal pentester or an external pentester, or whether you're going to introduce controls that let you mitigate these vulnerabilities upstream. As much as people bash on WAFs, bypass techniques are getting harder and harder and WAFs are able to mitigate more and more of these classes of vulnerabilities, and there's RASP

and other technologies now that we can start to adopt. And I'm hoping that's an extensible opportunity for this as well: to be able to say, with those technologies in place, does the detection capability go down, are we able to mitigate a lot of these vulnerabilities, or are we in fact still in the same place we were 5 or 10 years ago, as it feels a little bit like we might still be with DAST and SAST technology?
>> I know it's your choice of scanners there, but did you include, in your research or not, any of the more well-known vulnerability scanner vendors in your analysis?
>> So, at the moment, the scanning

vendors, or the scanning technologies, that we've got so far are: Semgrep, and I have actually also run the free version of Snyk, I've just not had the opportunity to load the results in yet. Then ZAP: ZAP is probably the most adopted inside internal teams. I think it's one of the top 10 most downloaded projects on the whole of GitHub, so it's very popular inside businesses, although not necessarily quite so popular amongst security vendors themselves, amongst whom I think Burp Suite is the most popular. And actually, speaking to PortSwigger, their online DAST solution is really the more commercialized offering; my understanding is that the

detection capabilities of that are pretty much the same as the detection capabilities of the local tool used by pentesters. Nuclei is absolutely the most popular scanning technology amongst bug bounty hunters at the moment. But I am very keen to start adding in more and more technologies as time goes on. I guess a question back might be: are there particular technologies people would like to see, what do people consider to be relevant? And I'm open, like I say, to people contributing using their own access to scanners as they have them. Obviously I can try to get

commercial engagement to contribute towards this, a vendor saying yes, you can have a license to run this framework against a particular tool, or not; but if I don't get that buy-in from the business, I'm relying on people in the community who have those tools to do it on my behalf. Any other questions?
>> You mentioned a 75% rate. It would be interesting to hear which vulnerabilities you expected to be picked up.
>> Sure. So yeah, I mentioned a 75% hit rate because I'd gone through the Top 10 and those 180 CWEs and just said yes or no to whether I thought, based on kind

of a gut feel, tools would be able to find it: this is something you can detect through signatures, this is something that's likely to need manual work. And I found that lots and lots of tools focused, not entirely but quite clearly quite heavily, on A03, which is injection vulnerabilities, an area where you would absolutely expect a lot of these tools to do well: cross-site scripting, SQL injection, things of that nature. Where I was more surprised, to your question about the vulnerabilities they weren't able

to find, was with a lot of the information disclosure issues. There are a lot of vulnerabilities in there that, again, I consider quite trivial to find, where there's a .env file in the root directory, for example, with cleartext passwords and other data in it, the kinds of things that are default configurations in a lot of software stacks, and a lot of the tools just didn't seem to be flagging those. Another thing that was quite interesting was that certain tools flagged things like cross-site request forgery unreliably. There are a few different tests for cross-site request forgery in there, and sometimes they'll say it's present, sometimes they won't.

That really surprised me, because fundamentally these are just single pages with a simple form and a submit button, and there's no reason, as far as I can tell, why you should be able to find it in one and not in another. So it took me aback that they didn't highlight a lot of those. Similar with clickjacking vulnerabilities as well: there was a lot of inconsistency there, and that's fundamentally one of the easiest vulnerabilities to detect reliably and consistently; when I imagine how you might build a technology to find it, it's fascinating to me that some of these scanners weren't able to.
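For context on why that inconsistency is surprising, a CSRF test really can be this small. The sketch below is my own illustration of the shape of such a page (a state-changing POST form with no anti-CSRF token), not the actual ASDF test.

```python
# Illustrative sketch (not the actual ASDF test) of a minimal CSRF target:
# a state-changing POST form with no anti-CSRF token and no origin checks,
# exactly the pattern a scanner ought to flag every time it sees it.
from flask import Flask, request

app = Flask(__name__)

@app.route("/", methods=["GET", "POST"])
def change_email():
    if request.method == "POST":
        # The state change is accepted with nothing tying it to the user's
        # session: no CSRF token, no SameSite enforcement, no origin check.
        return "email updated"
    return (
        '<form method="POST">'
        '<input name="email"><input type="submit" value="Update email">'
        "</form>"
    )

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```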

Any other questions? Okay, no worries. Well, come find me if you've got any questions.