
Vulnerability Regression Testing with Nuclei Framework

BSides Zagreb · 41:46 · 103 views · Published 2025-03
About this talk
Infobip built a proactive vulnerability regression testing system using the Nuclei framework to prevent previously patched vulnerabilities from reoccurring in production. The system combines Nuclei's YAML-based templates, ProjectDiscovery's Notify tool, Python middleware, and Slack integration to automatically detect regressions every five days and alert the security team—reducing re-discovery costs and improving overall security posture.
Original YouTube description:
Presentation: Just like any other company that creates software, Infobip isn’t immune to vulnerabilities. In our “find, notify, and patch” cycles, we’ve run into cases where fixing one vulnerability ends up messing with something that was fixed a long time ago for another issue. To deal with this, we decided to bring in vulnerability regression testing. We picked the Nuclei framework for this because it offers a lot of options, can be modular, and there are plenty of examples available online. Nuclei uses YAML to define templates, which can be used as units for regression tests. By putting together Nuclei’s YAML templates, Notify from ProjectDiscovery, some Python code, and a Slack bot, we built a proactive notification system. This system alerts us if a previously patched vulnerability pops back up in our system. Speaker: Domagoj Vrataric is passionate about information security and has over 12 years of experience in the field, focusing primarily on offensive security. Currently employed as a senior application security engineer at Infobip. Recorded at BSidesZagreb (https://www.bsideszagreb.com/). #cybersecurity #bsides
Transcript [en]

Then we can continue in Croatian. Thank you. Hello everyone, my name is Domagoj Vratarić. As my colleague announced, I have worked at Infobip as a Senior Application Security Engineer for 3.5 years, and I have 12 years of experience in IT security behind me. One of my passions is the offensive side of security. As for hobbies, I would mention 3D printing, the Internet of Things, microcontrollers, smart home gadgets, and I am a passionate collector of gramophone records. This is the agenda: I will talk about vulnerability regression testing, the metrics and problems we had, touch on some features of Nuclei, how we set it up in our environment, a few examples of the vulnerabilities we covered with this setup, and how we solved the notification of those vulnerabilities.

Regression testing is commonly defined as the process of functional and non-functional testing that makes sure previously fixed issues are still fixed. Regression testing plays a very important role in the software development life cycle, and among its key stakeholders I would name developers, who are responsible for fixing the bugs, QA teams, who deal with regression testing, and security teams, who discover vulnerabilities and write the tests that cover them. Why did we even start this approach, why do we have regression testing? We had several cases of vulnerabilities that appeared in the system several times, and we wanted to get to the point where similar situations don't happen again. One of the reasons is that it was an entirely

new domain, something new that I had no contact with before and that was interesting to me. It was an opportunity to improve our in-house security, because we want the products we offer to our clients to be as safe as possible. One important thing to emphasize is that this is a very simple approach, for example for people doing penetration testing, because they can write regression tests in a very simple way: you define the template, you run the regression test, and that's it, you don't have to spend too much time on regression tests. Our tests are fully automated and run through cron jobs every N days, for example every 5 days. Once a test is written, that's it, we don't have any more work to do; it will be executed and we will be informed if the vulnerability is present again. And generally, vulnerability regression testing improves our security posture.

One of the most important metrics in this whole story is the number of vulnerabilities that have reoccurred in the system. Through this recurrence rate we can actually see which parts of our codebase are weak points. For example, if we have some HTTP proxy that keeps causing vulnerabilities like server-side request forgery, we can do something about that proxy in order to avoid, or rather not completely avoid but reduce, the whole class of those vulnerabilities. We also have the metric time to

detect, that is, how long we need to detect a regression, which depends on how often we run the tests, and remediation time, which depends on each organization's vulnerability policy and can differ from organization to organization; in our case, for example, medium severity gets 30 working days. Last but not least, as an organization we run a private bug bounty program, and this way we save the money we would potentially pay out for a vulnerability that returned to the system and that someone discovered a second time, which we would have to pay for fairly regardless of what we initially paid.

Some of the problems we encountered in this whole process: generally, there is a lack of tools for regression testing, because specialized tools like that simply don't exist. The other thing is that a wide range of vulnerabilities has to be covered here, from client-side vulnerabilities to server-side ones to some advanced ones, for example HTTP request smuggling, which concerns the HTTP 1.1 and 2.0 protocols, as well as new types of vulnerabilities; the people behind Burp Suite, PortSwigger, for example, discover new vulnerability types every year, and we wanted our solution to keep in step with new vulnerabilities and be able to cover them. It is also important to emphasize that we had to strike a balance: we want as few false positives as possible in our detection, and

on the other hand, as few false negatives as possible, that is, vulnerabilities that we failed to detect. So there was a small trade-off. Regarding client-side vulnerabilities, for example some kind of open redirect or cross-site scripting, there was also a problem, because traditional scanners like Burp Suite, Acunetix or similar can detect the payload in the HTTP response and say that the content type is HTML or whatever, but they are not sure whether that payload will actually be executed under some specific circumstances. We wanted a tool that is able to execute such payloads. We also wanted to integrate this tool with our current arsenal; for example, one of the tools we use is Burp Suite Professional, which is a standard for testing any kind of application, even mobile applications. We didn't want to spend too much time writing those tests, and we wanted to make sure the tool can report and send a notification when a test has run. And so we landed on the Nuclei framework. It is a vulnerability scanner that can be used for scanning web applications, networks, cloud environments, and much more. To name a few, it covers several protocols, such as HTTP, DNS, network, and headless, meaning headless Chromium, and it can also test and match regexes in files. It is important

to mention that it is open source and developed by the community. I think it has around 30,000 stars on GitHub and over 900 contributors, which is a pretty big number, and about 10,000 publicly available community templates. Here I put a screenshot of the GPT tool that you can use for creating, that is, generating templates, if you are simply too lazy to write them by hand. It supports scanning several thousand targets at once, and it can do so in parallel. It also supports out-of-band types of exploitation, that is, using Interactsh, which was developed by ProjectDiscovery, the organization behind Nuclei, and it supports the Burp Collaborator as well. For those who don't know, out-of-band means that when you try, for example, an SQL injection, you send the server a request through which you try to grab the administrator's hash or something similar, and the server answers that request with a response sent to a third party, for example a public server, so you can read the response from that HTTP or DNS server's logs. Nuclei also supports integration with the important CI/CD solutions. One of the most important features, at least for me, is that there are practically no false positive findings: with the structure of the templates, as you will see later, detection is bulletproof. It also supports static and dynamic authentication. Static in this case means that you can give it an

API key, bearer token or basic authentication. Dynamic means you start with, say, a WordPress username and password, the first response gives the template a cookie, and it continues to maintain the session with the cookies it got. There is also a reporting part: it integrates with GitHub and GitLab, it can create tickets in your Jira, and you can also send logs to Splunk and do some kind of daily analysis. It supports different types of input as well, for example DNS records, IP addresses, CIDR ranges such as a class C network; you can give it standard input or pipe input to it from some other tool and it will be able to read it. You can also give it an OpenAPI or Swagger definition, for example for testing APIs, and you can even feed it requests and responses from Burp Suite and it will handle those too.

As far as templates are concerned, these are YAML files that define the tests and serve as a blueprint for Nuclei. They use a domain-specific language, DSL, to define expressions for manipulating data. Each template has at minimum a part with basic information, a defined protocol, for example DNS or HTTP, and matchers and extractors. A matcher is, for example, when you expect the response to be HTTP 200, meaning the server returned OK, and an extractor can, for example, pull a cross-site request forgery token out of the server response.
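As a rough sketch of what the speaker describes (the template id, URL path and regex here are illustrative placeholders, not taken from the talk), a minimal Nuclei HTTP template with one matcher and one extractor might look like:

```yaml
id: example-csrf-token-check   # hypothetical template id

info:
  name: Example matcher/extractor template
  author: example
  severity: info

http:
  - method: GET
    path:
      - "{{BaseURL}}/login"    # placeholder endpoint

    # matcher: this stage passes only if the server answers 200 OK
    matchers:
      - type: status
        status:
          - 200

    # extractor: pull a CSRF-like token out of the response body via regex
    extractors:
      - type: regex
        name: csrf_token
        group: 1
        regex:
          - 'name="csrf_token" value="([a-zA-Z0-9]+)"'
```

The extracted `csrf_token` value can then be referenced in subsequent requests of the same template, which is the chaining pattern the talk relies on.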

A couple of special template types are code, fuzzing and headless. The code template here is an example with PowerShell: it uses a precondition to check whether it is running on Windows, and then executes a simple PowerShell command that tries to find out whether there is an administrative share. On the right side is one of the steps from a headless script we wrote internally: a step with JavaScript that opens the homepage of our portal, reads the CSRF token from it with this regex, and later sends a POST HTTP request to this API endpoint with the previously extracted CSRF token inserted. So basically you can run Bash, Python, PowerShell or JavaScript from it.

Another of these special types is the fuzzing template. It means that Nuclei can serve as a dynamic vulnerability scanner, covering a whole palette of high-severity vulnerability types, like SQL injection or cross-site scripting. On the right side is an example of our fuzzing template. We tell the browser to open the URL we gave it, wait for it to load, and then wait for the dialog: a popup with alert, prompt or confirm that waits for the JavaScript, which is mainly used to verify that cross-site scripting really exists. We read the payloads we want to test the URL with from a txt file, and in the fuzzing part we tell it to fuzz the query parameters by replacing the existing value of each parameter with an XSS value, that is, with values from this txt file. After that, we use a matcher to check whether the dialog was really triggered; if it was, the template has a positive result, meaning the regression exists. As extractors we simply record which type of dialog it was, in this case alert, and the value of that alert, which is the string XSS.

Something about workflows. A workflow is actually a tool for orchestrating several templates. I have two examples here; let's say we have a generic workflow and a conditional workflow.
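A sketch of the conditional variant described next (the template paths and the JBoss example are illustrative, not files shown in the talk): subtemplates run only if the parent detection template matched.

```yaml
id: jboss-regression-workflow   # hypothetical workflow id

info:
  name: Conditional workflow sketch
  author: example
  severity: info

workflows:
  # first run a technology-detection template; the exploit templates
  # below it execute only when the detection template produced a match
  - template: technologies/jboss-detect.yaml
    subtemplates:
      - template: exploits/jboss-exploit-1.yaml
      - template: exploits/jboss-exploit-2.yaml
```

A generic workflow would simply list the templates at the top level, so each runs regardless of the others' results.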

Generic means that the second template, for example Confluence detection, runs regardless of the result of the first one. In the conditional one, the target is first scanned with a template that detects which technologies are involved, and what runs next depends on the previously detected technologies: if the technology is not detected, neither of the two exploits is executed; if, for example, JBoss is detected, both exploits in the subtemplates section are executed. This is a great thing for a modular approach to vulnerability scanning, because you can make a workflow that is responsible for one vulnerability class; for example, you can have a workflow that concerns only SSRF vulnerabilities and define all the templates that concern SSRF.

Some use cases that Nuclei covers: I already mentioned that it has over 10,000 templates; it can detect CVEs, and I've noted here that there are over 2,800 templates currently used for CVE detection, from Log4Shell to Netlogon or CheckVeg. It can also detect blind SQL injection and cross-site scripting, whether reflected or stored, through the headless Chromium browser. It can detect default credentials; for example, there are, I think, 10 or 12 sets of default credentials for Apache services. If you have a secret key stored in a file, it will detect it via regex. It can send a DNS request and read from the answer whether the answer contains a CNAME record, and in this way detect whether a subdomain takeover is possible. It can also detect security misconfigurations, for example administrative portals without defined credentials, or whether an AWS S3 bucket is publicly readable and writable, and it supports detection of remote code execution.

This is our setup. We keep the templates we wrote on our Git server. We have cron jobs that trigger the whole process every 5 days. We use Notify to send the message to Slack, and we have a Python script that parses the output from Nuclei and serves as middleware: it goes through the JSON

and based on that it creates a more readable output. This is the diagram. Let's say the first step is that there was some vulnerability, found in an internal security review, in a pentest, or someone sent us a report through the bug bounty. A member of our team gets that information, triages the vulnerability and writes a template for it, which they push to the Git server. After that, the locally cloned repository on our Linux server is updated from that Git server, and on that server a cron job starts two instances of Nuclei: one instance for single-user vulnerabilities, the other for multi-user vulnerabilities. We will discuss multi-user later; an example of that would be an Insecure Direct Object Reference type of vulnerability. When the cron job is triggered, scanning starts against its predefined targets, Nuclei produces JSON output, the middleware reads it and sends the restructured output to Notify, which sends it to a Slack channel via webhook. After that, the Slack channel informs the on-call member of our AppSec team that a regression happened, and then we look at how we will fix it.

As for authentication, which was also very important to us, we solved it in two ways. Server-side vulnerabilities are handled by passing environment variables: with the -var name=value flags we pass the usernames and passwords, and we tell Nuclei to use this variable for authentication in HTTP requests. We also have the scenario with one or more users, where the authentication token and the cross-site request forgery token are collected from the response. We use the defined variables, username and password, which we previously passed through flags, send a POST request to the auth session endpoint with a JSON body, and in the response we look for HTTP status 200 and whether the token is present in the JSON. After that, an extractor extracts the token, which we call authtoken. In the next sequential request we send a GET request to the homepage with the previously extracted authtoken and check whether the status is 200, OK, and whether there is a CSRF token in the HTML. After that we extract

that token through a regex and send it along sequentially in the next request in that chain. As for authentication for client-side vulnerabilities, for example open redirects or cross-site scripting, here we use headless Chromium. It is similar to Selenium or Puppeteer, but Puppeteer tests are written exclusively in JavaScript, which didn't work out for us because we realized we would spend too much time on it, so we decided this was the more elegant way to solve the problem. We use something called XPath: this is a simple XPath and this would be a full XPath. In the simple one we have the id that matters to us, and we later define whether it is an input or a button. The cross-site request forgery token and the authentication token are in this case handled by headless Chromium automatically. We tell the browser to click "accept cookies" on our portal, click on the username field, fill it with some value, click on the password field, insert some password, and click the login button. After that the whole process is automated, in the sense that we don't have to manually extract any variables.

A few examples. This is an example of a reflected XSS that we detected on www.info.com. You can't see the parameter here, it's somewhere further along, but what we do is navigate the browser to the URL and wait 10 seconds for the popup to execute. If the popup executes, that's it: the matcher is satisfied and we consider the regression confirmed. This is a demo that lasts a few seconds. We move through the console, Chromium starts, it loads the URL, and the popup fires, which means the test registered as positive.
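Based on the talk's description, such a headless XSS regression template might be sketched roughly as follows; the id, payload file path, and the exact action and matcher names are assumptions modeled on Nuclei's headless/fuzzing syntax, not the template shown on the slide.

```yaml
id: reflected-xss-regression   # hypothetical template id

info:
  name: Reflected XSS regression check
  author: example
  severity: high

headless:
  - steps:
      # open the target URL in headless Chromium and wait for it to load
      - action: navigate
        args:
          url: "{{BaseURL}}"
      - action: waitload
      # wait for a JavaScript dialog (alert/prompt/confirm) to fire
      - action: waitdialog
        name: reflected_xss

    # payloads read from a txt file (placeholder path)
    payloads:
      xss: payloads/xss.txt

    # replace each query parameter's value with a payload from the list
    fuzzing:
      - part: query
        type: replace
        mode: single
        fuzz:
          - "{{xss}}"

    # positive result only if a dialog actually executed
    matchers:
      - type: dsl
        dsl:
          - reflected_xss == true
```

The matcher on the actual dialog execution is what avoids the false positives of scanners that only see the payload reflected in the response body.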

This one is a bit more complicated, because it involves clicking through the interface where we enter the username and password; we actually simulate the user interaction. This is an example of the output we work with: this is the standard output, the top line, and these last two lines are the output we produce through the middleware script. This is the template; here we have some information, some metadata. We navigate the browser to a URL, wait for it to load, then click the cookies button, click on the username and password fields, wait for the element to load, click on that element, and wait for the payload to complete. If the payload completes, we consider the test successful. In this case we have recorded the payload as input beforehand, so there are no extra steps and the test runs as fast as possible. This is also a demo; most of it is blurred, but at some point a popup will appear. It now runs the template, then Chromium clicks on the cookies button, fills in the username and password, and logs in. The interface is blurred now, and there you saw it, that was the popup. That's the second example.

The next example is with blind SSRF. We had a case where the patch was applied maybe a year or two ago. It was fixed, but when I went to check it, I wanted to cover the top 10 bypass types as broadly as possible, and I wanted

to see whether this vulnerability was really fixed. I discovered that a bypass was possible: our functionality has a filter that rejects IP addresses, but it was possible to use a wildcard DNS service, which means that localhost.bsites.hr will always resolve to localhost or some internal IP address. We used that publicly available DNS service and realized that the SSRF was still there. It is blind, which means we won't get a concrete response from the server, but we get confirmation through a time delay that the vulnerability still exists, and it can even be used as a basic port scanner. This is the template. Through an annotation we defined a timeout of 11 seconds; this is the host I'm targeting, this is some API endpoint I had to censor, the authentication token was already extracted by the previous template, and here we target some host on a port that we know is open. If the port is closed, the request times out; if the port is open, the response arrives in 1 or 2 seconds. In this way Nuclei can be used as a port scanner.

This is an example with Insecure Direct Object References. We defined what we are targeting, our target, and environment variables for users 1 and 2. We said we need a workflow, write JSON as the Nuclei output, read

it with a script, and send it to Slack. This is an example of the workflow; it is the conditional type. It first tries to obtain the authentication token, which means that if this is unsuccessful, if the credentials don't work, the next step won't start at all. It obtains the authentication tokens for users A and B, then the CSRF tokens for the same two users; with user A it tries to create something, and then tries to modify it through the second session, as user B. In this example we use user A's credentials, send some body, and extract the user ID from the JSON. In the next step we try to modify the previously extracted user ID using the authentication token of user B. If the answer is 200, it means we successfully edited something as the other user and we consider the test positive.

And the last example is a path traversal that appeared because of a misconfiguration. We try to reach an XML file through traversal and check whether certain strings appear in the answer, app properties, user home and SMS API, and whether the answer is 200.

As far as notification of a successful regression detection is concerned, we have done

this by using Notify, the Python scripts I mentioned before, and Nuclei. Notify allows you to send the output of a tool to, for example, Discord, Slack, a Telegram channel, mail, push notifications and so on. This is the configuration of our Slack integration: we said this is Slack, this is the channel, AppSecAlerts, and this is the webhook we use for triggering. This is an example of the JSON output, where we have various things: the ID of the template, the author, tags, the description, references (I put a link to our Jira ticket there), the severity, the time when it happened, and the command that was actually executed in the background. And this is the script that reads that file and pulls out the interesting things: the description, the severity, the reference and the ID. Based on that it matches the output, marks a positive regression test, and so on. This is the final result, a Slack message in which we have everything we are interested in at first glance: the type of vulnerability, the description from the template, the ID, the severity and the ticket, because it is convenient for us to have the URL of the ticket so we can react as quickly as possible. That's it. Do we have any questions on this topic? I'm listening.

The most complicated part was definitely when, in the headless template where we tested XSS,

and we had to simulate Burp functionality, where you actually intercept the request inside the proxy, change it on the fly and forward it. All of that had to be simulated in JavaScript, and we managed it after a lot of work. There was a check on the client side, so the payload value couldn't go through the browser: we couldn't click and enter the value normally, we had to include a previously prepared value that was valid. In the meantime we had to quickly grab the CSRF token with JavaScript and send a POST request with a payload that would not have passed the frontend validation, then send it and click several times through the interface so that the popup would appear and we could say, "OK, it's still there." That was one of the first examples I wrote; it took me two days, but after that it was much easier.

We have more questions from the audience. We know there are commercial web fuzzers that claim no false positives and do everything automatically, as well as that can work. Of course, there is always the problem of authorization, because it is rare that a commercial web fuzzer can handle it. Have you ever compared your tool with commercial ones? Of course, these are your own use cases. Yes, these are our use cases on our portal, which is the Infobip solution, a communications platform. What I have

just shown you, for client-side and server-side authentication, was enough for us. But now that we have multi-factor authentication, I think this is not at that level yet, so maybe some cutting-edge commercial tool would be a better solution; still, for 95% of scenarios I think this is good enough. And as for IDORs, which can also trip up commercial tools, they have to be handled case by case per application.

How much time do you need to go through one application and get this working for it? We don't have that many IDORs, so it's hard to say, but we can use some Burp plugins that actually work as a matrix, where, for example, in the columns you have different users and in the rows different functionalities. Then, as you click through a functionality, the plugin tries, with several sessions at once, to verify whether it succeeded at that same functionality as the original user. Based on that matrix we can say that a certain functionality is questionable, that a user can perform it without the necessary privileges, and then we reproduce it through a template, because our IDOR templates are written case by case. Nuclei is not primarily meant as a vulnerability scanner here; it is good, but more in the context of a fuzzer, and the fuzzing was more for SQL injection, XSS, SSRF and similar. Most scanners don't understand this, but I think AI will change that. Burp Suite Professional has started to build AI functionality into the tool, so that you can pay for monthly or yearly credits and use AI directly inside the tool. I think that's the future for these kinds of tools. Thank you. Any other questions? Thank you for the presentation.