
The Growing Crisis in CVE Data Quality

BSidesSF · 2025 · 30:07 · 62 views · Published 2025-10
About this talk
The Growing Crisis in CVE Data Quality, by Jerry Gamblin. Explore the escalating issues in CVE data: inconsistent reporting, low-quality submissions, and outdated info. Learn why these threaten cybersecurity and what solutions can restore trust in this critical database. https://bsidessf2025.sched.com/event/8e6f6317c2e249f1bf18a835c696b8b3

Hi everyone. Thank you so much for attending today. We'd like to welcome Jerry Gamblin presenting his talk, The Growing Crisis in CVE Data Quality. As a reminder for Q&A, please use the QR code for Slido or the bsidessf.org Q&A. Thank you.

Welcome, everybody. Like we said, the title of this talk is The Growing Crisis in CVE Data Quality. So, a little about me. My name is Jerry Gamblin. You can reach me on all the social media platforms at JGamblin. My website is jerrygamblin.com. I also run a CVE data website called cve.icu that updates daily. So here's the talk. I was going to give a talk. We had to get these slides in

about three weeks early. I started writing them, and then this letter got leaked that said that the CVE program was running out of money. It was probably one of the most stressful days in my life. We started to see amazing stories like this that said, "What are we going to do if there isn't a CVE program?" My boss called me into a meeting with his boss and they asked me that same question, and I said there is no DR plan if the CVE program goes away. We will be in a wait-and-see mode until we decide. Luckily, in that same period, they reinstated the funding with zero public

announcements from anybody that the funding was in danger or that it wouldn't be renewed. So now when you go ask the government if this was a thing, they say, "What? What are you talking about? There was no issue. It was renewed on time," etc. If anybody's wondering, that contract to run the CVE program and the CWE program is $29 million a year. So, it is not free. During that same time, we saw what would happen if the CVE program went away. In the 12 to 18 hours that there was no CVE funding and it was at risk, we had three new CVE databases pop up. One of them is ENISA, which is

the European Union's equivalent to the US CISA. So they're still around. They've been planning this for a while. It looks like they're going to stick around. circl.lu, which is the Luxembourg incident response team, launched something called, what was it, the Global CVE, GCVE. And then there was a foundation that was launched to try to take over the CVE program from the US government. All three of those are still in different stages of flight. Nobody knows exactly how any of those things are going to work out. I expect the EU program and the MITRE program to work closely together in the future. I

don't expect it to be a separate CVE program on its own. So let's just talk about the CVE system. It is a bureaucracy on a bureaucracy on a bureaucracy, right? So, the Department of Homeland Security gives money to CISA, who gives money to the Homeland Security Systems Engineering and Development Institute, HSSEDI, which then gives the contract to MITRE, which then gives the contract to the CVE program just to publish half of the CVE, just to give the CVE number and the description, right? So they deal with all of the CNAs. And then NIST, the National Institute of Standards and Technology, on this side runs the National Vulnerability Database, and they work with a company called Analygence. I never know exactly

how to pronounce their name, but they are a government contractor who gets $125 million over five years to enrich the NVD data. So the CVE program, like we said, oversees the public database at cve.org. It upholds the CVE publishing rules and it administers the CNA system. The CNA system is important because we're working on a decentralized program where individuals and companies can publish their own CVEs. They also operate the working groups, which are the community's way to help steer the CVE program. These are important because I'm going to ask you guys to try to get involved with these later in the talk. The NVD program enhances the

published CVE records. They add the CVSS score, the CPE, and the CWE. I talk to a lot of people who have been in the vulnerability management space and in security for a long time, and they believe that the NVD runs the CVE system. They do not. They are there for the US government. If you get too prickly with them, they will tell you that their only customer is the United States government and that they give the rest of the data away for free. So, when they had funding issues and people started saying, "Hey, we rely on you," they're like, "Why? Our mandate is to serve the federal government of the

United States. So, you really don't get a say in how good or bad of a job we're doing." So, let's get to the growing crisis. I just ran this this morning. There are 30% more CVEs published today than there were this time last year. We're over 15,000 CVEs today. This day last year, it was just over 12,000. That's a pretty normal growth pattern for CVEs. This is a slightly different view. This is just the percentage of CVEs published per year. You can see that we're growing every year and that a bigger chunk of the CVEs is published at the end of every year. There's a small subgroup

of people who do vulnerability forecasting. We have a conference in Europe every year. If you're interested in geeking out on trying to guess how many CVEs there are, I would suggest you look at the FIRST.org vulnerability forecasting group. We're meeting in Cambridge in September this year. So we'd love to have more people join that group to try to figure out who's publishing what. So now let's get into the problems with CVEs. When I tell people CVEs are the easiest thing to publish, I tell them this. There are four required fields to publish a CVE. You need the CVE ID, the description, and what product it works

on. Those are the three things that you need. The CVE program brings some metadata that identifies it. The description: this is the only validation on the description. It says, what are the CVE description's requirements? It needs to be at least one character and no more than 4,096 characters. Two characters. Yeah, you have to have two characters. Sorry. And you laugh and you say that nobody's doing that. We're seeing more and more CVEs looking like this. I'm not trying to pick on anybody, but this is what we're getting from Microsoft a lot of the time: just "spoofing vulnerability," right? And this is what the community gets and gets to

use and gets to try to figure out if this is exploitable in your environment, if this is something I should worry about, right? Why they're doing this, I don't know, but it's well within the rules. So, these are all legal CVEs. Legal in the sense that we have 20% of all CVEs doing this, because it says you have to have a product, and since there's no checking on it, most people, including MITRE, just throw an "n/a" in there, right? So if you're trying to read the data and figure out, hey, what does this CVE affect, you're going to see this "n/a," which isn't very handy when the description says "spoofing vulnerability." But you could file a CVE

today that says "spoofing vulnerability," with the product it affects "n/a" and the version it affects "n/a," and that would pass all the requirements for the CVE program as of today. Dr. Ben Edwards, who works for Bitsight, is one of my friends. He puts out a report that talks about the data completeness that CNAs do. As of last year, this is from 2024 till the end of the year, only 80% of the CVEs had a product. So, that means 20% of all CVEs don't even meet the minimum publishing guidelines set by the CVE program. I complain about this all the time. I don't get very far because the CVE program is in a state now where

they want to publish CVEs, and doing data quality correction is really, really hard. So that's the reason I am giving this talk. I wanted to talk about the funding issue. I thought this would be the only funding issue I would have talked about when we started this talk. In 2024, the NVD's funding lapsed. They weren't able to continue hiring people to look at the CVEs and apply CVSS, CWE, and CPE. So, the good thing was they were like, "Okay, we're going to get back. We're going to get this going and everything's going to be okay." In March of this year, they threw up their hands and said, "We can't do it."

It's impossible for us, at the rate that CVEs are being published, to add CVSS scores, CWEs, and CPEs. So right now there are just under 300,000, there are 291,000 CVEs in the NVD database. 25,000 of them, or 24,000 of them, have no CVSS score, no CPE, no CWE done by the NVD. If you look at that broken down by year, it's super heavy on the last two years, as you can expect and imagine. I think that last time I looked, only about a quarter of all CVEs published in the last 18 months have CVSS scores and CPEs supplied by the NVD. There is a big push for CNAs to

supply that data. That's a whole other talk, because the NVD only allows some CNA data to be published in the CVE records from them. And that's another program they've actually shut down. So, it's actually frozen there, too. So, it's really, really hard to get quality data. And to that point, back to this slide, if you look, only about half of CVEs have any CVSS score or CWE. So if you use either one of those to figure out if you're vulnerable, in the best-case scenario, you only get about half of the data of all the CVEs that have been published. Okay, so let's talk about bureaucracy, funding, and mandates. We saw this slide

before. Just to be perfectly honest, MITRE, they spend $29 million. NIST spends $25 million a year on their contracts. I'm going to say these are contract prices. I've been told that these numbers are much less when you look at the actual money spent, but I have the links in the slide. When you look at the government funding websites, this is how much the checks sent to these companies were. If they move the money anywhere else, they won't come out and tell you exactly how much it is. So, just for fairness, these are the numbers of the contracts. So, if somebody comes and says, "Oh, it costs way less than that," ask them to show you anywhere

where the number is and I will be happy to redo the math to that number. But since they haven't been able to provide that number, I am only going to go by what's publicly published. So last year alone, CISA paid MITRE $664 per CVE published. NIST paid their contractor $573 for every CVE that was published. Put that together, it got really close to leet. I don't know if they were trying that, but when I was doing this math, I'm like, "Oh, come on. Another hundred bucks and it would have been all right." So together, the total amount of money that the US government spends on CVE publishing is about $1,237

a CVE for new CVEs. And I always get, "But Jerry, they're maintaining the database with all the old CVEs in there, too." So just to be fair, I went and did that math, too. CISA pays $99 for every one of the 290,000 CVEs in the database as of the time I ran this. The NVD pays their contractor $87 for backwards compliance. So that's $188 a record per year. So you can use any one of those numbers that you want, right? And when you start talking about making changes in these companies or in these organizations, you end up dealing with something called the Homeland Security Systems Engineering and Development

Institute. Has anybody ever heard of that institute before today? Yeah. To be honest, I hadn't heard of it until earlier this year when I was trying to map out who really runs these programs. And it's as big of a bureaucracy as you would expect, right? Like, this group then works with MITRE to publish CVEs. So if you wanted to figure out who's in charge of the CVE program at the very top, you could say DHS. If you want to be a little bit more specific, you say CISA and be like, these are the people who really do it, and that would be half true, but this Homeland Security Systems Engineering and Development

Institute is who also does that, right? Who is the real one? And then you go to NIST on the NVD side. Why the National Institute of Standards and Technology is running the NVD program is something that nobody can explain to me, right? I mean, they're supposed to do other research, but, you know, in 2025 it would make much more sense for these two groups to come together. Let's talk about privatization and globalization. I know that both of these are probably sponsors here, and I know people from both of these organizations, but the CVE program is in such rough shape. GitHub has their advisory database, which they do a great job on, right? Google has

their advisory database, which they also do a great job on. The overlap between these databases is minimal. So, if you're looking for CVE data or vulnerability data, because it's not common, you can go to either one of these websites and pick up different data from different sources, right? Which is great. And then you get to this problem where you have the Global CVE launching, right? They're going to get data. You have the EU database, which is going to come out and do the EU NVD, which will be interesting, and we'll see if they publish anything new. And then you have the foundation issue again. What you don't have on here is,

and just to not get too political, there are other groups that don't love the US that have their own NVD. You have the China NVD and you have the Russian NVD. And I would guess that if I'm sitting in Moscow today, I probably can't access the NVD website. I haven't checked that. But I can tell you, sitting in the US today, I cannot access the Chinese vulnerability database or the Russian vulnerability database. So we're siloing that data by government in different places. So we really need to work on that as a community. So what are the proposed solutions? I gave a talk just like this at VulnCon,

which is a meeting of all the vulnerability CNAs, three weeks ago. And my first thing was they have to have a quality validation process. You have to get in there and start using, in quotes, AI, machine learning, or whatever to improve the data that gets published in these CVEs. Right now it can't be two regex rules that are the base quality for how the data looks. Mandatory fields: I push hard for this. It's never going to happen unless we get a community outrage, and I need you guys to help me push on this. It is to say that today a CVE record is not useful for people unless it includes a CVSS

score, a CPE, and a CWE. Right now, people on the board, people who run the program and make all the rules, believe that just a description and minimal product information is enough to make a CVE record. I think in 2025, we all know that that's not true. I think the sooner we can get them to actually consider this data and put it into the CVE record as mandatory, the more secure the internet will become. The government efficiency thing: I don't want to sound like Elon Musk at all, but you have two different organizations in the government getting two different big buckets of money, and neither one of them reports directly to the American people. If you push

too hard or you lean in too hard, they will tell you that they run on mandates from the federal government, for the federal government. So at some point these two groups need to be merged, and their mission needs to become public facing, either for the good of every US citizen and corporation or, in a perfect world, for everybody in the world to have this data. So the call to action is pretty simple. Advocate for improved quality from CNAs. If you work at Microsoft, Google, or basically anybody who publishes CVEs, I'm more than glad to run reports for you to show you which data you're missing and why people need it. So, just find me after

the talk. I'm doing that. Participate in the working groups. There are a couple of open working groups. We're starting a consumer working group soon, I was told. I was told I was allowed to talk about that here. It will be announced through the CVE program, and they're looking for people who consume this data, both companies that put it into their products and users who use it. We're trying not to be government connected in there, but we'd love to have people, and this is probably a good group of people to talk to, who would love to come in and say, "Hey, we really need to see this," and kind of help us lobby to get some

rules changed so that the CVE records work better for people in the organization. Thank you very much.

We now have time for some questions. I don't know if any showed up in Slido. Yeah, we have a couple of questions. Our first question: in a time of doing more with less, what is the benefit to vendors of providing even more data to CVE services when focusing on trying to ship security fixes in updates, especially when some of us fully disclose our vulnerability data in full? Yep. And that's a great question, right? If you're already putting it out in your advisory, all I'm asking you to do is to copy that data

out of the advisory into the CVE program, into Vulnogram, right? It's not an extra step. We're not asking for different information. A lot of great companies put a ton of information in their advisories, and then they kind of don't put enough data in the CVEs, and it's not really easy to collect that data in one place. At Cisco, I'm lucky to have a team, and we've spent over 18 months, and however much five engineers make, and all the computing cost to go and scrape every CNA and put that all in a standardized format. But that's not public. We use that for our product. And it's really hard to tell someone else that, you know,

yeah, you can do it. You just have to spend five engineers' worth of salary and however much compute power we use to get this data in a usable format. Thanks. Our second question: would allowing all the CNAs to set CVSS cause a larger quality-of-determination issue? I don't think so. The CVSS SIG, which is run through FIRST, is very great. They put out great documentation. If people would read the documentation and follow the calculators they put out, that would be great. The CVSS SIG is also open on first.org. So, if you're interested in better or different CVSS scores, I would suggest you join that. Does NVD have a competitive bid process

for maintaining the database? They do. I will share these slides, but the competitive bid was won by their current vendor, and it was a 5-year contract for $125 million. So, I think it comes back up for competitive bid in 2028. Aside from CVE data quality, there are a growing number of CVEs published with low fidelity, adding to the noise that companies have to filter through. Any perspective on this? Yeah, low fidelity is an interesting question and an interesting topic. Right now, for the first time in the 25-year history, a company called Patchstack is publishing more CVEs than MITRE. It'll become the first CNA to publish the most CVEs this year. They only focus

on WordPress CVEs, and about 90% of the CVEs they publish are cross-site scripting in WordPress plugins. They are all valid CVEs. They provide great data quality in their CVEs. So while they're doing that, it's not terrible. I'm okay with that. The data is good. They provide all the necessary fields. So we're not at a problem where we're talking about there being too many records. 250,000 records are not too many records for people to manage at machine scale. It's actually quite a low number when you compare it to syslog or SIEM events, etc. Okay. Is CVE worth saving? Yeah, it is. I will tell you that the

CVE program, the idea of CVE, is worth saving. It's already established. People already use the data everywhere. And it's the standard, right? It's what everybody goes back to. Could it be changed? Yes. But if you started to talk about whether somebody could create something like CVE but different, that would just mess up the whole environment. It would get splintered, and then you would have even more database sources that you would have to collect, and it would just make it harder for the average person and the average company to secure their environment. Any thoughts on how AI/ML could help get through the backlog or enrich the CVEs with sparse data? Yes. Yes, I do.

I have a lot of thoughts on those. It boils down to this: the best models are 90% right. Would you rather have 100% of the fields filled in with 90% correctness, or would you rather have them all marked null? You have to make that decision yourself. And that is going to be a decision that we're going to have to make as a group. If we could get an AI model to 98% or 99%, I would say go for it. But right now, everything I've seen says the models are at about 90% correctness in guessing CWE and CPE. What do you think about the Linux kernel

becoming a CNA? They're operating within the rules. They're producing a lot of CVEs. I think that they're trying. I'm glad that they're trying. They're producing quality CVEs. So just like Patchstack, while there are a lot of them, they are meeting the quality requirements. It does put a lot of load onto the NVD to run through those, but as of today they're doing everything that they're required to do as a CNA. I would love to see CVSS score, CPE, and CWE added as requirements, but that is not directed at the Linux CNA. Should we have identifiers for CVEs for products versus packages, Maven, etc.? I think CVE is where that needs to

land. I'm on the automation working group and the quality working group, and we are looking to add purl and, uh, carnivore to the package at some point. OmniBOR. Sorry, it was one of those "-vores." I'm not familiar with it. It's a US government project, but yeah, OmniBOR, or purl, or CPE, so that you'll be able to identify which package it is. But it should all live under the CVE umbrella, in my opinion. Is there a place to get what fix was added for a CVE without looking at huge commit diffs? No. No, there isn't. Sadly, that's the hard answer. On open source, you've got to look at the diffs. For products, you have to go

to their website and see, and some of those products are behind login walls, which are allowed under the CNA rules. So yeah, there is no easy button for getting fix information. And our last question: is CVSS itself in need of a new or better alternative? It depends on what you think CVSS should do. I think CVSS, as laid out in its documentation, performs the task that it's supposed to do great. If you think it should be more dynamic and less static, I don't disagree with you there. But I don't think that is the role of CVSS. I think that there is room for another scoring system, another open scoring system, that could come

along and provide something like CVSS, but like a dynamic CVSS score. Perfect. Thank you. And we're at time. Thank you so much, Jerry. Thank you. [Applause]
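
The completeness gaps described in the talk (placeholder products such as "n/a", very short descriptions, and missing CVSS, CWE, or CPE data) can be checked mechanically. What follows is a minimal Python sketch, not the speaker's tooling. The nested key names follow one reading of the CVE JSON 5 record layout and should be treated as assumptions to verify against the schema you consume; the helper names `is_placeholder` and `completeness_report` and the sample record are hypothetical.

```python
from typing import Any

# Filler values that technically satisfy a "required" field (assumed list).
PLACEHOLDERS = {"", "n/a", "na", "unknown"}


def is_placeholder(value: Any) -> bool:
    """True when a field is missing or holds a filler value such as 'n/a'."""
    return not isinstance(value, str) or value.strip().lower() in PLACEHOLDERS


def completeness_report(record: dict) -> dict:
    """Flag the gaps discussed in the talk for a single CVE record.

    Key names (containers.cna.descriptions, affected, metrics,
    problemTypes) are assumptions based on the CVE JSON 5 layout.
    """
    cna = record.get("containers", {}).get("cna", {})
    descriptions = [d.get("value", "") for d in cna.get("descriptions", [])]
    affected = cna.get("affected", [])
    metrics = cna.get("metrics", [])
    problem_types = cna.get("problemTypes", [])

    return {
        # Length of the longest description, after trimming whitespace.
        "description_chars": max((len(d.strip()) for d in descriptions), default=0),
        # A product counts only if it is not a placeholder like "n/a".
        "has_product": any(not is_placeholder(a.get("product")) for a in affected),
        # Any metric entry with a key starting with "cvss" counts as a score.
        "has_cvss": any(key.startswith("cvss") for m in metrics for key in m),
        "has_cwe": any(
            "cweId" in d for p in problem_types for d in p.get("descriptions", [])
        ),
        "has_cpe": any(a.get("cpes") for a in affected),
    }


# Hypothetical record shaped like the sparse example from the talk.
sparse = {
    "containers": {"cna": {
        "descriptions": [{"lang": "en", "value": "Spoofing vulnerability"}],
        "affected": [{"vendor": "n/a", "product": "n/a",
                      "versions": [{"version": "n/a", "status": "affected"}]}],
    }}
}
print(completeness_report(sparse))
```

Running a report like this across a CNA's published records gives a rough count of how many lack a real product, a score, or a weakness type. It cannot judge whether the values that are present are correct.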