
Watching the Watchmen: An Introduction to Certificate Transparency Logs

BSides Lisbon · 2025 · 33:12 · 162 views · Published 2025-12
About this talk
Certificate Transparency logs are publicly accessible records of all issued TLS certificates, enabling both attackers and defenders to perform infrastructure reconnaissance. This talk explores how CT logs work, their security implications, and practical techniques for querying and analyzing them at scale—from identifying undisclosed internal networks to discovering relationships between companies and services. The speaker demonstrates real-world examples of certificate misuse detection and shows how defenders can minimize information leakage while security researchers leverage CT data for large-scale Internet analysis.
Original YouTube description
This talk introduces Certificate Transparency (CT) log lists, with a focus on their security implications. I will focus on the security aspects of all certificates having to be published on a public log and how everyone can query that log to obtain information that can give an edge to an attacker. Certificate transparency logs are not new, and are widely used by attackers to perform infrastructure reconnaissance. However, some defenders are still unaware that they can reveal various pieces of information on their infrastructure, including their internal networks. I will also touch upon some strategies that can be used by defenders to minimize the information that is disclosed by this mechanism. Additionally, I will show how anyone can essentially download and monitor any CT log, since they are publicly accessible by requirement and by design, even as CT logs become exponentially larger over time as certificate lifetimes become shorter. This allows for large scale data processing and discovering interesting curiosities and artifacts on the Internet as a whole, including relationships between companies, product and services usage, etc. About the Speaker: Luis Grangeia is an information security specialist with over 20 years professional experience. He has participated in multiple security audits and penetration testing projects in most industry sectors, including critical infrastructure. Luis has a security research background, having published and presented research on topics such as Linux kernel level malware techniques, DNS Cache Snooping, advanced exploitation and reversing of IoT devices, radiofrequency security and others. He is currently Security Researcher at BitSight, focusing on collecting and studying security data at scale.
Transcript [en]

Oh, by the way, just one small detail. We'll have the prestigation art afterwards. So, >> yeah. >> Hello. Can anyone hear me? Okay. >> Nice. Cool. I have a bunch of slides and not a lot of time to talk about them, but hopefully we'll get through them without me speed-running. All right. If you're a bug bounty hunter or a pentester, you've probably seen crt.sh, the certificate search engine where you can type in a domain. For example, I typed in tesla.com and got a bunch of interesting domains. If I want to pentest, if I want to find a

security vulnerability, I can do that. Again, I'm not sure if you've seen this. If you've worked in offensive security, you most likely have. If you haven't, I encourage you to test it and put your own domain in there. So what is this? Where does this information come from? I showed you crt.sh. Censys also has a search engine for certificates. There's SSLMate, there's scaly dog, security search stream, etc. Some of them have web APIs, some of them have search interfaces. This information all comes from a thing called Certificate Transparency logs, and it's freely available. You can either

query it indirectly through these sites or you can actually go and collect it yourself. So I want to talk to you about CT logs, and I want to show you that they're useful for much more than pentesting. There are a lot of interesting things there, and I'm encouraging you to take a look for a number of reasons, which we'll go through. I don't know what just happened. All right. First I want to take you on a trip back to 2005, 2010. Some of you, like me, were already working by then. If so, you probably have a little bit of back pain. So do I. So

I'm with you. Quite a few things were different. There were some different buzzwords, and among them were public key infrastructure, TLS, digital certificates. Sign your documents. Secure your website with SSL for just 9.99. It was not NFTs or blockchain or AI, it was a different buzzword, but the same vibe. Companies were formed around this. There was an economy around this. So people were selling certificates, and this hasn't changed much since then. Basically, if you wanted a certificate, you would go to a certificate authority. You know, by then

there was Thawte or GoDaddy. I think they already existed by then. You would usually pay them a little bit of money, not much. Now you may not pay at all if you go to Let's Encrypt. But yeah, you go there, you get your identity verified, you say, "Hey, look, I own this website, please give me a certificate." They would do their own checks in their own way, not really a standardized way of doing that, and they would provide you with a certificate signed by the certificate authority. You would get a certificate and you could prove to

others that you were who you claimed you were. There were root CAs, or certificate authorities, that were trusted by default by your computer, by your Android (by then it was just starting), by your Windows, etc. They didn't issue all of the certificates. They couldn't, there were just too many certificates to issue. So they would create special certificates that would delegate that trust to intermediate CAs or subordinate CAs, however you want to call them. And so you end up trusting all of these. You don't know exactly how many CAs you trust. You have a bunch of root CAs in your phone or in your computer, but you really don't know

how many CAs you're trusting. You also don't know where they are. They could be in Russia. They could be in a completely different country than yours, with different geopolitics, different real-world security, right? And also, and this is generalizing a lot, by then there weren't many restrictions on what a CA could issue. If a certificate authority somewhere wanted to issue a new certificate for google.com, they could. No restrictions. This created a big problem, because certificates were only seen by the people who needed them. If I went to a website, I would see a certificate that stayed between myself and that website. That's a problem. So basically

what I'm trying to say is, if you were able to trick or bribe a CA, maybe in Russia for example, or another country, to issue a valid certificate for bankofamerica.com, you could effectively become bankofamerica.com for a selected group of victims. You would be inside the network, man-in-the-middling it, and essentially you would see all the traffic, you could effectively convince them that you were that website, and nobody would know any better. This was a problem, and there were a lot of incidents around this time. Enough that they triggered this organization called the Electronic Frontier Foundation to release a study

and to perform a study where they wanted to scan the internet, find as many certificates as they could, and identify the chains of trust. Take one certificate, see who issued it, go back to the root, and see how CAs are actually issuing certificates, right? It turns out that by that time around 1,500 distinct CA certificates were actually issuing certificates on the internet, and those were trusted in one way or another by Microsoft, by the Firefox browser, etc. That would amount to somewhat fewer organizations, 651, one of which was Marks and Spencer, the guys that make sweaters. I think they're still around. So I gave

a presentation about this many, many years ago, and I used to say that the security of the internet depends on the security of Marks and Spencer Group, and lots of other groups, by the way. This is not a single situation. The study is still up, it's a really interesting study, you can go look for it. The URL is right there. So then a year later, coincidentally, in 2011, DigiNotar, which was a root CA in the Netherlands, gets hacked. And this is, by the way, not an isolated incident. This is kind of a cathartic moment that was the culmination of many, many smaller serious

issues, like certificates being misissued. Some guy pretending to be Microsoft gets a code signing certificate, for example. There are too many situations to mention, but this was kind of the moment where people stood up and said, yeah, this is not great, this is bad. Basically, DigiNotar gets compromised. And by the way, this was found for one single reason. The attacker, presumably someone from Iran, or at least they attacked people from Iran, issued a bunch of certificates for many, many high-profile websites, including google.com, and they managed to man-in-the-middle most of the country

of Iran and basically collect Gmail credentials for at least 300,000 users in Iran, and there were also other users affected outside of Iran. By the way, the way this was found was because Chrome was already around, and Chrome was doing an additional security check only for google.com domains. You know, Google Chrome, google.com, let's check. It was presenting a different error, and someone from Iran said, you know, this is weird, this is a valid certificate, why am I getting this error? And that's how they got caught. To me, you know, we had WannaCry in 2017, approximately 300,000

machines infected. WannaCry was more disruptive, but this was more insidious. These guys were caught almost by accident, and they wouldn't have been caught if they had just continued to sniff other types of credentials. So around this time people said, yeah, this hierarchy, this PKI, the chain of trust, this is not going well. Something needed to change. So, enter Certificate Transparency. This was the big change that started to be introduced to try to fix this. It was officially launched by Google in 2013. It's basically an effort to have a public record of all publicly issued

certificates that are trusted by root CAs. It's now enforced. And by the way, this started in 2013. We're in 2025. Things are still moving here. There's not complete support and adoption for this technology. In Chrome it has been enforced for all certificates issued since 2018. They started earlier with EV certificates, but now they completely support it. Firefox started earlier this year. They knew about Certificate Transparency, but they have only been enforcing it since 2025. In operating systems the situation is a little bit different. Certificates are used in Apple applications, macOS applications, iOS applications. They actually have,

as far as I can tell, been requiring it since 2018. In Android apps it's supported, but you have to opt in on a per-app basis. If you're using an app that uses certificates, you might not be using Certificate Transparency. You need to opt in. There's an option in the SDK. It has been implemented in Microsoft since 2024. As far as I could read, I don't think it's enforced in the Microsoft operating system. In the browser it is, because the browser is Edge, and Edge uses Chromium, so it's the Chrome engine. Hopefully you haven't seen this error. If you have, you have a trusted certificate that is not in a CT

log, which is a very, very weird thing to have. But yeah, this is the error that shows up if you have a bad certificate with regards to Certificate Transparency, and you won't be able to continue to the website. The way that browsers and clients check for the presence of a certificate in a CT log, I can try to explain really simply. It basically uses an attribute that is embedded within the TLS certificate and is signed by the logs that contain that certificate. So you can check for the presence of the certificate in a log without actually having to go to the log and say, hey, have

you seen this? You can just look at the certificate and see that it has a receipt showing it was published on a CT log. So basically certificates are issued in the same way they always were. You don't pay money anymore. If you do, you're probably not doing the right thing. You can go to Let's Encrypt and they'll give you a certificate. But yeah, you send a certificate request, you get a verification. Hopefully the verification is a little bit better. I won't talk about it. And the CA will send a pre-certificate to a CT log, or to more than one, actually. The CA

will get a signed certificate timestamp back, and it will be included in the final certificate issued to you, which you can use as usual. There's a catch here that I won't go through in detail, otherwise you guys won't be able to have snacks. It's a chicken-and-egg problem, because you have to send the certificate to the log, but you also have to embed the receipt in the final certificate. So there's a thing called a pre-certificate that gets submitted first, then you get a receipt back, and then you generate the final certificate. It ends up working, by the way, but what happens is there

will be a lot of pre-certificates in the CT logs, which is a good thing. It's basically the same information. So, does Certificate Transparency prevent certificates from being misissued, or a random CA in Croatia from misbehaving? No, it doesn't. However, does it help detect misissued certificates or misbehaving CAs? Yes, absolutely. But there's a catch. You must look at the logs, otherwise they won't be caught. The logs themselves are not enforcing anything. They're just saying, "Yes, I got this certificate. It was issued by a CA that I know about. Not sure if it's good, but it's here for everyone to see." Here's an example of Certificate Transparency working correctly. Cloudflare has a service, you've probably

used it, DNS over TLS. There's a bunch of services that sit on a very special IP address, 1.1.1.1. That IP address has had certificates issued for it this year and last year. A bunch were signed and issued by a root CA in Croatia trusted by Microsoft. All right, they've done that, and they actually submitted the certificates to a CT log. And by the way, the way Cloudflare caught this, at least as far as I could tell, is that someone alerted them, like, "Hey guys, have you seen this certificate? You probably need to check this out. This is not you, right?" So yeah, these certificates that were issued by Fina, a Croatian root CA,

could potentially have been used to intercept traffic from you to Cloudflare without you or anyone being any the wiser. They came out and said, "Hey guys, sorry, this was a mistake. We were doing a test, and we've revoked the certificates already. It's fine." Should you believe them? I don't know. I'll leave it up to you. Microsoft apparently does believe them, so they're still a root CA at this point despite this mistake. Cloudflare got to know because they basically had someone tell them, which is ironic, because Cloudflare runs a CT log and probably had it in their own logs. But we'll get to that. But kudos to

Cloudflare, because they've done really good work around this. I've kind of explained how they work, but I want to show you exactly, because so far it's been an abstract thing. CT logs, what is this? Is it a server? Is it someone with a text file or a database? You can actually take your Chrome browser and go to a URL that is used by Chrome, and I think they even distribute this file with the Chrome distributable, that contains a list of all of the logs that are trusted by the Chrome

browser. They are basically cryptographically verifiable, append-only public ledgers of certificates, using a thing called Merkle trees, which are really interesting. I have a few slides about them. I won't show them to you, because otherwise I would spend the rest of my day trying to explain them. Basically, they can be accessed by everyone using a public API. It's in an RFC. This is literally a web API. You can curl it. I will show you how. You can download certificates and, funnily enough, you can actually submit certificates to a CT log. I haven't tried, but I have seen that it's possible, as you'll see. By the way, the CT

logs will usually not make a lot of verifications about the certificate. For them to accept your certificate, it just needs to chain up to a root CA that the CT log recognizes. So yeah, Google, Cloudflare, DigiCert and a bunch of others own and maintain CT logs. Think of it as kind of a blockchain for certificates, but without the proof of work and the massive amounts of computational waste, and it's not distributed. It's basically sitting on their servers. So these are just a couple of queries to a CT log from Google. It basically collects certificates that were issued in the

second half of this year, and you can see the size. It's, I don't know, maybe 20 250 billion to 2.5 billion, I don't know, it's a big list. That's the first line, that's the tree size, the number of nodes in the tree. Each node corresponds to a certificate and its associated certificate trust chain. So when you submit to a CT log, you have to submit the certificate itself and all of the certificates that lead up to a trusted root. And then I'm basically getting a random entry, which gets me the leaf certificate as well as the certificates in the chain.
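The two queries described above map onto the `get-sth` and `get-entries` endpoints of the RFC 6962 log API. What follows is a minimal sketch rather than the speaker's tooling: the log URL is a hypothetical placeholder (real ones come from Chrome's log list), the helper names are made up for illustration, and the parser only handles the `leaf_input` field of a `get-entries` response. It does not fetch anything over the network.

```python
import base64
import hashlib
import struct
from urllib.parse import urlencode

# Hypothetical log URL, for illustration only; real URLs are in Chrome's log list.
LOG_URL = "https://ct.example.org/log2025h2/"


def get_sth_url(log_url):
    """URL of the signed tree head, whose JSON includes `tree_size`."""
    return log_url.rstrip("/") + "/ct/v1/get-sth"


def get_entries_url(log_url, start, end):
    """URL for entries start..end (inclusive, zero-based)."""
    query = urlencode({"start": start, "end": end})
    return log_url.rstrip("/") + "/ct/v1/get-entries?" + query


def parse_leaf_input(leaf_input_b64):
    """Decode the base64 `leaf_input` (a MerkleTreeLeaf) of one log entry."""
    raw = base64.b64decode(leaf_input_b64)
    # version (1 byte), leaf_type (1), timestamp in ms (8), entry_type (2)
    version, leaf_type, timestamp_ms, entry_type = struct.unpack(">BBQH", raw[:12])
    body = raw[12:]
    if entry_type == 0:
        kind = "x509"          # full DER certificate follows
    elif entry_type == 1:
        kind = "precert"       # 32-byte issuer key hash, then the TBSCertificate
        body = body[32:]
    else:
        raise ValueError(f"unknown entry type {entry_type}")
    length = int.from_bytes(body[:3], "big")   # 24-bit length prefix
    der = body[3:3 + length]
    return {
        "kind": kind,
        "timestamp_ms": timestamp_ms,
        "der": der,  # DER certificate, or TBSCertificate for a pre-certificate
        # RFC 6962 leaf hash: SHA-256 over a 0x00 byte plus the MerkleTreeLeaf
        "leaf_hash": hashlib.sha256(b"\x00" + raw).hexdigest(),
    }
```

With curl, the equivalent of the first query would be a plain GET on the URL returned by `get_sth_url(LOG_URL)`. The certificate chain for each entry sits in the separate `extra_data` field of the response, which this sketch leaves undecoded.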

So, some statistics about these public ledgers. Each CT log is basically a distinct ledger. If we take all of the logs that are trusted or have been trusted by Chrome, it's about 186 URLs, which contain a total of 48 or almost 49 billion certificate entries, or leaf certificates. A lot of this data, by the way, is duplicated, and I'll explain why. But yeah, it's a lot of data. I did some very back-of-the-envelope math, and it's about 200 terabytes of data. If you want to download it, it will take you a

while. We built some infrastructure and plumbing at Bitsight, and thank you to who's here in the audience, who helped a lot with that. I would not be able to give this talk without him and the data that he collected. So yeah, it's a lot of data, but it's interesting data, I think. What I'm going to show you now is a couple of tests, a couple of ideas that I had looking at the certificates, and this is really just scratching the surface. I basically took a month of data. I took the

month of September 2025, and I took 30 CT logs, the biggest ones, and just looked at the data. We deduplicated the data. I'm sorry, I didn't tell you this: for Chrome to accept a certificate, it must be present in at least two certificate logs. So at least half of the 200 terabytes is duplicated, and it tends to compress a lot more, but you get it. So yeah, I took the data for September 2025. It's about 720 million unique certificates, and I used the help of the big data gods, Spark and Athena, really nice guys. And

we just looked at the data. By the way, I will name some companies here. They haven't done anything wrong. They literally just issued certificates, and that ends up showing up in public data. That's the reason why I'm here. The first thing that popped out at me was that Cisco phones from 2016 were in the CT log data from September 2025. Why? I don't know. My theory is that someone submitted those certificates to the logs, just testing stuff out. The reason I found this is that certificates can be issued for domains. That's by far the most frequent reason why you get a

certificate. That's why you see, by the way, almost 500 million distinct host names in just one month of data. You can get certificates for IP addresses, even though that's much rarer, but as you saw, you can get certificates for 1.1.1.1. You could get certificates for emails, but no one is using certificates for email anymore. S/MIME was a thing back in 2005 to 2010. URIs are a special field, and that's what had this data in it. I Googled it, and yeah, it's a model and serial number for a Cisco phone. By the way, these certificates were issued in 2016. They expire next year, and they're old.
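To spot the rarer name types mentioned above (IP addresses, emails, URIs) among hundreds of millions of host names, one rough approach is to bucket every name by kind and look at the small buckets. The sketch below is an assumption-laden heuristic, not the speaker's pipeline: it presumes the names have already been extracted as flat strings, and `classify_san` is a made-up helper. A real X.509 parser exposes the actual name type of each subject alternative name, which is more reliable than guessing from the string.

```python
import ipaddress
from collections import Counter


def classify_san(value):
    """Guess the kind of a name taken from a certificate: ip, uri, email or dns."""
    try:
        ipaddress.ip_address(value)   # accepts both IPv4 and IPv6 literals
        return "ip"
    except ValueError:
        pass
    # Check URIs before emails, since a URI may itself contain an "@".
    if "://" in value or value.lower().startswith("urn:"):
        return "uri"
    if "@" in value:
        return "email"
    return "dns"


def count_name_kinds(names):
    """Count how many names fall into each bucket."""
    return Counter(classify_san(name) for name in names)
```

Applied to a month of names, the `uri` bucket is where entries like the phone model and serial number would be expected to surface.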

The signature algorithm is bad. It's using SHA-1. Yet a log accepted it and said, yeah, this is valid, this leads up to a trusted root, so we will accept it. They take 10 years to expire, so it's not a good certificate by today's standards. Another thing that I found is a bit of a trend. This is not the only company that does this, there are many others. So automattic.com, I didn't know them, but they seem to own WordPress.com, the platform, and if you want to host your blog there, they have a cloud

service, right? WordPress, Tumblr, WooCommerce, etc. You can take a certificate log and search the database for certificates issued to the subject common name tls.automattic.com. There are a couple of places where your host name can appear on a certificate. It can appear in the subject, or it can appear in the subject alternative name, which is another field of the X.509 certificate. So basically they issue all certificates for their clients with tls.automattic.com in the subject name, and then they put the customers' domains in the subject alternative names. So you can single out all domains that are using automattic.com as a platform, that

may be using WordPress or Tumblr or some other thing, right? So, interesting, right? If you're a bug bounty hunter, if you're a pentester, if you have a zero-day for WordPress, this is a nice place to start. Another one, and this was actually the very first thing that I saw: Imperva does the same thing. They generate certificates, and by the way, for some reason some of the certificates have a validity period of one single day. Don't ask me why. But basically you can single out customers of the Imperva cloud WAF, their web application firewall, I think. I haven't looked deeply, but I did try some of

these host names that we found, and yeah, in this case this guy did not have his site working, but you get a page from Imperva saying, hey, you can't reach the final website, but this is a WAF from Imperva, so nice to have you here. And there's the certificate. You can see imperva.com as the subject common name, and then the subject alternative names contain a bunch of other domains that are hosted behind this WAF. Also, there's a bunch of other fields in the certificate, right? What is it for? Is it for server authentication, for client authentication? I couldn't even parse some of them. There were some object IDs

that I didn't know about, and my parser wouldn't tell me what they were. There's a random code signing certificate from a guy in Cyprus. I redacted his name. It was a guy that wanted to develop open source, so he asked this issuer, Certum Code Signing CA, for a certificate, and they gave him one. So this is nice, you get to know people that develop code just by looking at certificate logs. Then there are these key usages, PKINIT KDC and Microsoft smart card logon. From looking at just a handful of random certificates, these seem to be Microsoft domain controllers that,

when you set them up with a certificate, will generate a certificate with these characteristics. So if you're looking to find Microsoft domain controllers, maybe you'll find them here. Another thing that I did, and I'm taking too long, is basically a rehash of the 2010 study, done in a different way. I wanted to see the current tree of the chain of trust that exists from root CAs to subordinate CAs. They scanned the internet in 2010. I basically collected certificates by crawling CT logs. They found 1.3 million distinct leaf

CAs, sorry, 1.3 million distinct certificates. We found a little more than 720 million certificates. Some of them are pre-certificates, so maybe there's some duplication here, but that's at least 337 million certificates, so much more data, right? And we didn't have to scan the internet, which was great. They found almost 1,500 CAs. We actually found fewer CAs, which is interesting, but it makes sense, because back then PKI was a business. People wanted to have a CA, and there were companies being profitable selling certificates. Now it's Let's Encrypt and a few others, Google issues certificates, that sort of thing. This is me, I just put this in a

Neo4j database so I could query it. The nodes are kind of encoded by the number of certificates that they sign. Let's Encrypt is obviously the big winner here. This is me just looking for CAs from Portugal. There are two. One is marketware, I don't know, I forgot, it's a big one. The other one is these guys that make emails. They issued two certificates in the month of September. A company called Suma. I don't know if anyone here works at Suma. No, maybe not. I think it's an email company. Another thing that I want to tell you about: you probably don't want to start downloading 200 terabytes and indexing them into a database.
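For a rough sense of scale, the figures quoted in the talk (about 200 terabytes and about 49 billion entries) can be turned into an average entry size and a download time. This is only back-of-the-envelope arithmetic: the 1 Gbit/s sustained link speed is an assumption made here for illustration, and real logs rate-limit clients, so the actual time would likely be longer.

```python
TOTAL_BYTES = 200e12      # ~200 TB across all logs, as quoted in the talk
TOTAL_ENTRIES = 49e9      # ~49 billion entries, as quoted in the talk
LINK_BITS_PER_SECOND = 1e9  # assumption: a sustained 1 Gbit/s connection


def average_entry_bytes(total_bytes, total_entries):
    """Average size of one log entry, including its chain."""
    return total_bytes / total_entries


def download_days(total_bytes, bits_per_second):
    """Days needed to transfer total_bytes at a constant link speed."""
    seconds = total_bytes * 8 / bits_per_second
    return seconds / 86400


if __name__ == "__main__":
    print(f"{average_entry_bytes(TOTAL_BYTES, TOTAL_ENTRIES):.0f} bytes per entry")
    print(f"{download_days(TOTAL_BYTES, LINK_BITS_PER_SECOND):.1f} days at 1 Gbit/s")
```

Under these assumptions an entry averages roughly 4 KB, and the raw transfer alone takes on the order of two and a half weeks, before any parsing or indexing.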

It's going to take a while. So you could use a really cool tool that Cloudflare built, called Cloudflare Radar, that has a bunch of data on Certificate Transparency. You can't do these more in-depth queries, but you can look at stats about certificates being issued, their validity, and a lot of other cool stuff. And I want to subtly brag about the fact that we see a little bit more than Cloudflare, for some reason. Maybe this is rounding error, maybe it's dates. But yeah, I filtered the data for September and we see a few million more certificates than Cloudflare, which is nice. At least the numbers match up

roughly. So yeah, I'm almost at the end. Sorry. I have a lot of ideas for future work around this. Again, this is really just scratching the surface. You could do machine learning: take certificates, create a feature vector for each, and cluster them using an unsupervised algorithm, just to see graphically which certificates are more similar to each other. That way we can potentially find other clusters of interesting data. So that's one. The second one is finding different certificates issued for the same subject domains. If I'm the customer here and I want to secure my network,

that's the first thing I would do. I would be constantly looking for my domains in the CT logs, let's say bitsight.com, and looking at the certificates. If I see a new certificate for bitsight.com issued by a different CA than the one I use, then I'll be worried and I'll probably investigate. Another one is finding certificates posted long after they were issued. Why are people posting certificates here from 2016? Good question. I'm probably going to dig deeper into that. I think that's an interesting one. So, let's end this. The title of the talk is Watching the Watchmen. Who are the watchmen? Well, CT logs are the

watchmen. They are basically here collecting all this data, looking at what CAs are actually doing. Are they misbehaving or not? Who watches the watchmen? Basically, who is looking at the logs? Hopefully you, right? Or us. We need to be looking, otherwise, you know, if a tree falls in the forest and no one's there to hear it, does it make a sound? Probably not. So the only reason why these guys at Fina CA got caught misissuing a certificate for Cloudflare is because someone was looking at the logs. And generally speaking, there's blockchain today, and there are a lot of companies just looking

at blockchain data. There are some companies looking into this, but I really would love to see more folks looking at this data set, which I believe is one of the most interesting data sets currently publicly available. It's by far the largest amount of key material out there that you could just download for free. And so yeah, that's it. That's my talk. Thank you. Any questions?
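The defensive idea from the future-work list, watching your own domains for certificates issued by a CA you do not use, could look roughly like the sketch below. It assumes certificates have already been pulled from the logs and reduced to simple records; `CertRecord`, `matches_domain` and `unexpected_issuances` are hypothetical names introduced here, and the issuer strings and domains in any usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CertRecord:
    """Simplified view of one logged certificate (hypothetical record shape)."""
    subject_names: tuple   # common name plus DNS subject alternative names
    issuer: str            # issuer name as a plain string
    serial: str


def matches_domain(name, domain):
    """True if name is the watched domain, a subdomain of it, or a wildcard for it."""
    name = name.lower().rstrip(".")
    domain = domain.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name == domain or name.endswith("." + domain)


def unexpected_issuances(records, watched_domains, expected_issuers):
    """Return (issuer, serial, names) for watched names signed by an unexpected CA."""
    alerts = []
    for rec in records:
        hits = sorted({
            name
            for name in rec.subject_names
            for domain in watched_domains
            if matches_domain(name, domain)
        })
        if hits and rec.issuer not in expected_issuers:
            alerts.append((rec.issuer, rec.serial, hits))
    return alerts
```

Run periodically over newly logged entries, anything this returns is a prompt to investigate, not proof of misissuance: a legitimate team inside the organization may simply have used a different CA.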

There's nothing. >> Okay. Thank you, Luis. Thank you everyone. Let's go for a break. What would