
All right, so my name is Matthew McPherrin. I am a site reliability engineer at Let's Encrypt, one of the world's biggest certification authorities. Today I'm going to talk about web PKI revocation: that it's broken, what it means for it to be broken, what it is in the first place, and how we are going to fix it. So first, what do I mean when I say the web public key infrastructure? Because that's kind of a vague term. Really, I'm talking about the certificates that you're using when a browser is talking to a website over HTTPS, to know that it's talking to the right website. Specifically, we're going to
focus on the use case of the public internet, using public DNS and public IP addresses, with broadly trusted certificate authorities. That's ignoring a lot of the use cases of the same technology, which is used for, say, intranets in corporations, or related X.509 use cases like email, code signing, document signing, and all kinds of other stuff. So we're focusing specifically on the web. Basically, when you connect to an HTTPS website, you have a certificate that is used to cryptographically bind the domain name that you're connecting to to the website operator's public key. That certificate has been signed, using a cryptographic signature, by a certification authority, usually with either the RSA or ECDSA algorithms,
though that'll hopefully be something post-quantum secure in the near future. The web browser is in charge of checking that the domain name on that certificate matches what it intended to connect to; it checks that the key in the certificate has signed the TLS handshake properly; and it checks that the certificate is signed by a trusted certification authority. That gets you basically the property that you are sure you're connecting to the right website, as attested by this series of signatures. There's a bunch of different players in this ecosystem. You've got a browser user, which is almost everyone in the world. You've got the web browser that they installed from a browser
vendor: Safari, Chrome, Edge, or Firefox. That browser is being used by the user to connect to a website run by some website operator. The website operator gets their certificate from a certificate authority like Let's Encrypt, and the browser vendors are in charge of picking who is trusted by their browser as a CA (unless the EU gets their way, in which case the EU will be in charge of it instead). But things can go wrong. Maybe you sold your domain name to someone else, maybe your site's hacked, maybe there's some major OpenSSL vuln that leaked all your keys. Basically, there's some reason that we don't actually want to trust
that certificate anymore, so in that case we need to revoke it. Certificates have an expiry date, usually up to a year out, so there's this timeline where we don't have to worry about it after a year, but before then we have various solutions to basically push that untrust date earlier. The certificate authority is the one who's in charge of tracking that status. There's various circumstances under which CAs are required to revoke certificates: for example, if the keys are leaked, if the person who holds the domain deregisters it, or potentially other cases, like if the CA misissued it in the first place, they might be
forced to revoke it. That can actually lead to a situation we call mass revocation, where maybe there was a bug in the CA software or processes that led to a whole bunch of certificates being issued potentially improperly, and "potentially" is all you need for policy failures like that, where we can't have the utmost confidence that a certificate was issued properly. So one of the big challenges in revocation, which we're not really going to get into, is mass revocation, where you might have to revoke a whole lot of them. Really, the central problem here is that we've added a new arrow to our diagram, and because this is sort of an
exceptional case, where the browser user has to check with the CA to find out whether this certificate is revoked, we've got this path that is not really well exercised and has a bunch of technical and privacy restrictions, and it's really what has led to all of the heartburn today, where things aren't really working. We have a big collection of options that we're going to go through in this talk, and we'll talk about how we got from one to the next, why we thought they might work, why they didn't end up working, and where we're going to go. In the very beginning, we had certificate revocation lists. This is
a pretty simple file format that you download, or your browser downloads on your behalf. It contains basically a list of revoked serial numbers for some issuer, and it's signed, and the CRL itself has an expiry date. The certificates have an HTTP URL in them that points to where you can find the CRL; it's called a CRL distribution point. CRLs are sort of hundreds of kilobytes, up to megabytes if maybe you've had a mass revocation and have revoked a whole lot of certs. The format is actually very similar to a certificate, just with different fields in it. It's actually defined in the same RFC and uses the same ASN.1 kind
of encoding as certificates. They've been around basically the whole time, but they don't really work. Because a CRL is hundreds of kilobytes and we need to check revocation at the start of a TLS connection, it's just way too slow to download them on the fly, unless you're really willing to wait 30 seconds for your website to load, which nobody wants. Because there are potentially hundreds of megabytes for a complete set, you can't store all of them, certainly not at web PKI scale, or certainly you couldn't in the 90s when we designed these things. Most of the CRL isn't going to be interesting to the client, because it's going to have
a list of all of the serials on the internet that are revoked, and most people aren't going to all of the websites on the internet. As a result of this, most web browsers don't use CRLs today. There are some mitigations for the size problems, like splitting them into a bunch of different CRLs, which we call sharded CRLs, to make them smaller and reduce what you have to download, but ultimately that's not widely deployed in the public PKI. CRLs are widely used in enterprise use cases, where you have one enterprise CA, most of your sites are controlled by that, and you can push it
out to everyone's managed desktop easily, but it doesn't really work at web PKI scale. We've known this for a long time, so around the year 2000, the second obvious design choice was to just ask the CA whether each cert is revoked, which was built as the Online Certificate Status Protocol. It's a simple RPC protocol. Again, there's a URL in the certificate, and you just call the URL and say, hey, is this certificate revoked? The response is signed by the CA, again kind of like a cert, and that gives you confidence that the CA gave you valid data. OCSP responses are valid for about a
week, and you can cache them if you want. But it didn't really work out. The biggest problem is that this is a service the CA has to run that has to effectively have 100% uptime, if every browser everywhere in the world is continually asking whether certs are revoked. Because nobody can run services with 100% uptime, someone like a CA, for whom this is really the only thing they have any uptime requirements on, is probably not going to have a good time. And so the browsers basically all fail open if they can't reach OCSP. That means that a network attacker could just block OCSP, and a network attacker is exactly the type of attacker that
TLS is trying to defend against, which basically means that OCSP doesn't work. The attacker, while they're man-in-the-middling the TLS connection, just blocks the OCSP checks, and the connection will continue to work. Or if they have a cached OCSP response from when the cert was good, they still get to man-in-the-middle for a week, until that expires. In addition, OCSP has really bad privacy properties. Every time you connect to a website, or, if you have a cache, the first time you connect to a website in a week, you are leaking, essentially across the internet, to anyone who's interested, what domain you're connecting to, or at least what cert
you're connecting to. That leaks it both to the CA, who could potentially log this in their OCSP servers, which would provide some great metadata, and to the network: if you were a nation-state actor, or otherwise a global threat actor with a privileged network perspective, you could have a big amount of visibility into what people are viewing just from that OCSP traffic. So because of these really bad privacy problems, Chrome has completely dropped OCSP. Other browsers do OCSP checks today (the Microsoft, Firefox, and Apple ecosystems), and it's uncommon to do OCSP checks outside of the browsers. There's plenty of other things, like API clients and client libraries,
that might connect to websites, but they usually don't do OCSP. Some firewalls and that kind of thing will do both OCSP checks and CRLs. This was an obvious design problem from the very beginning. People didn't miss this; they knew it was going to happen. So we designed another mechanism, which is called OCSP stapling. In the version of OCSP I just described, the web browser goes and asks the CA for the OCSP response, which leads to our privacy and uptime problems. But there's an alternate way of doing this: because those responses are signed, they don't
actually have to be distributed directly from the CA. They can be distributed by the website. That's called stapling. The web server, maybe once a day or so, will go ask the CA for the OCSP response, download it, save it on its hard drive or wherever, probably next to the certificate, and then it can include that, stapled, in the TLS handshake along with the certificate. So great, it's fixed: there's no more privacy problem, there's no more uptime problem, everything's fixed. Unfortunately, everything's not fixed. The problem is that the server needs to do this step where it has to fetch and staple the OCSP response. You do need
some availability of the CA's OCSP server to do that, at least once a week or so, but the web server isn't going to know when the CA's site is available, and it's complex code to make this happen. The major web servers, like Apache and nginx, have notably really bad implementations of this. Apache, for example, would just give up if it couldn't fetch an OCSP response and might not retry stapling later. I think nginx had a bug for a long time where it would staple any OCSP response, even if the response said revoked, and stapling a
response telling you that, oh, actually my cert's revoked, is not helpful. And there's nothing that has forced people to actually implement this, so a lot of web servers just straight up don't support OCSP stapling. That led to the death spiral of non-adoption. There is a flag that CAs can set on certs which says that you must staple a response. CAs don't want to do that, because it would break any websites that aren't stapling. Browsers can't require OCSP stapling, because lots of websites don't staple. The server software doesn't really care about implementing it well, because of the previous two statements, where nobody
really wants it. So we're stuck in this loop of non-adoption, where everybody maybe agrees that this is a good solution, but there's just too many problems with implementing it. I think at this point most of the industry has given up on OCSP stapling; it's just too hard to deploy. Generally, any change like this, where you need to actually go change every single web server in the world, is not really practical. Browsers are a very small number of products and they auto-update, but that's really not true of web servers. People running 10-year-old copies of Apache on some ancient CentOS
distribution is still a widespread thing. So the browsers are trying to fix this without the cooperation of the site owners. Chrome has a thing they call CRLSets, and Mozilla has the same thing, which they call OneCRL, that support an out-of-band revocation mechanism. The browser vendor is using the same CRL format, but they're pushing it to their browsers directly. As I mentioned at the start, CRLs are too big to push the complete set, so the browser vendors themselves are doing this calculus about what is a high-value revocation and what they push out. So an intermediate CA, or even a root CA, that's done
something bad might get revoked. If you're a high-profile website and you're getting news coverage about how people are man-in-the-middling you, then yeah, maybe that'll get pushed out. So it's an incomplete solution, but it's easy to do and it has been done. This is sort of the fire alarm of the internet, but it's not really scalable. However, there's this 2017 paper called "CRLite: A Scalable System for Pushing All TLS Revocations to All Browsers", which is a scalable system for pushing all TLS revocations to all browsers. It uses a push-based model, similar to the previous CRLSets and OneCRL, to push revocations to all
the browsers, but it makes some really interesting compression choices. One of the things that has happened in the last decade is a system called Certificate Transparency, where all certs are logged to globally, publicly accessible logs. As well, all of the CRLs are public. Combining those two facts has led people to develop this innovative compression technique, using Bloom filters to encode the revocations in a very compact way, less than one byte per revocation, much smaller than CRLs. And this actually allows browser vendors to push all revocations to all browsers relatively quickly. CRLs weren't actually required to be issued by CAs (OCSP was what was mandatory), but Mozilla added the requirement to enable this. Firefox is
beginning to roll out CRLite. Other browsers have vaguely indicated interest. Apple has also added a requirement for CRLs; they're a fortress of silence, so we don't know why, but we can assume that they're doing something similar. And Chrome has publicly indicated in talks that they might be doing something like this soon too. One of the final problems is that this is still pretty big. But we have this built-in mechanism with the expiry of certificates, and rather than a complex mechanism like OCSP stapling, or these compressed sets that need to be powered by a browser vendor, you could just make certificate lifetimes shorter. OCSP responses and CRLs are valid for a
week, maybe up to 10 days, depending on exactly which compliance regime you're in. So if we can just get a new cert every week, we don't need revocation; we can obsolete it completely. That's one current thing that's ongoing. The Baseline Requirements, which is the set of guidelines that the browsers and CAs have come up with together, recently recognized this, and do not require revocation for anything under 10 days, though that's going to change to seven days in the future. Microsoft's root program actually still requires OCSP, which means all CAs still have to do it, but that might change in the
future. Now, short-lived certificates are actually sort of hard, because you need automation. But unlike with the death spiral of non-adoption that OCSP stapling has, you probably want automated certificate issuance anyway. And unlike OCSP stapling, where you actually have to make software changes to your web server or your TLS stack to support including this extra stapled response, anything that supports certs can support short-lived certs; you just have to write a new file to disk and reload it. There are also better uptime properties with automating your short-lived certs, because unlike OCSP stapling, where you have to go to your CA to get the OCSP response, you can actually have multiple CAs with short-lived certs, and you can
just switch. Cloudflare, for example, always uses at least two CAs for all of the certs they issue, to support redundancy in case one is down. To make that automation more accessible, CAs are also increasingly supporting the ACME protocol, which was designed by Let's Encrypt, to make sure that client software can be interoperable between CAs. So we get better security from not needing revocation, and we get better uptime and reliability because we can switch CAs and not have manual steps that can lead to broken websites. There's also this extra bonus, which is that if your certificates have shorter lifetimes, the data set of
currently revoked certs shrinks, which makes things like CRLite, or even classic CRLs, work better. So even if we can't go to 7-day certs everywhere right now, pushing that lifetime down will also make this a lot more effective. There's some push to cap certs at 90 days, which will probably dramatically help with that. So that's our set of options today. CRLs are too big and too slow to check on demand. OCSP has bad privacy and big reliability issues. OCSP stapling is impossible to deploy. CRLSets and OneCRL help, but they're an incomplete solution. CRLite is a working version of those. And short-lived certs finally obsolete revocation. So if you run a website,
or you're a business with a SaaS provider, you should automate your certs. I think that's everything I've got today. Thanks for letting me talk. If you want to find me online, you can find me on infosec.exchange on Mastodon, or on my own website. Does anyone have any
questions? I have a question in the middle there. [Audience question inaudible] So the ACME protocol is a protocol that a piece of software running on a web server, or some other piece of automation, can use to talk to a CA to request certificate issuance. The traditional way of issuing certs was that you log into a website, you click some buttons, and then you download a cert, and then some engineer goes and uploads it to your web server. ACME is the Automatic Certificate Management Environment, or something like that, which is designed to allow software to request certificates itself. If you use Let's Encrypt, that's the protocol that Let's Encrypt
uses to obtain
certificates. [Audience question, mostly inaudible, about money]
I would hope that certificate authorities can recognize the security wins. One of the big things is that OCSP, which everyone has to run right now, is in many cases the biggest cost center in a CA. I know at Let's Encrypt we have like two orders of magnitude more traffic for OCSP than anything else. We spend a million dollars on database servers to power OCSP, right? That's a lot of money that could be doing other things, and Let's Encrypt is a nonprofit; we're supported by your donations and corporate grants and things. It really is potentially a big financial win on the corporate side to
get rid of OCSP. Short-lived certs are still going to require more load, to issue those over and over again, but the cost-benefit analysis really shows that short-lived certs would be a lot better. The adoption problem there is that they do need to be automated, which a lot of CAs are now doing but haven't always
done. [Audience question, partly inaudible: ... you money, they check who you are and guarantee it to everyone, and it seems
like ...] Yeah, so the main thing that a certification authority is tasked with doing is validating that a site owner has control of a domain name, and that the same entity has the private key. In the past this was often done by hand, by a room full of validation operators or something. I think one of the things that Let's Encrypt showed is that we can fully automate it through software. That is really what has enabled this. First of all, it's what enabled Let's Encrypt to be free and to scale to issuing 3 million certs a day. You know, we don't
have 3 million people sitting in a room, each checking a domain name. Once the flow is automated, you can run it as fast and as often as you
want. [Audience question inaudible] Yeah, I mean, they all need to automate. I don't want to say that they're dying, because there's still lots of value you can get out of having an entity that gives you business support and SLAs and all kinds of agreements like that, and you might want to pay for that. That kind of business relationship is what has powered CAs in the past, so really, that's still going to exist. The fact that they can optimize their own workflow doesn't mean that they're going to go away, but it does mean that entities like Let's Encrypt, who are doing
it for free, can expand the pool of people who have certs. We have one in the
back. [Audience question inaudible] Okay, yeah. So extended validation was a mechanism... let me see if I can find an example. I can't really see my own screen on the projector, so I'm not going to try. So, one thing I mentioned is that what a CA does is bind a domain name to a public key, and they sign it. Extended validation adds not just a domain name but a business name, what country they operate in, and like an employer identification number in some database, I don't know what it is. And that might give you more confidence that the entity you're talking to, not just the web server, is who they claim
they are. Now, that's really beneficial to CAs, because they can charge a lot of money for those. But there's been a number of research projects which show that it hasn't really worked: there's no evidence that it protects users from phishing or otherwise prevents any problems. It's really just an additional cost. There are some use cases where EV is still beneficial, for example in code signing binaries on Windows, where you want to know that the binary came from the company you got it from. That's a different model than connecting to a TLS web server. So with EV, you know,
there's maybe some indication in a web browser still: there's a little button you can click, and it shows you the business info, but nobody ever clicks it, so it doesn't really do anything, and I don't think it provides a ton of value on the web. Note that doing EV is separate from the issuance flow. If you do EV on your CA's back end, you can still use the ACME protocol to automatically get a new cert every week; that extended validation might still be good for a year, right? That's a business validation, and it's different from the actual issuance flow. So there's actually a little bit of
decoupling there that can happen. So short-lived certs are not in opposition to, or incompatible with, EV. Yeah, maybe just one more question, in the green vest. [Audience] I'm just curious about revocation. In my personal experience I've never experienced revocation, so I've never gotten a pop-up. [Speaker] What browser do you use? Chrome, right. So Chrome basically doesn't do revocation, so yes, you won't experience it, unless there's a major breach of google.com or something, in which case they can use their CRLSets mechanism to push it out. But yeah, otherwise, Chrome doesn't check OCSP or CRLs or
anything, so...
You probably wouldn't experience it too often. The main case where stuff gets revoked is if a cert has been misissued, or
[Music] ... man-in-the-middling you. So if there's no attacker who has a key, then you're not going to see a revoked cert. Just as a concrete example, there was a news story a couple of days ago that the German police intercepted some Jabber server and were man-in-the-middling its connections. That cert at some point expired, which is how they noticed, but that's a case where there was some entity in between that had a server up, and if that cert had been revoked, you might have noticed it. In that case they were using the Jabber
protocol, which probably doesn't check revocation either. But there's really not that many places where you'd actually run into this, except in the case of an actual attack using a revoked cert. [Host] Okay, great. Thank you so much, Matt. That was fantastic.
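
[Illustration] The CRLite part of the talk relies on one observation: because Certificate Transparency makes the full set of issued certs known, a stack of Bloom filters can be built that gives exact answers for every known cert. The Python sketch below is a toy version of that filter-cascade idea, not the actual CRLite or Firefox implementation. The class and function names, the 10-bits-per-entry sizing, and the SHA-256 based hashing are all assumptions made for illustration, and this toy makes no attempt to reach the "less than one byte per revocation" figure mentioned in the talk.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter; sizes and hash count are illustrative guesses."""

    def __init__(self, n_bits, n_hashes, salt):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.salt = salt  # a different salt per level decorrelates the levels
        self.bits = bytearray((n_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(f"{self.salt}:{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def build_cascade(revoked, valid):
    """Build alternating filter levels.

    Level 0 holds the revoked serials. Level 1 holds the valid serials that
    level 0 wrongly matched. Level 2 holds the revoked serials that level 1
    wrongly matched, and so on until a level has no false positives left.
    """
    levels = []
    include, exclude = set(revoked), set(valid)
    while include:
        bf = BloomFilter(n_bits=len(include) * 10, n_hashes=3, salt=len(levels))
        for item in include:
            bf.add(item)
        levels.append(bf)
        false_positives = {item for item in exclude if item in bf}
        include, exclude = false_positives, include
    return levels


def is_revoked(levels, serial):
    """Exact only for serials that were in `revoked` or `valid` at build time."""
    for depth, bf in enumerate(levels):
        if serial not in bf:
            # Missing at an even level means "not revoked";
            # missing at an odd level means "revoked".
            return depth % 2 == 1
    # Matched every level: the answer is decided by which set the last level holds.
    return len(levels) % 2 == 1
```

A browser vendor would build the cascade from the public CT logs and CRLs and ship only the filters; the client then calls something like `is_revoked(levels, serial)` locally, with no network request to the CA at connection time. The answer is only meaningful for certs that were known when the cascade was built, which is why the approach depends on Certificate Transparency.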