
Crypto agility in a symmetric environment - managing HSMs post-quantum computing

BSides Joburg · 33:42 · 51 views · Published 2025-09
About this talk
Hardware security modules are critical infrastructure in payments and finance, but their legacy key management processes rely on manual ceremonies that take weeks or months. Quantum computing threatens current cryptographic algorithms, requiring organizations to adopt crypto agility—the ability to rotate keys and algorithms rapidly. This talk explores how to modernize HSM workflows through asymmetric key exchange, reducing ceremony time from weeks to minutes while maintaining compliance and improving operational resilience.
Original YouTube description
HSMs (hardware security modules) and their legacy processes are the silent backbone of our core payments infrastructure. Quantum computing poses a significant threat to our cryptographic landscape, and evolving our payments infrastructure to meet new threats requires planning and orchestration between many organisations. This talk is primarily aimed at security professionals in the financial sector who work with HSMs or write policy on managing them. In addition, those preparing for quantum computing or who would like some insight into core payments infrastructure and HSMs will benefit from this talk. Technical topics will be explained sufficiently so that those with no HSM experience can learn how they work and what the ecosystem looks like. Key takeaways are how quantum computing changes the HSM threat model, and the steps that can be taken to prepare for quantum computing. A change in key exchange process is suggested which will assist in preparation for quantum computing and improve operational efficiency. About Amy Smith: Amy is a senior security engineer at EFT Corporation where she does all things Blue Team, including cryptography, PCI compliance, and begging people to do their training. About BSides Joburg: Website: https://www.bsidesjoburg.co.za Twitter: https://www.x.com/bsidesjoburg Instagram: https://www.instagram.com/bsidesjoburg Mastodon: https://infosec.exchange/@bsidesjoburg LinkedIn: https://www.linkedin.com/company/bsides-joburg
Transcript [en]

[Music] Um, hi everyone. Today we're going to talk about HSMs. So, I guess first off, who am I? Um, I'm a security engineer at a fintech. So, we operate our own HSMs, uh, which we use to offer kind of a wide variety of products. And then we also have field engineers who manage HSMs at our customers. I provide support to some of those guys as well. So, I work with HSMs a lot. It's really important to note I'm not an expert. Um, I'm just someone who works with HSMs and has for a little while. So, in order to talk about HSMs, I think we all need to get on the same page about what they are.

HSM stands for hardware security module. They come in all kinds of sizes and they perform a really wide variety of functions. So you can get an HSM that's about the size of a thumb drive and you get HSMs that are as big as the servers you rack in your data center. Um, even the secure enclave on your phone that does encryption with your biometric data is technically a kind of HSM. Essentially, HSMs share hardware-backed security controls, and what those do is they prevent you from removing data from the HSM and they limit what operations can be performed on the HSM. And, um, they provide isolation between the hardware where operations are performed and the outside world. So you can perform

encryption operations without anyone interfering or snooping, um, in that function. Today I'm going to be talking about the HSMs that are quite big and they sit in the rack of pretty much every bank, every telco. They're used a lot in finance. Um, they're used a lot where root of trust is really important. So like DNS, um, certificate authorities. So how does an HSM work? For the purposes of this talk, we're going to consider an HSM to be a black box. We don't need to know much more about it. You put keys into an HSM. Um, and it can take a key that you hand it as well as data, perform a cryptographic operation on the data and produce

encrypted data. Alternatively, you can hand it encrypted data and keys. It'll perform some more cryptographic operations and it'll give you some decrypted data. For everyone who's doing the hunt, please scan the QR code on the screen before I forget. Um, so they allow you to encrypt and decrypt data. They also can perform basic cryptographic operations to generate keys. So you use an HSM to generate most of the keys that you would use to perform encryption and decryption operations on your HSM. What are the security features of an HSM? Most importantly, they are tamper resistant, because the physical security of where you keep your keys is really important, right? What that means is it's really easy to

accidentally push them into uh tamper mode where if it looks like someone is trying to physically interfere with the HSM, the HSM is going to wipe all of the data on the HSM. Um, and recovering from that is not super easy. What that means is that if you pull out a cable, you're going to wipe your HSM. If you take a screwdriver out and you try to open the hardware box, you're going to wipe your HSM. If there's a major fluctuation in temperature or voltage, you are going to wipe your HSM. If you move your HSM, you're going to wipe your HSM. Uh, if you sneeze too hard near the HSM, you're going to wipe

the HSM. And a lot of engineers will tell you about how they have destroyed the HSM and had to recover. Um, what's also important to know is that modern HSMs in this context usually only store one or two keys and you use those keys to encrypt and store all the other keys that you use, but we'll talk more about that later. Those are called different names depending on which vendor you're talking about. I'm going to call it a local master key, an LMK. Um, but yeah, more on that later. If your HSM has been configured to be compliant, it is not going to allow exporting clear keys. So, you're never going to be able to take, um, a full key

off your HSM and look at it in clear text format, or you shouldn't be able to. How do we secure our HSMs? And by we I mean the community, not me specifically in my company. Um, although we do follow best practice. Um, firstly, HSMs are managed remotely from a dual-controlled, access-restricted room. That's the ideal situation. If you haven't managed to get that together, you have to go visit the data center and manage it locally there. You need to make sure that you're using an air-gapped laptop or a dedicated device. So, some vendors very kindly produce devices that you can use to manage your HSM. Otherwise, if they don't produce those, you have to air-gap a laptop

yourself. You need to do that under CCTV surveillance so that everything you do is recorded, um, and we have a clear history of what's happened to the HSM. Usually, you need to coordinate your security person and then another, like, five people to access your HSM. Uh, you're going to need two custodians to access your secure storage and then you're going to need another three custodians to manage your keys. I'm going to talk more about this later. So, how do I use an HSM? Why would I have an HSM? Um, I'm going to talk through a very industry-specific example of how we use HSMs. Um, I've simplified it a lot. Uh, you usually manage HSM

connections between you and every other partner you have. So I'm going to talk about just how we share information between three different, um, people, or three different companies. Um, so when you're issuing credit cards, usually you have, or always you have, a card scheme. So Visa or Mastercard issues your credit cards. You have a card issuer and you'll have a 3DS provider. So, they're the ones who, uh, manage the 3DS prompt that you receive when you try to make an online payment. That's like 2FA, to verify you when you make your credit card payment. Every one of these people is going to have an HSM on their premises. And, um, in this example, Visa is going to

generate some keys that we are going to use as the card issuer to, uh, create cards. We are going to share some of those keys, a subset of them, with a 3DS provider who's going to use them to validate the 3DS response they get from Visa. So, how does that work? Like I said, Visa is going to perform a key generation operation using their HSM. They're then going to negotiate a session with us. Um, all of these keys are symmetric keys. No public private keys are involved in this process. So they're going to negotiate a symmetric session key with us, the card issuer. They're going to encrypt the keys they have generated using that session key

and then share them with us, usually like over email. What's going to happen is we're going to receive the keys encrypted under the session key. We're going to perform a key translation. And that just means you take the keys encrypted under the session key, you decrypt them on your HSM, you re-encrypt them using the LMK that is on your HSM so that you can store them and use them later. And then we perform a secondary key translation um so that we can share them with our 3DS provider. So to do that, we negotiate a session with our 3DS provider. That session key is also generated on the HSM and shared with our 3DS provider. And we will take a subset

of the keys Visa has shared with us. We will convert them from being encrypted under the LMK of our HSM. We will convert them to the session key. We will send them to the 3DS provider encrypted under the session key. The 3DS provider, um, is then going to perform their own key translation and convert them from the session key to their local LMK for storage. I hope that made sense. Ultimately what has happened is that everyone has received keys and everyone has them stored under their HSM's specific key. So now we have shared keys and we want to offer a service to our customers. Um, the 3DS process is very complicated. I'm not going to take everyone through it

because there's no need. Um I'm going to explain one API call. So, my customer um wants to make a card payment online. Um they're going to receive a 3DS prompt. They're going to approve the 3DS prompt. Things are going to happen. Magical things happen. Visa is going to receive the result of that 3DS prompt. It's going to be encrypted under one of the keys that they shared with us that we shared with the 3DS provider. They're going to send that encrypted result to the 3DS provider. The 3DS provider is going to take that data, give it to the HSM. It's going to give the HSM the appropriate key, which is encrypted under the HSM LMK. It's going to decrypt

that data, and it's going to tell us whether or not 3DS was successful. We're going to approve or deny the charge on the credit card. So, this is why HSMs are important. We use them all the time and our applications call them all the time to facilitate all of the services we offer. This is common across the whole payments industry. So I want to talk a little bit about the two pieces of that puzzle that we kind of brushed over, which are key generation and session negotiation. Firstly, key generation. Um, on my black-box HSM I'm going to generate a key. By me I mean the key manager, who's going to manage the whole key, uh, generation

ceremony, is going to generate this key. Because keys cannot be exported in the clear, they are then split into clear text components. So I have one custodian who's going to step up to the HSM, uh, or the terminal, depending on where we are, and, um, write down their component of this key. They're going to put it in tamper-evident packaging and put it in secure storage. Custodian 2 is going to step up to the HSM and write down their piece of this encryption key. Custodian 3 is going to step up to the HSM and write down their piece of the encryption key. So, what that means is hopefully no one has seen anyone else's

key components. Um, yes, this is insane. If you haven't dealt with this, I envy you. Um, I'm sure anyone who hasn't dealt with this is going, "What? You put your encryption keys on paper?" And yes, we do, um, but they go into tamper-evident packaging and then secure storage, so it's okay, right? Um, how do we secure that ceremony? Firstly, segregation of duties is really important. So you have your key manager, your ceremony manager, you have your security officer, who's going to be one of the two people who unlock your secure storage, and you have your custodians, who are each going to receive a key component. In order to avoid collusion, we go to great lengths to make sure that they're

not very friendly and that no one with the same role is in a position to be influenced by the same people. So each of these roles has to be occupied by people with different line managers. Um, ideally, specifically your key custodians should also be non-technical employees. More on that later. We also have to keep immaculate evidence of the ceremony. So, a lot of paperwork is involved. Um, each tamper-evident package that a key component goes into has a serial number. You have to track that. Every time you rip the package and take the key out, you put it in a new package with a new serial number that

you track. So, um, the ceremony manager, apart from, like, lighting candles and incense and handing out robes and chanting, has to keep really immaculate evidence of the ceremony to make sure that we don't fail our audits. Okay. Session negotiation. Uh, before I talk about negotiating sessions with HSMs, I want to talk us through a TLS negotiation, which I think most people are familiar with. That's certificate based. Um, so we use public private key, um, encryption for this. So normally what happens is we have a server and they advertise a certificate, like, associated with the website. You have a client who's going to want to visit the website, and when they try to initiate a connection they say hi.

The server goes, hi, here's my certificate. Um, the client goes, great, thanks so much for that. I'm going to generate a key. I'm going to encrypt it with your public key and I'm going to send it to you. And now we have a session key that we both know. That key is symmetric. Super easy. That took milliseconds. With HSMs, it's a little bit more complicated and it's very manual. So if I'm over there on the one side and I want to exchange a session key with my 3DS provider, I'm going to have to generate three components on my HSM. I'm going to distribute those components to my custodians. They then have to get those

three components to their counterpart custodians at the 3DS provider. How do we do that? We can ship them. But bearing in mind that these are now physical pieces of paper, physical security suddenly becomes really important, which is not something that I was used to dealing with as a cyber person before this. So that means we have to ship them all with different shipping providers. So you need like a FedEx and a DHL and a Postnet. When you're operating in Africa, a lot of the time you don't have three reliable shipping providers. So you have to stagger sending your keys with the same provider so that your keys are never sent on the same uh mode of transport.
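A minimal sketch of the three-component scheme described here, assuming the components are combined with XOR (the operation the speaker names a little further on for the receiving HSM). The function names are hypothetical and this is not any vendor's HSM interface; a real HSM does this internally and never holds the clear key in application memory.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("components must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def split_key(key: bytes, n: int = 3) -> list[bytes]:
    """Split a key into n components whose XOR equals the key.

    The first n-1 components are random; the last is chosen so that
    XOR-ing all n components together reproduces the key.
    """
    if n < 2:
        raise ValueError("need at least two components")
    randoms = [secrets.token_bytes(len(key)) for _ in range(n - 1)]
    last = reduce(xor_bytes, randoms, key)
    return randoms + [last]


def combine_components(components: list[bytes]) -> bytes:
    """Recombine components by XOR-ing them all together."""
    return reduce(xor_bytes, components)


# Illustrative 128-bit key; each custodian would hold one component.
session_key = bytes.fromhex("00112233445566778899aabbccddeeff")
components = split_key(session_key)
recovered = combine_components(components)
```

Because every component except the last is random and the last is masked by the others, any single component on its own reveals nothing about the key, which is why each one can be handed to a different custodian and shipped separately.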

You want to make sure they don't end up on the same flight, same car. So we'll send one key with DHL. We'll wait a week. We'll send the next key. We'll wait another week. Uh, and we'll send the last component, and then we'll be done. Um, alternatively, we can physically send my custodians with their components to the 3DS provider, but again, you have the same physical security restrictions, so they need to be on different flights. They can't travel together, so on and so forth. So, that takes quite a while. When my 3DS provider receives the key components, they're going to combine them: the HSM is going to perform an XOR operation on those three

components. Um, and they are then going to have the session key which they're going to use to encrypt data and send to me. Awesome. We've done it. We have established a session. And that took days to weeks, sometimes months depending on how badly this goes. A really important thing to note here is you can pick your employees but you cannot pick your customers. So because this is a very manual process, people are writing down key components. um people have to type in very long pieces of or very long strings into the HSM. Um there's an enormous room for error. So we find a lot of the time if you receive written components, you can't always

read the different characters. Um and so you have issues plugging them into the HSM, things fail. Some um of the organizations we work with take um the insider threat incredibly seriously and they choose key custodians who are not only non-technical but are actually afraid of computers and that actually makes it really difficult when you are trying to get them to type in long strings accurately. So you can go through the process of trying to combine keys three four five 10 times. So this process can take hours um and if someone has made a mistake it'll take hours to realize you contact the organization say hey there's a mistake in the key component we think they have to

regenerate and you start this process again um so this requires a lot of patience mental fortitude people skills very common in the IT industry um the last thing I want to just explain very quickly is key storage so I've told you we have our HSM we have the LMK on the HSM. Um, I can either export keys from the HSM as components as we've discussed. I can export keys from the HSM encrypted under the LMK, which is how I'm going to store them for use by my application. Um, but what about the actual LMK on the server? Like the LMK on the HSM? Uh, we need resilience here because this is super important, right? All of the keys

my application is using are stored encrypted under this key. I really don't want to have to go back to paper components and recombine everything I own. Um, so resilience for this is incredibly important. When you generate this key, you split it over usually three smart cards. If you are smart, you split it over several sets of three smart cards and you keep them in different locations. Um, and you make sure that they are really secure. Unfortunately, smart cards do fail. So, you do need to have the ability to fall back on another set of cards. Um, okay. So, I'm just going to take us back to this process to make sure everyone is on the same page as me. We've now

explained how keys are generated and how painful that is. How sessions are negotiated and how painful that is. Key translation is pretty okay. An application can do that in seconds, milliseconds. It's excellent. Um, that's kind of the normal process that we work with as HSM engineers. So, what's the threat model currently for my life? It's a massive effort to keep your keys secret. Um, we have massive investments in physical security. Um, massive amounts of time and effort and tedium go into making sure that all of this is done well. We take the insider threat really seriously because we have to, because we give our keys on paper to our employees. Um, and that requires a lot

of management of them. We have to rotate custodians regularly. Um, it's a serious issue for us. There's quite a low likelihood of our session keys being broken. Um, even if you are using deprecated algorithms like Triple DES. You'll find a lot of organizations do low-key still use Triple DES, uh, for some of their session keys. Um, because you're encrypting small amounts of data, um, and sending them not very regularly. So the amount of data that's available to try and use to crack your key is actually quite small at the end of the day, even after years. There's a very weak compliance requirement for key rotation. Technically, you should do it very often. In reality, it's not going to

make much of a difference to your audits. Um, and that would be why a lot of people are very slow to change keys and algorithms. Algorithms are also phased out very slowly and we get plenty of warning. So, Triple DES was deprecated years ago. It's still very slowly being phased out in a lot of industries. It's going to take a little while, I think, before we're done. So, that's the threat model we currently work with. Our main focus is on the physical security. Um, for our keys and our key management, again, the focus is on the internal management of those keys, but we don't put a lot of effort into rotating regularly and managing the

estate well. So, what about the quantum computing that was on the title slide, right? I know, we've been here 20 minutes. Um, estimates of when RSA 2048 is going to be cracked are varying. I've seen some estimates as early as 2030. NIST has disallowed its use after 2035, because RSA and all the asymmetric algorithms are, um, fundamentally vulnerable to being cracked by quantum computing because of Shor's algorithm. Very much not an expert. I saw a talk on it once, um, but I believe AES-128 has no official deprecation timeline yet. Um, but it is in NIST's security category 1. That means they don't have a lot of faith that it's going to be secure for very long. Um,

and in all likelihood, it's probably going to have to be retired on the same timeline as RSA 2048, or RSA in general. As a consequence of this reality, a lot of people are investing heavily in harvest now, decrypt later, which is where they're harvesting all sorts of data off the internet and they're storing it, uh, waiting for the time where they're going to be able to decrypt it, um, when they can use a quantum computer to do so. So post-quantum, your threat model looks really different if you're trying to operate HSMs. Suddenly we have much quicker and possibly more frequent changes to which algorithms are secure, which key sizes are secure. Right now

we're projecting that a whole bunch of them are not going to be secure without really having quantum computers available to test and prove this. Once we do have quantum computers, I'm sure the research on this is going to accelerate. Things are probably going to get worse for us, not better. So, as a consequence of that, your session keys and any keys that you have transmitted encrypted under your session key are likely to be compromised more frequently. Also, keys that we're using now which we are not rotating regularly, such as our session keys, such as even some keys used to issue credit cards, um, may be compromised in the future while they're still in use because they're stored now.
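A minimal sketch of how that exposure could be tracked, assuming a simple in-memory key register. The `KeyRecord` fields, the key names and dates, and the `RESILIENT` set are all hypothetical. The set only mirrors the position the talk goes on to take (AES-256 and the FIPS 203 algorithm, ML-KEM, treated as quantum resilient; Triple DES, AES-128 and RSA not) and is not an authoritative classification.

```python
from dataclasses import dataclass

# Assumption: labels treated as quantum resilient, following the talk.
RESILIENT = frozenset({"AES-256", "ML-KEM"})


@dataclass(frozen=True)
class KeyRecord:
    key_id: str               # hypothetical identifier in the register
    algorithm: str            # algorithm the key itself is used with
    transport_algorithm: str  # algorithm that protected the key in transit
    exchanged_on: str         # ISO date of the exchange


def needs_rotation(record: KeyRecord, resilient=RESILIENT) -> bool:
    """Flag a key if it, or the key it was transported under, is not resilient.

    A strong key that travelled under a weak session key is still flagged,
    because the recorded exchange could be decrypted later.
    """
    return (
        record.algorithm not in resilient
        or record.transport_algorithm not in resilient
    )


# Hypothetical register entries for illustration only.
REGISTER = [
    KeyRecord("issuer-zmk-01", "AES-256", "3DES", "2019-03-11"),
    KeyRecord("card-kek-02", "AES-256", "ML-KEM", "2025-06-02"),
    KeyRecord("3ds-session-07", "3DES", "RSA-2048", "2021-10-20"),
]

flagged = [r.key_id for r in REGISTER if needs_rotation(r)]
```

The design choice worth noting is that the transport algorithm is recorded per key rather than per partner, so a key exchanged years ago under an old session key stays flagged even after the session itself has been upgraded.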

So even if we start transmitting those keys under secure, compliant algorithms in 5 years, the fact that we, uh, transmitted them now under vulnerable algorithms can be enough to cause those keys to be compromised. Um, and tracking which keys are transmitted under which algorithms therefore is now super important. We need to be keeping track of that now so that we know what we need to change. And basically the takeaway is we need to focus not less on physical security. It's still really important. It's never not going to be important. But we also need to put a lot more effort into rotating our keys and managing this process better. So I Googled how to prepare for quantum

computing and I was told two things by the internet. The first is you want to adopt quantum resilient encryption algorithms and key sizes. The second is you want to become crypto agile, which is a really cool word and it just means you need to be able to rotate keys and algorithms quickly. You want to work like a dev team, not like a legacy infrastructure team. Guys, that wasn't a joke. Um, okay. Step one, how do I become compliant? We need to investigate our systems. A lot of us are not really sure about which algorithms are in use. Keeping track of all the keys that are in your estate is actually incredibly difficult if you run a big estate, especially

because this is legacy technology. You have stuff that's been sitting there for 10 or 15, 20 years, and tracking it down can be really hard. So, you have to perform that investigation. You need to create a key register. A lot of this is required for new PCI compliance. So, hopefully you're doing it anyway. Um, you need to include the size of your keys, obviously, and you need to be tracking the algorithms that are used to generate and transmit those keys. You also need to start to plan to migrate all of your keys and your applications that use those keys to AES 256, which is currently considered quantum resilient, um, or to a quantum

resilient asymmetric algorithm like ML-KEM (FIPS 203). Step two is you need to improve, and this is becoming crypto agile. You have to improve your turnaround time for key rotations and algorithm changes. How do we do that? It's really, really hard. Firstly, you have to identify your development bottlenecks. Um, and by that I mean, if you have legacy pieces of code that people are really afraid to touch, they have to start learning how they work. They have to touch them. You're going to have to change the encryption algorithms that you're using. Um, you need to make sure that all of your encryption code is as modular as possible, that your developers are familiar with it, are not afraid of it

and that they can change it securely without undermining your cryptography by accidentally, um, I don't know, doing something really dodgy. Um, secondly, you need to implement asymmetric key exchange, and I'm going to talk a lot about that because it's really important to me personally. And then the most important step is you have to practice. So right now it might not be feasible to move over directly to a quantum resilient algorithm and key size. Totally fair. Start by moving over to a slightly better algorithm and key size. You're actually doing yourself a favor by going through intermediary steps. Um, just getting off Triple DES, going on to AES, starting at key size 128 and then moving over to AES 256 when

you can, can make a world of difference, because your team gets better at rotating keys and changing algorithms the more you do it, and, like all the incident response advice says, you really don't want to have to do this for the first time in an emergency. So let's talk about asymmetric key exchange. How can we do this? Um, if I want to share a key with a counterpart and I publish a certificate, all I have to do is send that certificate to my counterpart. They're going to generate a key, encrypt it with the public key in my certificate and send it back to me. And we have a session key, and that's within minutes instead of days, weeks,

maybe months. And it's PCI compliant, which I think is really important to some people like me. Um so if we go back to this process that I just spoke us through um and we make use of asymmetric um session negotiation. What happens is when I generate my key I don't need it in components anymore which means I don't have to go through the whole key ceremony process. my application can send a message to the HSM saying hey generate a key export it under this um uh public key to share with this person and the HSM will do that in seconds. So key generation suddenly becomes really easy. Session negotiation also becomes really easy because now all I have to do

is send that encrypted key to my counterpart. they're going to receive it, pop it on their HSM, decrypt it with their private key that's already on their HSM and session has been um created. So all the difficult steps in this process have suddenly become much easier and we can do them in minutes where before it was taking a really really long time. So I said this was PCI compliant. Just to give a little note on how to do that, you need to make sure that you generate your PKI using your HSM. So, your private and public key should be generated on your HSM. Your private key should only be stored under your LMK. Um, and then when you generate

your certificate, please get it signed by a CA to make the process a whole lot smoother. Please also note that the AWS payment HSM and a lot of other cloud HSMs already do this. It's very common practice for them. Um, and if they can do this, us legacy people can also definitely do this. So, what are the implications of doing this? We're going to have a much shorter key exchange time. We're going to have much simpler key storage, because I don't have to keep track of a million paper key components anymore. I can keep all of my keys encrypted under my LMK. I can back up those keys in as many places as I want. And as long as I have proper DR

in place for my LMK, so a lot of smart cards in a lot of different places, DR is really simple. No paper involved. Uh, one ceremony, get your LMK back on the HSM and you're away. I've also drastically reduced my insider threat, because I don't have to worry so much about key custodians and taking them out and giving them lots and lots of different keys all the time. And I've made my life personally much, much better because I don't have to do so much paperwork. And that's why I'm here. So why are we having this discussion? Firstly, because I did go to a quantum cryptography talk and I went, "Wow, that's really interesting. How do I, what,

what implication does that have for my life?" I started looking and I realized it's actually really hard to prepare a legacy environment for a threat like quantum cryptography. It's going to take a really long time, and at my small company, it's going to take me years. If you have the kind of big enterprise inertia that a lot of the people I work with have, you're going to have to start yesterday. Um, so I wanted to kind of bring it to everyone's awareness so that they can start if they haven't. The other reason is that the symmetric encryption ecosystem is community-driven. I think you've seen from our diagrams, it's all about how we communicate with each other directly. No one is setting

standards at a kind of bigger level than that, uh, the way that they do with the internet. Um, which means I can't improve things if the people I communicate with aren't improving things. So I'd really like us all to work together to improve things. The second reason is that new threats present new opportunities to improve our legacy processes. I have found that motivating for security projects in companies where you're a cost center can be really difficult. It often requires a proper, strong narrative. You need to capture the imagination of your execs. Convince them that it's worthwhile. Um, if they're not so process-minded and the improvements on my previous slide are not enough to get you the time and money to do this,

hopefully the big shiny threat of quantum computing is enough and that will be what gets them to do this. This is a request for collaboration. I really hope that um I'll be able to get hold of a lot of the HSM management teams at some of our organizations in this room um through doing this process because for us for any one of us to improve our processes and become ready for the future kind of all of us have to do it um so I really hope we can reach out we can work together um that's why I'm here if you are interested in HSM and you want to keep learning about them. There are a couple of talks that

are really worth a watch. "Care and feeding of HSMs: key management in hard mode" is so funny. Um, I kind of gave an idea of how easy it is to brick your HSM at the beginning of this talk. Um, Nick talks about it for 45 minutes. Every story will give an HSM engineer PTSD, but they're really funny. It's worth doing. Um, the second talk that I think is worth giving a watch, um, is "Securing DNSSEC with ritual and ceremony". I gave a very brief overview of the key generation ceremony and the ceremonies associated with this. Um, she went into massive detail and it's really interesting if you want to get an idea of how these ceremonies work. You can

also actually watch the DNS ceremonies on YouTube if you feel so inclined. Um, but yeah, that is me. Thanks for coming. Do you have any questions? [Applause]

There we go. Yeah. Yeah. Thanks. Great talk, Amy. Um, what do you think about companies and fintechs who are not trying to run their own HSMs? They're actually outsourcing that to third parties. Um, I think it's about managing a third party relationship. So, what's nice is I think the companies that dedicate their time to managing their HSMs are the furthest along in terms of managing them in, um, like, crypto agile ways and also preparing for quantum computing. So, um, I don't know who you outsource your HSM management to, but if it's, say, like, Futurex's cloud, um, I think they're already producing, um, or making quantum resilient algorithms available for use, and it's just about securing

your development cycle and making sure that your developers are able to change the code and upgrade key sizes and algorithms quite quickly. So the worst part is not your problem which is nice. Cool.

Thanks guys. [Applause]