HSMs, what are they good for?

And how does the cloud change this?

What is a HSM?

A HSM is a hardware device that can perform cryptographic functions in a “secure” manner.

The idea is that you can load your private key into a HSM and be sure that it’s safe from theft. If anyone tries to physically access the device, it’ll wipe (or literally burn) the data. So encryption or signing of data can be trusted because only the HSM has the key needed to do the crypto.

So, for example, there are modules for Oracle to allow for transparent data encryption to happen via a HSM. There are certificate authorities that have their root key in a HSM. International money transfer or trading may be signed and verified by HSMs.

Basically, HSMs are a big thing.

The problem

But the problem I have isn’t with the HSM, itself, but in how they are accessed.

Take a standard network based HSM. There are two standard ways a client machine can access the HSM:

  1. IP based. The server is “trusted” by the HSM to make requests. (An on-machine HSM card is really just a variation of this; we use the local hardware bus rather than the network.)
  2. Authentication based. The server authenticates to the HSM (e.g. with an X509 cert) in order to make requests. There may be variations of this (e.g. symmetric encryption) but it’s all “private data” based.

This second method is becoming more common with dynamic resources. Think “cloud”; you might spin up 10 servers and get assigned IP addresses, then shut down 3 of them, then spin up 5 more with different addresses. At this point you can’t use IP address as a basis of trust, so we use an authentication protocol instead.
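To make the difference between the two models concrete, here is a minimal sketch of the kind of check a HSM front end might make before accepting a request. It is not how any particular vendor does it; the network range, the client name, the shared secret and both function names are made up for illustration, and a real deployment would more likely use X509 client certificates than a shared secret.

```python
import hmac
import ipaddress
from hashlib import sha256

# Hypothetical values, for illustration only.
TRUSTED_NETWORKS = [ipaddress.ip_network("10.0.2.0/24")]
CLIENT_SECRETS = {"app-server-1": b"example-shared-secret"}


def allowed_by_ip(source_ip: str) -> bool:
    """Model 1: trust any request whose source address is in an approved range."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in TRUSTED_NETWORKS)


def allowed_by_credential(client_id: str, challenge: bytes, response: bytes) -> bool:
    """Model 2: trust a request only if the caller proves it holds a secret.

    The caller is expected to return HMAC-SHA256(secret, challenge).
    """
    secret = CLIENT_SECRETS.get(client_id)
    if secret is None:
        return False
    expected = hmac.new(secret, challenge, sha256).digest()
    # Constant-time comparison, to avoid leaking how many bytes matched.
    return hmac.compare_digest(expected, response)
```

Notice that the first check doesn’t care who is on the machine, only where the request comes from, while the second doesn’t care where the request comes from, only who holds the secret. That’s exactly where the two attack paths come from.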

Attack paths

An IP based trust to the HSM is naturally exposed to any system administrator on the machine, or to any attacker that manages to get remote code execution (RCE). They can directly call the HSM since, as far as the HSM can tell, their requests come from a trusted source.

An authentication based approach runs the risk of the credential being stolen. Either via an exploit in the app or OS, or via whatever mechanism is used to provide credentials to the app. In this scenario the attacker may be able to make calls to the HSM from any machine on the trusted network.

In both of these scenarios the attacker can now call the HSM and have it perform the necessary calculations using the stolen access.

Where HSMs are useful

I can think of two ways that HSMs are useful… and these come into play once you have already been attacked.

So, assume your attacker has got an RCE to a trusted machine or has managed to steal an authentication credential.

At this point you can revoke access to the HSM from the bad machine, or revoke the bad credential. Now the attacker no longer has access to the HSM and thus to the private key. They can no longer forge requests or decrypt data.

Without a HSM the attacker may be able to steal the primary keys and so continue bad activity offline (forge requests, decrypt exfiltrated encrypted data), but with a HSM their access to the keys has been revoked. Sure, this stops your app from running… but you probably want that, anyway!
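A toy model may help show why revocation works here. This is only a sketch: `ToyHSM` and its methods are names I’ve invented, and HMAC is used as a simple stand-in for whatever signing operation the real device performs. The point is that callers only ever get results back, never the key itself.

```python
import hmac
import secrets
from hashlib import sha256


class ToyHSM:
    """Hypothetical stand-in for a HSM: the key never leaves this object."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)
        self._credentials: set[str] = set()

    def issue_credential(self) -> str:
        """Create a new access credential for a client."""
        cred = secrets.token_hex(16)
        self._credentials.add(cred)
        return cred

    def revoke(self, cred: str) -> None:
        """Remove a credential; later calls using it will be refused."""
        self._credentials.discard(cred)

    def sign(self, cred: str, message: bytes) -> bytes:
        """Sign on behalf of the caller, without ever returning the key."""
        if cred not in self._credentials:
            raise PermissionError("credential not recognised")
        return hmac.new(self._key, message, sha256).digest()


hsm = ToyHSM()
stolen = hsm.issue_credential()      # imagine the attacker has copied this
hsm.sign(stolen, b"forged request")  # works while the credential is valid
hsm.revoke(stolen)                   # after this, sign() raises PermissionError
```

The attacker could use the key while the credential was live, but never held a copy of it, so there is nothing left to use offline once the credential is gone.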

This makes it easier to limit the scope of intrusion and to recover from it (fix the vulnerability in the code, build a new machine, create new trusts, create new credentials), rather than having to revoke the primary encryption keys, re-encrypt any data, ensure any counter-parties are aware the key is no longer valid…

Sure, there’s still clean up and rollback type activities to be done after an intrusion but at least the key, itself, is still secure.

Similarly, HSMs help by limiting attack paths to machines that can see the HSM; if there’s no IP route between an attacker and a HSM then they can’t make requests. The attacker may have been able to exfiltrate the authentication secret to talk to the HSM, but since they can’t reach the HSM from outside the network they can’t make use of it. They need to maintain a persistent intrusion path, which may make them easier to detect.

How does the cloud change this

Dynamic elastic compute pretty much rules out IP based authentication. In addition some PaaS systems or container ecosystems may use NAT rules so that multiple applications all have the same externally visible IP address.

So we have to use an authentication based approach. This suffers the previously discussed problem of getting the credentials into the application.

You need to make sure that you’re building your virtual networking constructs so as to ensure that only approved IP ranges can reach the HSM (e.g. in a 3 tier architecture, the application and database tiers may be able to reach the HSM, but the web front end tier can’t).

This, of course, raises the question of how much you trust the cloud provider to maintain network security at their end, and whether their staff could abuse hypervisor and control plane access… but this applies to the whole cloud environment!

What about cloud HSMs

Amazon can provision you a Cloud HSM which you can use instead of your own on-premise HSMs. The idea is that they are inside your VPC, so latency is much lower and applications that depend on them can see better performance.

The Amazon solution is based on SafeNet Luna SA devices, which provide a strong separation between “administrative access” and “application access”. Amazon staff shouldn’t be able to access your secrets in the HSM.

Summary

A HSM doesn’t appear to provide any real improvement for an active attack via an RCE or similar; the attacker can use the credentials and access rights of the application to get the HSM to perform activities.

However the HSM can help limit the consequences by making it easier to revoke and restrict access to the secret key.

The weakest part of a HSM is in the secret management; how do you help prevent leakage of your credentials, while still allowing for automated deployment? In a PaaS environment you may have to place inherent trust in the PaaS administrators, since they may be providing that initial “identity”.

Question

Are there any other ways a HSM can make things better? Please let me know!