I’ve previously written about encryption and hashing, and why things like customer passwords should never be encrypted.
Sometimes, though, you need encryption because you need to get the raw data back.
Now you can apply encryption at different layers. Some are easy; some are hard. What you need to be aware of, though, is what each layer protects against. There is no one-size-fits-all solution.
A standard app
In a common scenario we may have an application that writes data to a database; that database persists data to disk.
Here we’re looking at things like encrypting the whole hard disk. This could be done via software (e.g. BitLocker on Windows, LUKS on Linux) or via hardware (self-encrypting drives, or SEDs). The idea is that the storage mechanism itself is encrypted, and the data is decrypted when read. The key used for the encrypt/decrypt cycle is typically supplied at boot time; without the right key the data is unreadable. For a laptop you would typically need to enter the password at power up; for SED devices the key may be negotiated via the hardware BIOS/TPM.
This helps protect against physical theft; e.g. if someone steals your laptop then they can’t read your data without knowing the unlock password. If someone steals an SED device from your datacenter (or a site engineer accidentally walks out with one after performing a replacement) then the device won’t be able to talk to the motherboard to get the unlock key.
This is the sort of encryption used on iPhones and newer Android devices, and it’s important for the same reason that companies use BitLocker on laptops; if the device is lost or stolen then the contents are unreadable.
Once the storage device is “unlocked” then reads are decrypted and writes are encrypted transparently to the OS and applications. They don’t need to care about it.
What it doesn’t protect against, though, is data access once the device has been unlocked. So, for example, an SA (system administrator) on your Linux server will be able to read all the data (the device decrypts it for them!).
This layer of encryption is, essentially, transparent to higher levels of the stack which makes it nice to implement, especially on mobile devices. Everything that runs on the machine pretty much runs unchanged.
So be aware of the limitations: device level encryption can protect against loss of the device but cannot protect against attacks at the OS level or higher.
So we can go up a level; we don’t want our SAs to be able to read our data. Let’s encrypt at the database level, instead. The common term for this is “Transparent Data Encryption” (TDE). Oracle, Microsoft SQL Server and MySQL are examples of databases that offer this.
Now the database does the encryption/decryption. Your SA trying to read the datafiles will just see rubbish, but the application will be able to run unchanged. We’ve protected against the SA.
What it doesn’t protect against, though, is the DBA. The DBA can connect to the database and read anything. It also doesn’t protect against the SA getting hold of the encryption keys or gaining DBA privileges (switch to the Oracle group?).
This level is effectively transparent to the database users (hence the TDE terminology), which makes it nice to implement (no application code change is necessary).
So be aware of the limitations: database level encryption can protect against disk loss or against someone reading the raw data files, but it doesn’t protect against authorized privileged users of the database (DBAs) or people who can become privileged users.
Application level encryption (ALE)
So now we get into the application itself; the app is modified to encrypt the data. You might use tools from companies such as Vormetric or Voltage, which can be configured to talk to an HSM to get encryption keys.
At this stage the application can decide which columns to encrypt. You might want to encrypt a customer’s name and address, but not necessarily the date of a purchase.
ALE is the most intrusive into an application because the app needs to be modified to perform the encryption.
Depending on the type of data you may want to use Format Preserving Encryption (FPE) so the encrypted data has the same format as the original (e.g. credit card numbers). This allows some applications to continue to work on the encrypted data, rather than needing to decrypt it every time. (This is a good use case when storing data in the public cloud; if your app can work on FPE data then the risk of exposure by running it in the cloud is a lot smaller than having to decrypt it each time.)
Now neither the SA nor the DBA has direct access to the data, although they still may be able to abuse privileges to get hold of the encryption key or authentication tokens to talk to the HSM.
Of course your application may still need to be able to decrypt the data (how can you send an order if you can’t decrypt the name and address?). This means that you are still at risk from rogue developers or errors at the application layer, but you may be protected from a SQL injection type attack because the resulting dataset would be encrypted.
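To make the ALE idea a bit more concrete, here is a minimal sketch in Python. It assumes the third-party cryptography package (its Fernet recipe) and an in-memory SQLite database; the orders table, the helper names and the sample values are all made up for illustration. The key is generated inside the process only to keep the sketch self-contained; in a real deployment it would come from an HSM or key management service as described above.

```python
import sqlite3

from cryptography.fernet import Fernet  # third-party: pip install cryptography

# Assumption: a locally generated key stands in for one fetched from an HSM.
key = Fernet.generate_key()
cipher = Fernet(key)


def encrypt_field(value: str) -> bytes:
    """Encrypt a single column value before it is sent to the database."""
    return cipher.encrypt(value.encode("utf-8"))


def decrypt_field(token: bytes) -> str:
    """Decrypt a single column value after it is read from the database."""
    return cipher.decrypt(token).decode("utf-8")


# Hypothetical schema: name and address are encrypted, the purchase date is not.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, name BLOB, address BLOB, purchase_date TEXT)"
)
db.execute(
    "INSERT INTO orders (name, address, purchase_date) VALUES (?, ?, ?)",
    (encrypt_field("Alice Example"), encrypt_field("1 Main Street"), "2020-01-01"),
)
db.commit()

# What a DBA (or a SQL injection returning raw rows) gets back: ciphertext.
raw_name, raw_address, purchase_date = db.execute(
    "SELECT name, address, purchase_date FROM orders"
).fetchone()

# What the application sees, because it holds the key.
name = decrypt_field(raw_name)
address = decrypt_field(raw_address)

print("stored in the database:", raw_name)
print("seen by the application:", name, "/", address, "/", purchase_date)
```

The database only ever holds ciphertext for the two sensitive columns, while the purchase date stays readable and queryable. Anyone who can get hold of the key is, of course, back to reading everything.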
What about cloud storage?
You need to think about “who has access? who can attack?”. So let’s say you’re storing stuff in Amazon S3 and decide to use server-side encryption. Suppose you accidentally allow the buckets to be read by the world, but without decryption. So you’ve protected against “raw data theft” (compare to a disk leaving the datacenter). But you haven’t protected against theft by “authorized” users, because the S3 API will do the decryption for you. You’re also vulnerable to admins of the Amazon account who can modify the rules…
Similarly if you use EBS attached to an EC2 instance and encrypt it with LUKS then you may be safe from the admins, but the SAs of the VMs can still access the data. Of course if you’re doing hands off delivery and don’t allow anyone to log in then there may not be any SAs :-)
I like to look at technology as a stack of things sitting on top of each other.
So your app sits on top of a database, which sits on top of an OS, which sits on top of a machine.
Encryption at different layers protects against attacks at that level of the stack; SEDs can protect against “machine” attacks, but they don’t protect against higher levels of the stack.
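One way to picture this is a small Python sketch of the stack. The rule it encodes is my simplification of the discussion above: data is plaintext to anyone sitting at or above the layer where decryption happens. The names (STACK, DECRYPTED_AT, ATTACKER_LEVEL, protects, exposed_to) are invented for illustration, and the model deliberately ignores privilege escalation, such as an SA stealing the TDE keys.

```python
# Layers ordered from the bottom of the stack to the top.
STACK = ["machine", "os", "database", "application"]

# The lowest layer at which data becomes plaintext for each approach.
DECRYPTED_AT = {
    "device": "os",        # disk/SED: plaintext from the OS upwards
    "tde": "database",     # TDE: plaintext from the database upwards
    "ale": "application",  # ALE: plaintext only inside the application
}

# Where each kind of attacker sits in the stack.
ATTACKER_LEVEL = {
    "disk thief": "machine",
    "SA": "os",
    "DBA": "database",
    "rogue developer": "application",
}


def protects(encryption: str, attacker: str) -> bool:
    """True if the attacker sits below the layer where decryption happens."""
    attacker_pos = STACK.index(ATTACKER_LEVEL[attacker])
    decrypt_pos = STACK.index(DECRYPTED_AT[encryption])
    return attacker_pos < decrypt_pos


def exposed_to(encryption: str) -> list:
    """List the attackers who still see plaintext under this approach."""
    return [a for a in ATTACKER_LEVEL if not protects(encryption, a)]


for approach in DECRYPTED_AT:
    print(approach, "still exposed to:", exposed_to(approach))
```

Each step up the stack removes one more attacker from the exposed list, but nothing removes the top one: whoever runs at the layer that does the decryption can always see the data.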
In a modern deployment you should consider device encryption as the lowest level requirement, even inside a datacenter. Disks do walk, even with the best of processes. But just because you have this low level encryption it does not mean you have “solved” the encryption problem. You need to look at your attack surface and pick a level that’s appropriate.
Sorry, developers, just because you have an encrypted disk or database TDE doesn’t mean you can avoid doing ALE!