Backup and restore

How quickly can you restore?

Have you tested your backups recently?

I’m sure you’ve heard that phrase before, and then thought “Hmm, yeah, I should do that”. If you remember, you’ll stick a tape in the drive, fire up your software and restore a dozen files to a temporary location. Success! You’ve proven your backups can be recovered.

Or have you?

What would you do if your server was destroyed? Do you require specialist software to recover that backup? Do you still have the install media? The license keys?

What would you do if the datacenter caught fire and you had to rebuild from scratch? (Assuming you ship your backups offsite…)

Or had everything encrypted by malware…

SF MTA

In November 2016 the San Francisco Municipal Transportation Agency got hit by ransomware, which encrypted many of its files. The agency was unable to run the terminals that let passengers buy tickets, and was forced to allow free travel.

In many of these cases the victim ends up paying the ransom to get the decryption keys and recover their data. But the SF MTA didn’t do that; they went to their backup tapes and restored everything they needed.

Document, test

When you are in a disaster recovery situation you need standard processes and procedures that are fully documented. How do you restore data? How do you recall tapes from offsite? Do you know which tapes to recall? (Hint: the tape inventory may have been encrypted!)

Simple procedures that you actually test are critical. Assume a worst-case scenario and work out how you would recover from it. Note any problems that arise during the tests and update your procedures.
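One way to make “note problems during the tests” routine is to time each documented step of a recovery drill and record what failed or overran. The sketch below is a hypothetical helper, not a real tool: `DrillLog`, `run` and `problems` are illustrative names, and each step is assumed to be a Python callable that raises an exception when it fails.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    planned_minutes: float
    actual_minutes: float
    ok: bool
    note: str


@dataclass
class DrillLog:
    """Hypothetical recorder for a disaster recovery drill."""
    steps: list[Step] = field(default_factory=list)

    def run(self, name: str, planned_minutes: float, action: Callable[[], None]) -> bool:
        """Run one documented step, timing it and recording any failure."""
        start = time.monotonic()
        try:
            action()
            ok, note = True, ""
        except Exception as exc:  # record the problem instead of aborting the drill
            ok, note = False, f"{type(exc).__name__}: {exc}"
        actual = (time.monotonic() - start) / 60
        self.steps.append(Step(name, planned_minutes, actual, ok, note))
        return ok

    def problems(self) -> list[str]:
        """Steps that failed or overran their plan; update the procedure for these."""
        out = []
        for s in self.steps:
            if not s.ok:
                out.append(f"{s.name}: FAILED ({s.note})")
            elif s.actual_minutes > s.planned_minutes:
                out.append(f"{s.name}: took {s.actual_minutes:.1f} min, "
                           f"planned {s.planned_minutes:.1f}")
        return out
```

Keeping going after a failure is deliberate: a drill that stops at the first broken step hides every problem that comes after it.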

Protect the backups

Many backup systems leave data “online”, especially those that back up to a remote disk. If malware can modify these backups, it can encrypt them too, and a backup is useless if you can’t read the data.
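A checksum manifest won’t stop malware from encrypting online backups, but it can tell you whether a backup set has been changed since it was written. The sketch below assumes the backups are plain files in one directory; the function names are hypothetical, and the manifest itself only helps if it is stored somewhere the attacker cannot modify (for example, offline).

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large dump files don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(backup_dir: str) -> dict[str, str]:
    """Map each file (relative path) under backup_dir to its SHA-256 digest."""
    root = Path(backup_dir)
    return {str(p.relative_to(root)): sha256_file(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify_manifest(backup_dir: str, manifest: dict[str, str]) -> list[str]:
    """Report files that are missing, changed, or new since the manifest was made."""
    current = build_manifest(backup_dir)
    problems = []
    for name, digest in manifest.items():
        if name not in current:
            problems.append(f"missing: {name}")
        elif current[name] != digest:
            problems.append(f"changed: {name}")
    # Ransomware often writes renamed copies, so unexpected files are worth flagging too.
    problems += [f"unexpected: {n}" for n in sorted(set(current) - set(manifest))]
    return problems
```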

These backups contain pretty much all your data, so they should be treated as highly confidential. If you have PII (Personally Identifiable Information) in a backup then you need to treat it the same as any other PII. This may mean it needs to be encrypted… in which case you now have encryption key management problems (not so “simple procedures”).

You also need to be careful where and how you store your backups. Also in November 2016, Michael Page, a UK-based recruiting firm, had its data leaked. The leaked files were SQL backups from a service run by Capgemini; somehow production data was placed on a development server and exposed.

Personal anecdote

I have a very simple process for my backups; it may not be efficient, it may waste disk space… but it’s simple.

I use the native OS backup software. So for ext[234] filesystems I use the dump command; for xfs filesystems I use xfsdump (and when I ran SunOS or Solaris I used their dump commands).

On Sunday I run a level 0 backup; that’s everything. On Monday I do a level 1, on Tuesday a level 2… so each day’s backup contains only what changed since the previous day’s.
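The weekly schedule can be sketched as a function from date to dump level, plus a helper that builds (but does not run) the matching native command. This is a minimal sketch, not the author’s script: the function names, file paths and labels are illustrative assumptions. It relies on `dump` taking the level as `-0` to `-9` and `-u` to record the run in /etc/dumpdates, and on `xfsdump` taking the level via `-l`.

```python
from datetime import date


def dump_level(day: date) -> int:
    """Sunday -> 0 (full), Monday -> 1, ..., Saturday -> 6."""
    # date.weekday() numbers Monday as 0 and Sunday as 6; shift so Sunday is 0.
    return (day.weekday() + 1) % 7


def dump_command(fstype: str, level: int, source: str, outfile: str) -> list[str]:
    """Build the native dump command for one filesystem (hypothetical helper)."""
    if fstype in ("ext2", "ext3", "ext4"):
        # -u records this dump in /etc/dumpdates, so tomorrow's higher level
        # knows which date to back up changes from.
        return ["dump", f"-{level}", "-u", "-f", outfile, source]
    if fstype == "xfs":
        # xfsdump keeps its own inventory; -L and -M set the session and media
        # labels, which it otherwise asks for interactively.
        return ["xfsdump", "-l", str(level), "-L", f"level{level}",
                "-M", "backup", "-f", outfile, source]
    raise ValueError(f"no native dump tool configured for {fstype}")
```

For example, `dump_command("ext4", dump_level(date.today()), "/dev/sda1", "/backup/root.dump")` yields an argument list that could be passed to `subprocess.run`; the device and path there are placeholders.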

These backups are then ‘rsync’d offsite to a second server. If the worst happens I can build a machine, boot it from a DVD and restore all the data over the network back to the disks (no special software dependencies! It’s part of the standard OS). The process is simple: restore the latest level 0, then all the incrementals after it.
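Working out which dumps to restore can be sketched as follows. A level N dump holds changes since the most recent dump of a lower level, so the chain is found by walking backwards from the newest dump and keeping each one whose level is lower than the last one kept, until a level 0 is reached. With the Sunday-to-Saturday scheme that is simply the latest level 0 plus every later incremental, but the sketch also copes with irregular levels. The function name and the `(name, level)` input format are assumptions for illustration.

```python
def restore_chain(dumps: list[tuple[str, int]]) -> list[str]:
    """Given (name, level) pairs, oldest first, return the names to restore, in order."""
    chain = []
    ceiling = 10  # dump levels run 0-9, so any level is below this
    for name, level in reversed(dumps):
        if level < ceiling:
            # This dump is the newest with a lower level than the one after it,
            # so it is the baseline the later dump was taken against.
            chain.append(name)
            ceiling = level
            if level == 0:
                break
    if ceiling != 0:
        raise ValueError("no level 0 dump found; cannot restore")
    return list(reversed(chain))
```

Each file in the result would then be restored in order, for ext filesystems with something like `restore -r -f <file>` run inside the freshly made and mounted target filesystem.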

I got to do this for real, once. I was in Newark Airport, ssh’d home to read my mail, when I hard crashed the server (known issue; if I ran X twice then it hard crashed). Damn; I wasn’t going to be home for 2 weeks! All my mail and stuff would queue up on my external machines, but… grump

Then I had a brainwave: I had a second machine where I was playing around with virtualization… could I recover the broken machine into a VM from the backups? I fired up a new VM, configured its disk sizes to match, and then restored the data. I got on the plane while the restore was running. When I got to the hotel I finished the process and rebooted… and it worked!

Indeed it worked so well that when I finally got home and rebooted the physical server, all I did was copy over the few files that had changed since the last backup run and then shut it down. I’ve been running on the VM ever since. The old physical machine is sitting under a table, just in case I need hardware in an emergency :-)

Conclusion

Backups are no good if they can’t be used to restore your servers to an operational state. This means testing your processes, including any steps needed to get your backup/restore software working again.

Protect your backups, both from tampering and from being leaked.

Don’t expect everything to work the first time, and don’t be surprised if restores take longer than you expect! (One off-site DR test my company ran ended up with the Windows server restores taking 2 days when they had planned for only 4 hours; a good lesson to learn!)