Breaking the MBR on every hard disk

I was reminded of a Backblaze article about SMART numbers. This nudged me to look up the stats on my drives to see if any numbers had budged.

Let’s collect the data for processing:

  for a in /dev/sd?
  do
    smartctl -a $a > $a
  done

Spot the error.

I ran the code. Did an “ls”… and didn’t see any output. I started to panic a little… I didn’t just do what I think I just did… did I?

That’s right! I just overwrote the MBR (and the sectors right after it) on ALL my disks with smartctl output. 8*4TB, 4*2TB, 2*512GB SSD and a 1TB external disk. All broken.

Oh fuckety fuckety fuckety fuck.

At this point I was running around biting my knuckles. My heart was racing.

I switched to decaf. Used the walk to the coffee machine to force some thinking time.

The machine was still working. It hadn’t noticed.

I started to calm down…

The largest of the outputs was 10K long. Fortunately I don’t put the raw disks into RAID arrays; I create one large partition on each disk and use that. 10K is 20-ish sectors. The first partition on every disk was 2048 or 4096 sectors in, except for the external disk which was 64 sectors in. Yes, I waste up to 2MB of each disk this way…
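The sector arithmetic is easy to sanity-check in the shell. This is only a sketch: it assumes 512-byte logical sectors, and the function name is made up for illustration.

```shell
# Round a byte count up to whole sectors.
# Assumption: 512-byte logical sectors unless a second argument says otherwise.
sectors_for_bytes() {
  bytes=$1
  sector_size=${2:-512}
  echo $(( (bytes + sector_size - 1) / sector_size ))
}

# 10K of smartctl output
sectors_for_bytes 10240   # prints 20
```

20 sectors is well short of 64, the earliest partition start on any of the disks, so the overwrite should have stopped before any partition contents.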

But that wasted space may have saved me. I began to think that my data was safe. All I needed to do was recreate the MBR.

For that I needed the partition data. It must be in the kernel somewhere, right?

/proc/partitions was particularly useless. It tells me which partitions were there and their sizes… but not where they started.

Ah, /sys/block/$disk/$partition has two useful entries: “start” and “size”, both in sectors.

sda/sda1/start:2048
sda/sda1/size:16775168

That looks useful! I think I can recover… No partition type, but I know that :-)

Let’s grab all that data just in case the machine does crash and I lose it. This is my best chance of recovery.
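Something along these lines would do the grabbing. It is a sketch rather than the exact command used at the time; the function name is hypothetical, and the sysfs root is a parameter only so it can be tried against a scratch directory.

```shell
# Print "disk/partition/start:N" and "disk/partition/size:N" for every
# partition the kernel still knows about.
# Assumption: the standard /sys/block/<disk>/<partition>/{start,size} layout.
dump_partition_geometry() {
  root=${1:-/sys/block}
  for part in "$root"/*/*/; do
    # Skip directories such as "queue" that are not partitions.
    [ -r "${part}start" ] || continue
    rel=${part#"$root"/}
    printf '%sstart:%s\n' "$rel" "$(cat "${part}start")"
    printf '%ssize:%s\n' "$rel" "$(cat "${part}size")"
  done
}

# Usage (commented out so pasting the block writes nothing):
# dump_partition_geometry > partition-geometry.txt
```

Whatever file this ends up in should be copied to another machine straight away, since the local disks are exactly what is in doubt.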

A vague memory… a backup copy of the GPT data is stored at the end of the disk… on a GPT disk the MBR is basically just used to “protect” the contents from non-GPT aware apps.

Bleh, parted bitches and moans. gdisk? Ooh, nice! It told me the MBR was barfed, and the GPT failed checksum… but gave me the option to use the GPT anyway. I took it… and it gave me the correct values! SAVE!

I was able to recover the 4*2TB and 8*4TB disks this way.

The other 3 disks were too small so I hadn’t bothered with GPT; they were just MBR partitioned.

Now let’s see what we can do with sda…

sda/sda1/start:2048
sda/sda1/size:16775168

sda/sda3/start:16777216
sda/sda3/size:983438000

fdisk, units, create primary 1, start at 2048, +16775168. Create primary 3… oh, can’t start at 16777216. Partition 1 is 1 sector too large: fdisk adds the +N to the start to get the last sector, and the last sector is inclusive. Need to reduce the size by 1. Remember also to do that for partition 3. Set the types to FD (Linux raid autodetect), make partition 1 active.
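The off-by-one is easier to see as arithmetic. A minimal sketch, using the start and size values from /sys shown earlier (the function name is made up):

```shell
# Last sector (inclusive) of a partition, given its start and size in sectors.
last_sector() {
  echo $(( $1 + $2 - 1 ))
}

last_sector 2048 16775168        # prints 16777215
last_sector 16777216 983438000   # prints 1000215215
```

Partition 1 has to end at 16777215 for partition 3 to start at 16777216, which is why the +N typed into fdisk needs to be the size minus one.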

Repeat that for the other SSD. Similarly for the 1TB disk.

Run grub-install against the two SSDs; they’re my boot mirrors.

Now the moment of truth…

reboot!

And… relax.

All the md RAID devices assembled correctly. The machine booted cleanly, as if nothing had gone wrong.

Of course there’s still junk in the MBR of most of the disks:

% sudo dd if=/dev/sdb bs=1c count=256 | hdump -16
00000000  73 6D 61 72 74 63 74 6C 20 35 2E 34 33 20 32 30   smartctl 5.43 20
00000010  31 32 2D 30 36 2D 33 30 20 72 33 35 37 33 20 5B   12-06-30 r3573 [
00000020  78 38 36 5F 36 34 2D 6C 69 6E 75 78 2D 32 2E 36   x86_64-linux-2.6
00000030  2E 33 32 2D 35 37 33 2E 31 32 2E 31 2E 65 6C 36   .32-573.12.1.el6
00000040  2E 78 38 36 5F 36 34 5D 20 28 6C 6F 63 61 6C 20   .x86_64] (local
00000050  62 75 69 6C 64 29 0A 43 6F 70 79 72 69 67 68 74   build).Copyright
00000060  20 28 43 29 20 32 30 30 32 2D 31 32 20 62 79 20    (C) 2002-12 by
00000070  42 72 75 63 65 20 41 6C 6C 65 6E 2C 20 68 74 74   Bruce Allen, htt
00000080  70 3A 2F 2F 73 6D 61 72 74 6D 6F 6E 74 6F 6F 6C   p://smartmontool
00000090  73 2E 73 6F 75 72 63 65 66 6F 72 67 65 2E 6E 65   s.sourceforge.ne
000000a0  74 0A 0A 3D 3D 3D 20 53 54 41 52 54 20 4F 46 20   t..=== START OF
000000b0  49 4E 46 4F 52 4D 41 54 49 4F 4E 20 53 45 43 54   INFORMATION SECT
000000c0  49 4F 4E 20 3D 3D 3D 0A 4D 6F 64 65 6C 20 46 61   ION ===.Model Fa
000000d0  6D 69 6C 79 3A 20 20 20 20 20 53 65 61 67 61 74   mily:     Seagat
000000e0  65 20 42 61 72 72 61 63 75 64 61 20 47 72 65 65   e Barracuda Gree
000000f0  6E 20 28 41 64 76 2E 20 46 6F 72 6D 61 74 29 0A   n (Adv. Format).

But the important part is the partition table… and that’s just fine :-)

000001a0  62 79 74 65 73 20 5B 32 2E 30 30 20 54 42 5D 0A   bytes [2.00 TB].
000001b0  53 65 63 74 6F 72 20 53 00 00 00 00 00 00 00 00   Sector S........
000001c0  02 00 EE FF FF FF 01 00 00 00 AF 88 E0 E8 00 00   ................
000001d0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
000001e0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
000001f0  00 00 00 00 00 00 00 00 00 00 00 00 00 00 55 AA   ..............U.
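Reading that dump: the EE at offset 0x1C2 is the type byte of the first partition entry (GPT protective), the 01 00 00 00 after it is the starting LBA (1), and the 55 AA at the very end is the boot signature. The same two checks can be scripted. This is only a sketch with made-up function names; it needs read access to the device, or it can be pointed at an image file.

```shell
# Print the two-byte boot signature (offsets 510-511) as hex, e.g. "55aa".
mbr_signature() {
  dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n'
}

# Print the type byte of the first partition entry (offset 0x1C2 = 450) as hex.
# "ee" means a GPT protective entry, "fd" means Linux raid autodetect.
mbr_first_type() {
  dd if="$1" bs=1 skip=450 count=1 2>/dev/null | od -An -tx1 | tr -d ' \n'
}
```

Anything other than 55aa from the first function means there is no valid MBR partition table on that disk.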

Lesson: stop doing shit as root. If I’d done

sudo smartctl -a $a > $a

then the redirect would have failed (it is performed by my own unprivileged shell, not by sudo) and I would have saved myself a tonne of heartache.
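Better still, the output should never be aimed at anything under /dev in the first place. A sketch of a safer loop, assuming a scratch directory for the reports; the helper name and the .smart.txt suffix are made up for illustration.

```shell
# Hypothetical helper: build a report path inside a separate directory,
# so the redirect target can never be the device node itself.
report_path() {
  outdir=$1
  dev=$2
  printf '%s/%s.smart.txt\n' "$outdir" "$(basename "$dev")"
}

# Usage (commented out so pasting the block runs nothing):
# outdir=$(mktemp -d)
# for dev in /dev/sd?; do
#   sudo smartctl -a "$dev" > "$(report_path "$outdir" "$dev")"
# done
```

The redirect still happens in the unprivileged shell, and it now points at a file like sda.smart.txt in the scratch directory instead of /dev/sda.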