Not if, but when
As a MySQL consultant, I spend a lot of time talking with people about their backups. More specifically, we talk a lot about recovery. I had an interesting incident myself, which illustrates some of the things that are bound to happen as time passes.
First, let me explain how I do my personal backups. I have a series of external hard drives, which are fully encrypted, as is my computer’s hard drive. I maintain a full mirror of my computer’s hard drive on these external hard drives. I occasionally switch the hard drives out, and carry one or more of them to a bank’s safe deposit box. I try to do this once a week, but sometimes it isn’t quite that often.
As a result, I have one hard drive located physically near my computer, which contains a very recent backup of all my work. I have at least one, usually 3 or more, other copies of my data in a slightly less fresh format, but durably stored in a bank.
While setting up a new computer recently, I somehow corrupted a GPG-encrypted file that I use quite often, and update frequently. (Perhaps a quantum bit flip or a solar flare; I don’t use ECC server-grade RAM, so this is actually possible/likely.) As a result, I needed to get my most recent backup. I plugged in my external hard drive, and the drive physically failed. I spent some time doing diagnostics, and concluded that the drive really had failed. This reminded me that I had another hard drive, which I had set aside on a shelf a couple of weeks ago, because it had also apparently failed. I pulled this drive off the shelf and ran diagnostics on it. It was also bad.
So I had lost my most recent copy of my file, as well as my most recent backup of it. I could go to the bank and retrieve my previous backup of it, but that was a couple of weeks old, and I knew there were some changes that were not in that copy.
The happy ending to this story is that the corruption was only in the tail of the file, actually only in the last couple of bytes. I was able to decrypt everything except the last block or so, and then I retrieved that portion from my old backup. So in the end, I did not lose any data, but it was an interesting exercise.
The most interesting thing about this is the probability of several failures happening together. I think it is a natural human tendency to underestimate the probability of several different kinds of failures, or even several identical failures, happening at the same time. It quite commonly happens that hard drives fail at the same time, and we know that backups fail, and we know that files are corrupted or deleted, and it’s not a matter of if, but when these things happen together. This is why I have several copies of my backups in different places.
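To put rough numbers on that intuition, here is a small sketch. The probabilities are made-up illustrations, not measurements, and the function names are hypothetical. It compares the chance that every copy fails when failures are independent with the chance when a shared cause (same drive batch, same power event, same mistake) can take out all copies at once, and it shows how a rare event becomes likely over enough time.

```python
def p_all_copies_fail(p_each: float, copies: int, p_common: float = 0.0) -> float:
    """Probability that every copy is lost.

    p_each   -- assumed chance that any single copy fails on its own
    copies   -- number of copies kept
    p_common -- assumed chance of one shared cause destroying all copies
    """
    independent = p_each ** copies
    # Either the shared cause strikes, or it does not and all copies fail separately.
    return p_common + (1.0 - p_common) * independent


def p_at_least_once(p_per_period: float, periods: int) -> float:
    """Probability that an event with a fixed per-period chance happens at least once."""
    return 1.0 - (1.0 - p_per_period) ** periods


if __name__ == "__main__":
    # Illustrative inputs only: 10% per-copy failure, 3 copies, 2% shared cause.
    print("independent:  ", p_all_copies_fail(0.10, 3))
    print("shared cause: ", p_all_copies_fail(0.10, 3, p_common=0.02))
    # A 1-in-1000 event per day, observed over 1000 days.
    print("over 1000 days:", p_at_least_once(0.001, 1000))
```

Under these assumed inputs the shared-cause term dominates the result, which is one way to see why copies kept in different places matter more than simply having more copies.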
I’m still glad that I do backups the way that I do, keeping my own backups instead of relying on some online backup provider. I have heard many horror stories about them, and witnessed a few myself. I do not trust anyone else with my backups.
“Anything to deposit, sir?”
“Just my ~/.xfce settings. Took me 5 days to reach a fully working desktop!”
Seriously, I’m impressed that you’re depositing a disk at the bank. The above joke is not completely unreal. We sometimes underestimate the time it takes us to do the silliest desktop configurations and installations. How much is that time worth?
Shlomi Noach
9 Jan 12 at 12:42 pm
Regarding failing online backups, are you aware of Tarsnap, a backup system designed by FreeBSD security officer Colin Percival?
smyru
9 Jan 12 at 4:22 pm
It’s a good blog post but actually my key takeaway from this post is that Baron is a crypto geek: You have your disk encrypted and then inside it a file encrypted with gpg – then you put that in a vault in a bank.
Impressive :-)
PS: I always disagree with your blog engine whether a lion is a cat or not. I bet a spam bot would get it more right than I do…
Henrik Ingo
9 Jan 12 at 5:04 pm
I don’t get much spam :)
Xaprb
9 Jan 12 at 8:35 pm
No. My backup system is rsync. I find it comforting that it’s simple. I used to use rdiff-backup but the complexity of having versioned diffs is scary. Simpler = more likely to be recoverable.
Xaprb
9 Jan 12 at 8:37 pm
I take a somewhat different approach. My laptop’s hard drive is also encrypted. I keep a full-disk byte-for-byte dump of the relevant partition on a drive at work, refreshed perhaps every 4 months. (It takes a while!) My go-to backup is an encrypted drive with incremental snapshots using hardlinks for deduplication. Only changed files are copied. I’m still considering whether or not this is a bad idea. It does mean that I have a time-series of snapshots of my system, and I have successfully recovered from foul-ups by chrooting or even syncing an entire snapshot onto my (broken) system.
Tim McCormack
9 Jan 12 at 8:54 pm
Oh, and here’s the homebrew backup I use, in case you think it isn’t entirely an insane idea:
http://www.brainonfire.net/code/backup/
Tim McCormack
9 Jan 12 at 8:58 pm
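For readers unfamiliar with the hardlink snapshot idea described in the comments above, here is a minimal sketch of the general technique. It is not Tim's script, and the function names and the "unchanged" test (same size and modification time) are assumptions chosen for illustration. Each snapshot is a full directory tree, but files that have not changed since the previous snapshot are hardlinked to it rather than copied, so they take no extra space.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def _unchanged(current: Path, older: Path) -> bool:
    """Assumed heuristic: same size and same mtime means same content."""
    a, b = current.stat(), older.stat()
    return a.st_size == b.st_size and a.st_mtime_ns == b.st_mtime_ns


def snapshot(source: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Write a snapshot of `source` into `dest`.

    Files unchanged since the `previous` snapshot are hardlinked to it;
    new or changed files are copied. Requires a filesystem with hardlink
    support, and `dest` and `previous` must be on the same filesystem.
    """
    for root, _dirs, files in os.walk(source):
        rel = Path(root).relative_to(source)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            src = Path(root) / name
            new = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and _unchanged(src, old):
                os.link(old, new)       # second name for the same data on disk
            else:
                shutil.copy2(src, new)  # copy2 keeps the mtime for the next comparison
```

A usage example would be `snapshot(Path("/home/me"), Path("/mnt/backup/2012-01-10"), previous=Path("/mnt/backup/2012-01-09"))`, where the paths are placeholders. One design caveat: because unchanged files share data across snapshots, damage to that one on-disk copy affects every snapshot that links to it, which is worth weighing against the space savings.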
For the “very useful data” I find myself very comfortable using a RAID 1 system, and once in a while I connect a spare disk and do a full resync… 4 gigs of data takes less than 4 minutes… and in case of trouble “just attach”…
(Yes, it would be smart to have a remote rsync of that data too…)
In addition, the useful data, along with the not so useful data, is rsync’ed to a NAS…
/* Foolish nerd mode on */
Storing a disk in a bank, wow…
I live outside the city, so I have a valley nearby, and hills too… so I thought of using one of those cheap 5 GHz wifi links (who said NanoStation???) to make a remote NAS… 20 km can be easily attained…
I prefer to have my disk in a shelter than in a bank btw… It will be so funny to be in desperate need of the disk on SUNDAY!!!
Roy
10 Jan 12 at 6:37 pm
My dad solves the remote backup problem with an external drive in a plastic bag inside a 5-gallon plastic bucket sunken in the yard with a water-tight lid and a drip-cover. If the house burns down, this thing is safe.
Tim McCormack
10 Jan 12 at 9:53 pm
You mean I’m not the only person who had 3 backups fail at the same time?? You brought back bad memories.
Joe Devon
19 Jan 12 at 4:42 am