So first: I finally broke down and got an SSD for my MBP. A Mercury Extreme Pro, 120GB, $250. Worth every penny (they’re not paying me). If you are a developer who depends on your machine to make money, you absolutely must go buy an SSD today. Read the articles, do the research on wear-leveling for your OS, but seriously: waiting on a spinning disk is today’s equivalent of compiling code by hand. We have machines for that.
I originally took some video footage with my old machine, because I thought it would be cool to piece together a side-by-side before and after video of boot times and launching those apps that take forever, like Evernote and Dropbox (I love Evernote to death, but it has the worst startup performance of any app ever, on every platform). I knew that performance was bad, and it was impacting my productivity. But I had no idea how bad.
My Dropbox spun up (no actual changes, 75GB Dropbox, production build) in fifteen minutes. Every time I logged in. Opening my 14GB Evernote notebooks cost me a full five minutes. Add in the time to do a reboot, and we’re looking at 25-30 minutes at least once a day every day where I am running at much less than full productivity. I had no idea it was that bad. It’s so bad that it doesn’t even make any sense to edit that video and publish it, because it would just be the machine on the left being ready in ~60 seconds, followed by 25 minutes of waiting. It’s so bad that even the Internet’s morbid curiosity isn’t enough to overcome the total and absolute awfulness of it.
Unfortunately I put off the SSD thing for a while: I have more than one computer and wanted to upgrade them all at once, but I also wanted to replace one of the machines outright rather than put an SSD into it and then retire it six months later along with a six-month-old SSD. Well, I was wrong. Don’t make excuses. Get a freaking SSD. I save at least thirty minutes a day in actual, measurable productivity, not even counting the gains I can’t measure because my code compiles twelve times faster and I keep 10x more variables in my head during the code->compile->debug cycle.
One thing I should say about the OWC drive: performance is great even at full capacity. I wish I had gotten a larger one, though. DO NOT try to play the game of “maybe I can slim down my system and save $200”, because I tried that game and now I’m going to buy the next size up in a few months once I get sick of it.
Now of course replacing a hard disk is an exercise in stress-testing your backups. And if you use Time Machine, like one of the other 17,000 people who have had this problem, you will eventually hit a bad (corrupted) sparsebundle. I know, I know. Everybody should test restores from their backups once in a while. But nobody does. After spending 4-5 hours struggling to repair a bad backup (a backup stored on a redundant RAID, no less…), I’ve decided I’m actually serious and I’m going to start automating backup verification so this never, ever happens again.
Below is my little script, which automatically verifies and repairs a sparsebundle:
#!/bin/sh
# repair_sparsebundle.sh: attach a sparsebundle without mounting it, run a
# forced fsck_hfs repair pass against it, then detach.
# Usage: repair_sparsebundle.sh /path/to/sparsebundle [mountpassword]

if [ "$#" -eq 2 ]; then
    # Encrypted bundle: feed the password to hdiutil on stdin.
    OUTPUT=`echo -n "$2" | hdiutil attach -stdinpass -nomount "$1" | grep Apple_HFS | cut -f1`
else
    OUTPUT=`hdiutil attach -nomount "$1" | grep Apple_HFS | cut -f1`
fi

if [ -z "$OUTPUT" ]; then
    echo "ERROR: COULD NOT ATTACH SPARSEBUNDLE"
    exit 1
fi

fsck_hfs -fy "$OUTPUT"
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT REPAIR SPARSEBUNDLE"
    exit 1
fi

hdiutil detach "$OUTPUT"
You can use it like this:
bash repair_sparsebundle.sh /path/to/sparsebundle
or
bash repair_sparsebundle.sh /path/to/sparsebundle mountpassword
Then I have this little helper script:
#!/bin/sh
# Mount the Time Machine AFP share, repair the machine's backup sparsebundle,
# then repair the FileVault home-folder sparsebundle nested inside it.

DROBO_URL="afp://username:password@time-machine-server.local/sharename"
LOOKSIE_PATH="/machinename.sparsebundle"
HOME_DIR_PATH="/Backups.backupdb/sauron/Latest/Users/username/username.sparsebundle"

read -s -p "Enter HOMEFOLDER password: " HOME_PW

echo "Mounting afp share"
mkdir /Volumes/afpmountpoint
mount_afp -i "$DROBO_URL" /Volumes/afpmountpoint
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT MOUNT NETWORK SHARE"
    exit 1
fi

# Repair the Time Machine sparsebundle in place on the share.
bash repair_sparsebundle.sh "/Volumes/afpmountpoint/$LOOKSIE_PATH"
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT REPAIR $LOOKSIE_PATH SPARSEBUNDLE"
    exit 1
fi

# Mount the (now repaired) Time Machine bundle so we can reach the FileVault
# sparsebundle stored inside the latest backup.
mkdir /Volumes/bundlemountpoint
hdiutil mount "/Volumes/afpmountpoint/$LOOKSIE_PATH" -mountpoint /Volumes/bundlemountpoint
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT MOUNT $LOOKSIE_PATH SPARSEBUNDLE"
    exit 1
fi

bash repair_sparsebundle.sh "/Volumes/bundlemountpoint/$HOME_DIR_PATH" "$HOME_PW"
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT REPAIR $HOME_DIR_PATH SPARSEBUNDLE"
    exit 1
fi

mkdir /Volumes/homedirmountpoint
echo -n "$HOME_PW" | hdiutil mount -stdinpass "/Volumes/bundlemountpoint/$HOME_DIR_PATH" -mountpoint /Volumes/homedirmountpoint
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT MOUNT $HOME_DIR_PATH SPARSEBUNDLE"
    exit 1
fi
srm ~/.bash_history

# Spot-check that the repaired home folder is actually readable.
ls /Volumes/homedirmountpoint | cat
cat /Volumes/homedirmountpoint/test.txt
if [ $? != 0 ]; then
    echo "ERROR: COULD NOT READ TEST.TXT"
    exit 1
fi

diskutil unmount /Volumes/homedirmountpoint
diskutil unmount /Volumes/bundlemountpoint
diskutil unmount /Volumes/afpmountpoint
Which, when run as root, will connect to your AFP server, repair and then mount the Time Machine bundle, repair and mount the FileVault home-directory sparsebundle nested inside it, and finally spot-check that a file inside is readable. Run it once a week. It repairs your backups and confirms they work. Problem solved.
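If you would rather not rely on remembering, a root cron entry roughly like the one below would do it. The script name, log path, and schedule are placeholders, and for a truly unattended run you would need to swap the interactive password prompt for a Keychain lookup via /usr/bin/security.

# Hypothetical root crontab entry (sudo crontab -e): run the backup check
# every Sunday at 03:00 and keep a log of what it found.
0 3 * * 0  /usr/local/bin/check-backups.sh >> /var/log/backup-check.log 2>&1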
Except: why do they go bad in the first place? This is one of Time Machine’s many mysteries, but I spent some time thinking about it. One reason backups can go bad is that the disk you are backing up from is bad to begin with. Turns out, this explains 100% of the backup errors I repaired. Exactly those sparsebundles that had errors were backups of disks that, as I later discovered, had filesystem (fsck) errors. And there’s a certain logic to it: backing up a bad disk could conceivably give you a bad backup, although perhaps not one that is bad in exactly the same way.
So, of course, I extended my witch hunt to repairing all my local drives. The catch: to actually fix errors on your boot volume you have to boot from another disk, and to check or fix your FileVault disk image you have to be logged in as another user. Turns out, all my boot volumes were borked and all my FileVault home folders were borked. I guess over the years I’ve done one too many hard shutdowns. I am warning you, for all that is good in this world: check your backups, run fsck on your local drives, check your FileVault images. Seriously. Do it now.
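If you want to run the same audit, here is roughly what it looks like from a shell. The username and password below are placeholders, and note that the boot volume can only be verified, not repaired, while you are booted from it.

# Live, read-only check of the boot volume. Actually repairing it means
# booting from another disk (or single-user mode) and running fsck -fy there.
diskutil verifyVolume /

# Check a FileVault home-folder image: log in as a different user so the
# image is unmounted, then reuse the repair script from above.
bash repair_sparsebundle.sh /Users/username/username.sparsebundle filevaultpassword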
Unexpected side effect of repairing everything: everything is faster, even on the non-SSD machine. Dropbox is down to seven minutes instead of fifteen. No, it’s not quite the 30 seconds of my SSD, but it’s still a very big deal. I doubled my boot performance for free. Seriously, go repair all your drives.
One last thing I discovered while playing with all of this. It’s well known that you can’t back up a FileVault user while that user is logged in, and OS X automatically tries to do the backup when you log out. With a big sparsebundle, this leaves your machine unusable, sometimes for a day or two on the initial backup.
But what you may not know (I didn’t) is that you can back up your FileVault home folder while you are logged out even if another user is logged in. Just log in as the other user and hit “Start backup”, and your (now dismounted) FileVault folder will get backed up while the machine stays usable. Just make two accounts. It’s a little bit of pain to maintain them, but it works in a pinch when you want to keep the backup going while you diagnose some critical alert on a production box at 3am.
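If you happen to be on 10.7 or later, the same trick works from a shell too; a minimal sketch, assuming you are logged in as the second account:

# Kick off an immediate Time Machine run from the second account; the other
# user's (unmounted) FileVault sparsebundle gets picked up along with
# everything else.
tmutil startbackup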
Of course, this led me to another idea. Why back up your home folder offline at all? Just create a second encrypted sparsebundle, rsync your home folder to it, dismount it, and back up that. No need to ever log out. Of course, this requires double the disk space on your local machine (sorry, SSD…) and if the NSA knows you stored the same encrypted data twice, perhaps they can cook up some attack that takes only a billion trillion years instead of a trillion trillion years. But this line of inquiry seems worthy of further study.
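A minimal sketch of that idea, assuming an AES-encrypted bundle called HomeMirror (the size, names, and paths are all made up):

#!/bin/sh
# One-time setup: create an encrypted sparsebundle big enough for the home
# folder. Bands are only allocated as they fill, so the size is an upper bound.
hdiutil create -size 200g -type SPARSEBUNDLE -fs HFS+J -encryption AES-256 \
    -volname HomeMirror ~/HomeMirror.sparsebundle

# Each run: mount the bundle, mirror the home folder into it, then detach so
# Time Machine sees a closed (and therefore backupable) image.
hdiutil attach ~/HomeMirror.sparsebundle -mountpoint /Volumes/HomeMirror
rsync -a --delete --exclude 'HomeMirror.sparsebundle' ~/ /Volumes/HomeMirror/
hdiutil detach /Volumes/HomeMirror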
So the moral of the story is: get an SSD, run fsck on your local drives, and automate your backup verification so you find the bad sparsebundle long before you need it.
Comments are closed.
I still can’t believe people are staying logged out just to do a backup. I don’t understand it. I’d expect more from an EFI’d, GUID’d filesystem that supports an LVM. Why don’t they just LVM their home directories like the rest of the UNIX world?
I’m feeling a bit of vertigo seeing these sparsebundles as striped, file-based/block-based backups. Why couldn’t they just do a block-based backup to begin with?
Striped images are useful for reclaiming free space, as logically empty stripes are simply deleted. A monolithic sparse image would have to be compacted and shrunk in order for a “gap” in the image’s file system to result in a “gap” in the host file system. This kind of compacting operation is very expensive to do over the network as it requires rewriting large portions of the image file. Generally, moving files around in the image is more efficient in a sparse bundle because you can logically re-order and delete stripes without actually re-writing them. If the disk is local, this isn’t such a big deal, but over a network it makes a big difference.
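To make that concrete (paths hypothetical): reclaiming space from a monolithic sparse image takes an explicit compact pass that rewrites the file, whereas a sparse bundle is just a directory of small band files, so freeing a band is a plain delete on the host filesystem.

# Monolithic sparse image: give back free space by compacting, which is
# expensive when the image lives on an AFP share.
hdiutil compact /path/to/backup.sparseimage

# Sparse bundle: the "image" is a folder of ~8 MB band files; listing them
# shows what actually gets created, re-ordered, or deleted.
ls /path/to/backup.sparsebundle/bands | head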