Beware of svctm in Linux’s iostat
I’ve been studying the source of iostat again, trying to understand whether all of the calculations I explained in an earlier post are valid and correct. Two of the columns did not seem consistent to me. The await and svctm columns are supposed to measure, respectively, the average time from beginning to end of a request, including device queueing, and the actual time to service the request on the device. But there’s really no instrumentation to support that distinction. The device statistics you can get from the kernel do not provide timing information about device queueing, only a) begin-to-end timing of completed requests and b) the time accumulated by requests that haven’t yet completed. I concluded that await is correct, but svctm cannot be.
I just looked at the sysstat website, and it has been updated recently to warn about this, too:
svctm
The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.
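To make the distinction concrete, here is a minimal Python sketch of the arithmetic as I understand it from the sysstat source. It assumes the classic /proc/diskstats field layout; the names DiskSample, parse_diskstats_line, and await_and_svctm are made up for illustration, and the sample numbers are invented. Treat it as a sketch, not as iostat's actual code.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class DiskSample:
    """Cumulative per-device counters (hypothetical container type)."""
    rd_ios: int     # reads completed
    rd_ticks: int   # ms spent on reads, begin-to-end, summed over requests
    wr_ios: int     # writes completed
    wr_ticks: int   # ms spent on writes, begin-to-end, summed over requests
    tot_ticks: int  # ms during which the device had any I/O in progress


def parse_diskstats_line(line: str) -> Tuple[str, DiskSample]:
    """Parse one /proc/diskstats line: major, minor, name, then counters."""
    f = line.split()
    sample = DiskSample(
        rd_ios=int(f[3]), rd_ticks=int(f[6]),
        wr_ios=int(f[7]), wr_ticks=int(f[10]),
        tot_ticks=int(f[12]),
    )
    return f[2], sample


def await_and_svctm(prev: DiskSample, curr: DiskSample) -> Tuple[float, float]:
    """Return (await, svctm) in ms for the interval between two samples."""
    ios = (curr.rd_ios - prev.rd_ios) + (curr.wr_ios - prev.wr_ios)
    if ios == 0:
        return 0.0, 0.0
    # await: begin-to-end time of completed requests, averaged per request.
    wait_ms = (curr.rd_ticks - prev.rd_ticks) + (curr.wr_ticks - prev.wr_ticks)
    # svctm: device busy time divided by completions. Nothing here measures
    # time on the device separately from time spent queued.
    busy_ms = curr.tot_ticks - prev.tot_ticks
    return wait_ms / ios, busy_ms / ios


if __name__ == "__main__":
    # Invented counters: 100 I/Os, 2000 ms of request time, 500 ms busy.
    prev = DiskSample(0, 0, 0, 0, 0)
    _, curr = parse_diskstats_line("8 0 sda 80 5 1600 1200 20 3 400 800 0 500 2000")
    a, s = await_and_svctm(prev, curr)
    print(f"await={a:.1f} ms svctm={s:.1f} ms")
```

The point of the sketch is that svctm is derived from busy time and the completion count, not from any per-request service timer, so it says little once requests overlap on the device.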
A review of Web Operations by John Allspaw and Jesse Robbins
Web Operations. By John Allspaw and Jesse Robbins, O’Reilly 2010, with a chapter by me.
I wrote a chapter for this book, and it’s now on shelves in bookstores near you. I got my dead-tree copy today and read everyone else’s contributions to it. It’s a good book. A group effort such as this one is necessarily going to have some differences in style and even overlapping content, but overall it works very well. It includes chapters from some really smart people, some of whom I was not previously familiar with. John and Jesse obviously have good connections. A lot of the folks are from Flickr.
Here are the highlights in my opinion.
- Theo Schlossnagle, who has a place on my list of essential books, opens things with an overview of what web operations really is, and why it’s hard. Don’t skip this. Theo’s introduction is concise and thoughtful.
- Eric Ries discusses the benefits of continuous deployment. He is right on the money. Right out of college I spent 3 years as a developer at a company with very little engineering discipline, and then left for another company built by a small ace team practicing extreme programming. Eric nails the benefits of continuous deployment — he really gets it. I hadn’t heard of Eric before, but now I’ve subscribed to his blog.
- John Allspaw (whose book on capacity planning is also on my list of essentials) and Richard Cook discuss how complex systems fail. This chapter appeared in part as a whitepaper and blog post on John’s blog, and is expanded in this book. I have spent a lot of time examining failures for clients, and as VP of Consulting, also a lot of time examining Percona’s own mistakes. I fully agree with the conclusions in this chapter. A few key points: there is never a single root cause; our desire to find one blinds us and keeps us from learning; true failures are inherently unpredictable and happen only when a series of things fails; avoiding failure requires experience with failure. This echoes another book I’ve read recently, The Black Swan.
- Brian Moon’s chapter on unexpected traffic spikes. If you get a chance to hear Brian speak, take it. He’s an engaging guy with interesting and relevant stories to tell. Stories are always a better experience than bullet points.
- Jake Loomis’s chapter on postmortems. My own research into prevention of emergencies agrees almost perfectly with his list of things to do on page 225. Read this chapter carefully! Now, knowing how to put this into action is hard — very hard — but at least you’ll have a place to start. The worst compliment I ever got after fixing a system that’d run out of hard drive space (due to utter lack of basic monitoring) was that I’d “saved the day.” Baloney. Postmortems can be a great way to learn your infrastructure’s weaknesses and prevent emergencies in the future. I’m fully confident that this particular client will again deploy new servers without adding them into Nagios, and the results will be predictable.
- Naturally, my chapter about choosing a relational database architecture for web applications (skewed towards MySQL). There is a chapter on NoSQL databases by Eric Florenzano as well, but it is more introductory.
What wasn’t so good? I didn’t get a lot of value out of John’s interview with Heather Champ, on community management and web operations. I did not think the interview format worked well in a book full of essays. But that might just be me. Also, a couple of places in two or three chapters felt a bit rant-ish without a lot of clear actionable advice; I think readers won’t get so much out of this.
Overall, though, this is a great book, badly needed, on a topic that is simply not yet recognized for its true importance. As Theo writes, we’re seeing the emergence of web operations as a very large profession; it’s one whose definition is not yet formalized or agreed-upon, but that’ll change. It’s too important not to. Jesse’s introduction repeats this sentiment: the world now relies on the web, and so the world relies also on the engineers who make it run. Web operations is work that matters.
How I keep track of notes
This is the follow-up to my post on how I keep track of tasks. It’s important for me to have a good system for keeping notes and other files organized. The problem usually turns out to be that I want them organized several different ways simultaneously: by date, by project, by person, by subject. Alas, if I keep them in files on a hard drive, I can only choose one such organizing strategy, because filesystems are a single hierarchy.
I choose to organize by date, simply because most of the time I need access to notes and files about things I’m working on now or recently. If I need to find files by project or subject, there’s a search feature in my file browser, and it works really well! So date-organization is good enough for me.
Inside my home directory, I have a directory per year, and inside that, a directory per month. If I write a note today, it goes into the $HOME/etc/2010/07/ directory, and its filename starts with the day of the month (03). That’s the simple organizing principle behind my note system. It also lets me eventually move things off my computer into permanent storage, so I don’t have to keep backing things up forever and carrying around infinite amounts of data. I keep the last couple of years; if I need access to notes or projects from 2006, I can go pull a hard drive off the shelf and pop it into my hard drive dock (buy one of those, and you’ll never get ripped off again by external drives with their own enclosures and power supplies).
I still need a quick way to create files and place them there, or move them there after I create them. For creating files, I use Vim. There is nothing better than a plain-text editor for me. My Vim settings are such that if I begin a line with a hyphen, Vim keeps nice indentation for me, making it easy to take notes in bulleted lists with proper indentation. If you’re on a call with me and you hear typing, I’m probably taking notes into Vim.
But it’s a pain to type out the full path to the file including the year, month, and day. So I created some helper scripts and put them into my $PATH. The most important are ‘t’ and ‘c’. ‘t’ simply uses Vim to edit a file. (It also creates any required directories, based on today’s date.) So if I am on a call with Joe, I just type ‘t joe’ into a terminal, and I’m editing /home/baron/etc/2010/07/03-joe.txt.
The ‘c’ tool cats the file’s contents. If I type ‘c joe’, it executes ‘cat /home/baron/etc/2010/07/03-joe.txt’. This makes it easy to grep, copy and paste, and so on.
There are a few more tools: the ‘m’ tool moves any file into the date-based hierarchy, so if I save a PDF of an order-confirmation page, for example, I can then ‘m’ it and it goes into its proper place. And I have a few tools to list files I created today, yesterday, this week, and this month.
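The path logic behind these helpers can be sketched in a few lines. This is a Python illustration under assumptions, not the actual scripts: the function names are hypothetical, the .txt suffix and the day prefix are inferred from the 03-joe.txt example, and giving files moved by ‘m’ the same day prefix is a guess.

```python
import datetime
import shutil
from pathlib import Path
from typing import Optional

# Assumed root of the hierarchy; the post uses $HOME/etc.
NOTES_ROOT = Path.home() / "etc"


def month_dir(root: Path, day: datetime.date) -> Path:
    """Directory for a given date: <root>/<year>/<month>."""
    return root / f"{day:%Y}" / f"{day:%m}"


def note_path(name: str, root: Path = NOTES_ROOT,
              day: Optional[datetime.date] = None) -> Path:
    """Path a 't name' or 'c name' style helper would resolve to."""
    day = day or datetime.date.today()
    return month_dir(root, day) / f"{day:%d}-{name}.txt"


def ensure_note(name: str, root: Path = NOTES_ROOT,
                day: Optional[datetime.date] = None) -> Path:
    """Create the year/month directories, then return the note's path."""
    path = note_path(name, root, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def move_into_hierarchy(src: Path, root: Path = NOTES_ROOT,
                        day: Optional[datetime.date] = None) -> Path:
    """Move an existing file into the dated directory (an 'm'-like helper)."""
    day = day or datetime.date.today()
    dest_dir = month_dir(root, day)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: moved files get the same day-of-month prefix as notes.
    dest = dest_dir / f"{day:%d}-{Path(src).name}"
    shutil.move(str(src), str(dest))
    return dest
```

With the root set to /home/baron/etc and the date fixed to 2010-07-03, note_path("joe") resolves to /home/baron/etc/2010/07/03-joe.txt. A ‘t’-like wrapper would call ensure_note and then launch Vim on the result, and a ‘c’-like wrapper would just print the file at note_path.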
I have a very important convention: when I’m taking notes and something becomes my responsibility to follow up on, I type TODO in the notes. After the call ends, I can grep for TODO in the file and quickly transfer each item into the task system I described in the post linked above. This is how I can be confident that I’m not forgetting anything I’m supposed to do: I take notes and write things down as they happen, and then review the notes afterwards.
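The review step amounts to a grep. As a small Python sketch of the same idea (find_todos is a hypothetical helper, and the sample notes are invented):

```python
import re
from typing import List, Tuple

# Match TODO as a whole word, so "TODOS" or "mastodon" are not picked up.
TODO_RE = re.compile(r"\bTODO\b")


def find_todos(text: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for every line mentioning TODO."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if TODO_RE.search(line)
    ]


if __name__ == "__main__":
    notes = "- discussed backups\n- TODO send Joe the report\n- next call Friday\n"
    for number, line in find_todos(notes):
        print(f"{number}: {line}")
```

In practice plain grep on the note file does the same job; the sketch only shows that the convention needs nothing beyond a fixed marker word and a line scan.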
All told, this system kind of feels too simple to be a system. Everyone else seems to use complicated online gizmos named after groceries, or whizbang apps created by 37Signals, but I’ve found none of them to meet my needs, and just went back to basics. Basic is good. Basic works. Basic lets me concentrate on what I’m doing.
As I said in my previous post, part of this is based on the GTD book, which I read through a couple of times (with a year in between) and picked the parts that made sense to me. I think it’s a useful book to read, if you’re having trouble organizing yourself. I would just caution against spending all your energy getting organized — leave a little energy for actually doing your work!






