In an application such as a database server, instrumentation is like sex: it’s not enough to know how often things happen. You also care about how long they took, and in many cases you want to know how big they were.


“Things” are the things you want to optimize. Want to optimize queries? Then you need to know what activities that query causes. Most systems have at least some of this kind of instrumentation. If you look around at… let’s not pick on the usual targets… oh, say Sphinx, Redis, and memcached. What metrics do they provide? They provide counters that say how often various things happened. (Most of these systems provide very few and coarse-grained counters.) That’s not very helpful. So I read from disk N times, and I read from memory N times, and I compared rows N times… so what? I still don’t know anything relevant to execution time.

That’s why we need to measure how long things took. It’d be great if, for every Handler_X counter variable in MySQL’s SHOW STATUS, we also had a Handler_X_time in microseconds. True, better instrumentation could be designed, but that’d be a huge step forward already. Instead of guessing at the significance of Sort_rows, we could look at Sort_rows_time and see if a lot of time is being consumed sorting rows!

Next, we need to know how big things are. This is a rather generic term, but in a lot of cases, the size, difficulty, or some other metric of an operation is important. “I wrote a message to the network socket” is okay; “I wrote and it took N microseconds” is better; and “I wrote N bytes and it consumed M microseconds” is best. Made a temp table on disk? Nice – how big was it? Mine’s smaller than yours!

The final way that instrumentation is like sex: more is better, to an extent. You can get too much, but how often has that happened to you?

Done! Now Read These: