Archive for February, 2011
New Aspersa I/O analysis tool, diskstats
I’ve just committed some changes to diskstats, an I/O analysis tool in Aspersa that’s actually been in the Subversion repository for a long time, but in a barely usable fashion and with no documentation. Now it’s usable and documented.
It is basically a reimplementation of iostat in awk. Why on earth would I reinvent that wheel? Because I spend a lot of time gathering and analyzing raw data from /proc/diskstats, which is vital to really understanding what the storage subsystem is doing. The iostat tool hides important details. Seeing that detail has immediately solved many a disk performance problem and proven SAN vendors wrong, for instance. (I used to do this the old-fashioned way.) Disk performance, of course, is one of the most important things to analyze in a database server that’s struggling.
Also, iostat isn’t interactive, and I wanted an interactive, menu-driven tool to quickly slice and dice the data and drill down into what is happening with I/O. The data it accepts is in the same format as that stored by the stalk and collect tools, which together are my default post-mortem toolset. And finally (and I know this might be hard to believe) I’ve been asked to fix problems many times on systems that don’t have iostat, where I am not allowed to install it.
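To make the raw-data point concrete, here is a minimal sketch of the arithmetic you can do on two samples of /proc/diskstats. It is in Python rather than awk, and it is not the diskstats tool's code. The field positions follow the kernel's documented layout for /proc/diskstats; the sample lines, the 10-second interval, and the function names (parse_diskstats, interval_stats) are made up for illustration.

```python
# Field positions in a /proc/diskstats line (0-based, after splitting on whitespace).
FIELDS = {
    "reads": 3,      # reads completed
    "read_ms": 6,    # milliseconds spent reading
    "writes": 7,     # writes completed
    "write_ms": 10,  # milliseconds spent writing
    "io_ms": 12,     # milliseconds the device had I/O in flight
}


def parse_diskstats(text):
    """Return {device_name: {counter_name: int}} for one sample."""
    sample = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or truncated lines
        sample[parts[2]] = {name: int(parts[i]) for name, i in FIELDS.items()}
    return sample


def interval_stats(before, after, elapsed_s):
    """Turn two cumulative samples into per-interval rates and latencies."""
    stats = {}
    for dev, new in after.items():
        old = before.get(dev)
        if old is None:
            continue  # device appeared between samples; nothing to diff
        d = {k: new[k] - old[k] for k in new}
        stats[dev] = {
            "reads_per_s": d["reads"] / elapsed_s,
            "writes_per_s": d["writes"] / elapsed_s,
            "avg_read_ms": d["read_ms"] / d["reads"] if d["reads"] else 0.0,
            "avg_write_ms": d["write_ms"] / d["writes"] if d["writes"] else 0.0,
            "busy_pct": 100.0 * d["io_ms"] / (elapsed_s * 1000.0),
        }
    return stats


# Two made-up samples, assumed to be taken 10 seconds apart (illustration only).
SAMPLE_1 = "   8       0 sda 1000 50 80000 4000 2000 70 160000 10000 0 5000 14000"
SAMPLE_2 = "   8       0 sda 1500 60 120000 6500 2400 90 192000 13000 2 8000 19500"

if __name__ == "__main__":
    result = interval_stats(parse_diskstats(SAMPLE_1), parse_diskstats(SAMPLE_2), 10.0)
    for dev, s in result.items():
        print(dev, s)
```

The counters in the file are cumulative, so everything interesting comes from subtracting one sample from the next. Keeping reads and writes separate all the way through is the kind of detail that is easy to lose in a summarized view.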
And wouldn’t you know it, as I wrote the user’s manual I found a bug, after all my ranting about how other tools show I/O stats wrong. I don’t have time to diagnose or fix the bug right now, so maybe someone else can contribute that. There is a test suite (remind me to explain sometime how I make Bash scripts highly testable) so if we find the problem and fix it, it’ll stay fixed. Contribute your fix to the bug report :-)
MySQL and PostgreSQL cross-links
A couple of extremely informative recent blog posts have gone to either Planet MySQL or Planet PostgreSQL, but not both, and I think everyone on both aggregators who cares about database internals should be interested in them. Here they are:
- Robert Haas: MySQL vs. PostgreSQL, Part 2: VACUUM vs. Purge
- Ewen Fortune: How InnoDB handles REDO logging
Happy cross-pollination.
Keeping docs and program options in sync
One of my pet peeves is when documentation is wrong. Another pet peeve is keeping documentation right. Crack open a source tarball for many programs and you’ll see a chunk of text that gets printed out when you use the --help option, and elsewhere in the program’s source code you’ll see the definitions of the command-line options. Maintaining a program like this is miserable. Using it is bad, too. I can name a lot of programs that say one thing and do another.
For Maatkit, we solved this problem by making the tool read its own source code and generate command-line options, default values, behaviors, dependencies, data types, and so on directly from its own embedded documentation. This is the same documentation that gets converted into man pages. So when you run the program, view its documentation, ask it for --help, or whatever you do, you get the same information. The documentation is part of the program, and if you change the documentation, you change the program.
For a while I was very unhappy with using Perl to reach outside the boundaries of Perl. It turns out that executing another program from Perl, capturing its output, controlling it, capturing its return code, etc., is very buggy. So I started to write scripts that need this capability in bash, because it is obviously very good at these tasks. But it’s a bit harder to handle command-line options in bash, and the tools available for it differ or are unavailable on various platforms. So I ended up with usage information in a block of text, and program options defined in program code. Yuck!
I fixed that recently. I wrote a short script that reads the usage text and generates code to implement the options, including default values and options that are constrained to certain valid inputs. Life is good again.
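The general idea can be sketched in a few lines. This is an illustration in Python, not the bash script described above, and the usage-text layout it parses ("default: X" and "one of: a, b, c" inside parentheses) is an assumption invented for the example, as are the tool name and the options themselves.

```python
import re

# Hypothetical usage text; this layout is an assumption made for the sketch.
USAGE = """\
Usage: mytool [OPTIONS]

Options:
  --interval  Seconds between samples (default: 1)
  --mode      Grouping mode (one of: disk, sample, all; default: disk)
  --output    File to write results to
"""

# An option line is indented and starts with --name followed by a description.
OPTION_RE = re.compile(r"^\s+--(?P<name>[\w-]+)\s+(?P<desc>.*)$")


def spec_from_usage(usage):
    """Build {option: {"default": ..., "choices": ...}} from the usage text."""
    spec = {}
    for line in usage.splitlines():
        m = OPTION_RE.match(line)
        if not m:
            continue
        desc = m.group("desc")
        default = re.search(r"default: ([^;)]+)", desc)
        choices = re.search(r"one of: ([^;)]+)", desc)
        spec[m.group("name")] = {
            "default": default.group(1).strip() if default else None,
            "choices": [c.strip() for c in choices.group(1).split(",")] if choices else None,
        }
    return spec


def parse_args(argv, usage=USAGE):
    """Parse '--name value' pairs, applying defaults and checking choices."""
    spec = spec_from_usage(usage)
    opts = {name: info["default"] for name, info in spec.items()}
    args = list(argv)
    while args:
        arg = args.pop(0)
        name = arg[2:] if arg.startswith("--") else None
        if name not in spec:
            raise SystemExit(f"Unknown option: {arg}\n\n{usage}")
        if not args:
            raise SystemExit(f"Option --{name} requires a value")
        value = args.pop(0)
        choices = spec[name]["choices"]
        if choices and value not in choices:
            raise SystemExit(f"--{name} must be one of: {', '.join(choices)}")
        opts[name] = value
    return opts
```

With this in place, `parse_args(["--mode", "all"])` picks up the default for --interval from the usage text, and changing that text changes the program's behavior. The usage block is the only place an option is declared, so the help output cannot drift from what the parser accepts.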





