New Aspersa I/O analysis tool, diskstats
I’ve just committed some changes to diskstats, an I/O analysis tool in Aspersa that’s actually been in the Subversion repository for a long time, but in a barely usable fashion and with no documentation. Now it’s usable and documented.
It is basically a reimplementation of iostat in awk. Why on earth would I reinvent that wheel? Because I spend a lot of time gathering and analyzing raw data from /proc/diskstats, which is vital to really understanding what the storage subsystem is doing. The iostat tool hides important details. Seeing that detail has immediately solved many a disk performance problem and proven SAN vendors wrong, for instance. (I used to do this the old-fashioned way.) Disk performance, of course, is one of the most important things to analyze in a database server that’s struggling.
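To make the idea concrete, here is a minimal Python sketch (not the awk code in diskstats itself) of the kind of arithmetic involved: take two samples of /proc/diskstats and derive per-interval read and write rates and average latencies separately. The helper names `parse_diskstats` and `interval_stats` are hypothetical, and the field positions assume the standard 11-counter layout described in the kernel's iostats documentation.

```python
from collections import namedtuple

# Hypothetical container for the counters this sketch uses.
Sample = namedtuple("Sample", "reads read_ms writes write_ms io_ms")


def parse_diskstats(text):
    """Parse the text of /proc/diskstats into {device_name: Sample}."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or short lines
        # fields[0:3] are major, minor, device name; counters follow.
        nums = [int(x) for x in fields[3:14]]
        stats[fields[2]] = Sample(
            reads=nums[0],     # reads completed
            read_ms=nums[3],   # milliseconds spent reading
            writes=nums[4],    # writes completed
            write_ms=nums[7],  # milliseconds spent writing
            io_ms=nums[9],     # milliseconds spent doing I/O
        )
    return stats


def interval_stats(before, after, seconds):
    """Derive rates and average latencies from two samples taken `seconds` apart."""
    result = {}
    for dev, b in after.items():
        a = before.get(dev)
        if a is None:
            continue  # device appeared between samples
        d_reads = b.reads - a.reads
        d_writes = b.writes - a.writes
        result[dev] = {
            "reads_per_s": d_reads / seconds,
            "writes_per_s": d_writes / seconds,
            # Average time per completed request, reads and writes kept apart.
            "read_latency_ms": (b.read_ms - a.read_ms) / d_reads if d_reads else 0.0,
            "write_latency_ms": (b.write_ms - a.write_ms) / d_writes if d_writes else 0.0,
            # Share of the interval during which the device had I/O in flight.
            "busy_pct": 100.0 * (b.io_ms - a.io_ms) / (seconds * 1000.0),
        }
    return result
```

On a live Linux system you would read `/proc/diskstats` twice with a sleep in between and pass both texts through `parse_diskstats`; the same functions work on saved samples, provided you know the interval between them.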
Also, iostat isn’t interactive, and I wanted an interactive, menu-driven tool to quickly slice and dice the data and drill down into what is happening with I/O. The data it accepts is in the same format as that stored by the stalk and collect tools, which are my default post-mortem toolset. And finally, and I know this might be hard to believe, I’ve been asked many times to fix problems on systems that don’t have iostat and where I am not allowed to install it.
And wouldn’t you know it, as I wrote the user’s manual I found a bug, after all my ranting about how other tools show I/O stats wrong. I don’t have time to diagnose or fix the bug right now, so maybe someone else can contribute that. There is a test suite (remind me to explain sometime how I make Bash scripts highly testable) so if we find the problem and fix it, it’ll stay fixed. Contribute your fix to the bug report :-)



I used the aspersa tools to understand response times for reads vs writes. That data is invaluable for servers with RAID write caches which make writes very fast.
We then improved our monitoring to pull read and write latency directly from /proc/diskstats. Have you updated any of the monitoring scripts that you publish (for Ganglia)?
Thanks for publishing this.
Mark Callaghan
6 Feb 11 at 1:41 pm
One feature I’m missing is the ability to control reads/writes on a per-process basis. Having that would make this the mother of all disk stat tools.
Dan
7 Feb 11 at 11:03 am
Mark, I wrote some Cacti templates that pull from /proc/diskstats. I still haven’t even learned Ganglia yet. I need to someday.
Xaprb
7 Feb 11 at 1:03 pm
Mark,
I pushed the data from /proc/diskstats into Ganglia using MonAMI plugins (written in C). MonAMI is actually pretty cool and can probe data once and push it to multiple places (like Ganglia).
I also wrote MonAMI plugins for /proc/cpustat and for mysql.
The plugins are in the Proven Scaling SVN server. I can point you to them if you are interested.
Justin Swanhart
7 Feb 11 at 11:23 pm