Archive for the ‘Coding’ Category
How to gather statistics at regular intervals
I gather a lot of statistics such as performance data. Sometimes I have multiple things running on a system and I want to be able to compare the resulting data from multiple processes later. That means the samples need to be aligned on time intervals. Here is a naive way to gather stats at intervals:
while sleep 1; do gather-some-stats; done
There are two problems: each iteration takes one second plus however long gather-some-stats runs, so the samples drift; and the iterations are not aligned on clock boundaries, so the data isn’t as easy to correlate with other samples. This becomes a bigger problem when there are many such jobs gathering data at longer intervals such as 15 seconds or 5 minutes, where the lack of correlation between samples can be frustrating.
Here is what I’ve been doing recently. Is there a better way?
INTERVAL=1
while true; do
    # Sleep until the next multiple of INTERVAL on the wall clock (%N requires GNU date)
    delay=$(date +%s.%N | awk -v i="$INTERVAL" '{print i - ($1 % i)}')
    sleep "$delay"
    gather-some-stats
done
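To see what the loop computes, the arithmetic can be pulled out into a small function and fed fixed timestamps. This is only a sketch: the function name seconds_until_boundary and the sample timestamps are made up for illustration, and it assumes an awk whose % operator works on fractional numbers (as POSIX awk specifies).

```shell
# seconds_until_boundary NOW INTERVAL
# Print how long to sleep so that the wake-up lands on the next multiple of
# INTERVAL seconds since the epoch. NOW may have a fractional part.
# (Hypothetical helper name; not part of any existing tool.)
seconds_until_boundary() {
    awk -v now="$1" -v i="$2" 'BEGIN { printf "%.3f\n", i - (now % i) }'
}

# Worked examples with made-up timestamps:
seconds_until_boundary 100.25 1       # prints 0.750
seconds_until_boundary 1000007.5 15   # prints 12.500
```

In a real loop the first argument would come from the clock, for example sleep "$(seconds_until_boundary "$(date +%s.%N)" "$INTERVAL")". Note that a timestamp landing exactly on a boundary yields a full interval rather than zero, so the loop never spins without sleeping.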
Keeping docs and program options in sync
One of my pet peeves is when documentation is wrong. Another pet peeve is keeping documentation right. Crack open a source tarball for many programs and you’ll see a chunk of text that gets printed out when you use the --help option, and elsewhere in the program’s source code you’ll see the definitions of the command-line options. Maintaining a program like this is miserable. Using it is bad, too. I can name a lot of programs that say one thing and do another.
For Maatkit, we solved this problem by making the tool read its own source code and generate command-line options, default values, behaviors, dependencies, data types, and so on directly from its own embedded documentation. This is the same documentation that gets converted into man pages. So when you run the program, view its documentation, ask it for --help, or whatever you do, you get the same information. The documentation is part of the program, and if you change the documentation, you change the program.
For a while I was very unhappy with using Perl to reach outside the boundaries of Perl. It turns out that executing another program, capturing its output, controlling it, capturing its return code, and so on is very buggy. So I started to write scripts that need this capability in bash, because it is obviously very good at these tasks. But it’s a bit harder to handle command-line options in bash, and the tools available for it differ or are unavailable on various platforms. So I ended up with usage information in a block of text, and program options defined in program code. Yuck!
I fixed that recently. I wrote a short script that reads the usage text and generates code to implement the options, including default values and options that are constrained to certain valid inputs. Life is good again.
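The script itself isn’t shown here, so what follows is only a rough sketch of the general idea, not the actual implementation. Everything in it is an assumption made for illustration: the layout of the usage text, the "(default: ...)" convention, the OPT_ variable prefix, and the function names. It covers defaults and rejects undocumented options, but not constraints on valid values.

```shell
# Hypothetical usage text. The option-line format is an assumption of this sketch.
USAGE='Usage: gather [OPTIONS]
  --interval  Seconds between samples (default: 1)
  --output    File to append samples to (default: stats.log)'

# Emit one shell assignment per documented option, e.g. OPT_interval='1'.
# Lines whose default contains a quote or ")" do not match and are skipped.
generate_defaults() {
    printf '%s\n' "$USAGE" |
        sed -n "s/^ *--\([a-z][a-z]*\) .*(default: \([^)']*\)).*/OPT_\1='\2'/p"
}

# Load the documented defaults, then apply "--name value" overrides.
# Only options that appear in the usage text are accepted.
parse_options() {
    defaults=$(generate_defaults)
    eval "$defaults"
    while [ $# -gt 0 ]; do
        name=
        case $1 in
            --*[!a-z]*) ;;                 # anything but lowercase letters: reject
            --?*) name=${1#--} ;;
        esac
        if [ -z "$name" ] || ! printf '%s\n' "$defaults" | grep -q "^OPT_$name="; then
            echo "unknown option: $1" >&2
            return 1
        fi
        if [ $# -lt 2 ]; then
            echo "missing value for $1" >&2
            return 1
        fi
        eval "OPT_$name=\$2"               # the value is assigned, never executed
        shift 2
    done
}

parse_options --interval 15
echo "$OPT_interval $OPT_output"           # prints: 15 stats.log
```

Because the defaults and the list of accepted options both come from the usage text, adding a line to USAGE is all it takes to add an option, and the --help output cannot disagree with what the script accepts.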
All measurements are wrong
I had the privilege to meet Neil Gunther and listen to him speak this week at Surge. During his talk, he brought up the point that all measurements are wrong by definition. I thought I knew what he meant, but I was stuck with tunnel vision about floating-point precision and such. I had it all wrong. The real answer is obvious and simple.
The point is that the process of measuring, and therefore the answer that comes out of the measurement process, is imprecise. And further, that we need to treat a measurement as a measurement, not as the true value of whatever it is we tried to measure. So although we may say “the CPU was 70% utilized,” we should really be thinking “the measurements of CPU busy-time totaled 70% of the measurements of elapsed-time.” There’s more, but I won’t repeat his whole talk. You might enjoy his book.
Neil mentioned that this way of thinking isn’t foreign — we learn it in physical sciences. Indeed, I immediately remembered all my chemistry and physics labs, and mechanical engineering classes, and…. But that’s a whole education away now. Somehow between then and now, I educated myself to think that computers manipulate numbers, and the numbers are somehow mathematically pure.
When computers store and retrieve numbers, that’s often imprecise, and that is continually present in my mind — but that’s a whole different matter.





