A script snippet to relative-ize numbers embedded in text
A lot of times I’m looking at several time-series samples of numbers embedded in free-form text, and I want to know how the numbers change over time. For example, two samples of SHOW INNODB STATUS piped through grep wait might contain the following:
Mutex spin waits 0, rounds 143359179688, OS waits 634106844
RW-shared spins 1224152309, OS waits 38278807; RW-excl spins 2432166425, OS waits 35264871
Mutex spin waits 0, rounds 143386303439, OS waits 634292093
RW-shared spins 1224197048, OS waits 38281423; RW-excl spins 2432347936, OS waits 35271423
How much have the numbers changed in the second sample? My head is too lazy to do that math. So Daniel Nichter and I whipped up Yet Another Snippet to self-discover patterns of text and numbers, and compare each line against the previous line that matches the same pattern. Let’s fetch it:
wget http://maatkit.googlecode.com/svn/trunk/util/rel
Now give it the above input, and it’ll print out something useful (emphasis mine):
Mutex spin waits 0, rounds 143359179688, OS waits 634106844
RW-shared spins 1224152309, OS waits 38278807; RW-excl spins 2432166425, OS waits 35264871
Mutex spin waits 0, rounds 27123751, OS waits 185249
RW-shared spins 44739, OS waits 2616; RW-excl spins 181511, OS waits 6552
My lazy brain likes that much better.
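For the curious, here is a rough Python sketch of the general idea. It is not the rel script itself, just one way the technique could work: replace every run of digits with a placeholder to get a line's "pattern", remember the numbers from the last line seen with that pattern, and print differences the next time the pattern shows up. The function name relativize and the "#" placeholder are my own choices for illustration, and the sketch only understands non-negative integers.

```python
import re

# Assumption: a "number" is any run of digits (no signs, no decimals).
NUMBER = re.compile(r"\d+")


def relativize(lines):
    """Yield each line, with its numbers replaced by the change since the
    previous line that shares the same non-numeric text."""
    last_seen = {}  # pattern -> numbers from the most recent matching line
    for line in lines:
        # The pattern is the line with every number blanked out.
        pattern = NUMBER.sub("#", line)
        values = [int(n) for n in NUMBER.findall(line)]
        previous = last_seen.get(pattern)
        last_seen[pattern] = values
        if previous is None:
            # First time this pattern appears: print it unchanged.
            yield line
            continue
        # Same pattern means same count of numbers, so zip pairs them up.
        deltas = iter(v - p for v, p in zip(values, previous))
        yield NUMBER.sub(lambda m: str(next(deltas)), line)


sample = [
    "Mutex spin waits 0, rounds 143359179688, OS waits 634106844",
    "RW-shared spins 1224152309, OS waits 38278807; RW-excl spins 2432166425, OS waits 35264871",
    "Mutex spin waits 0, rounds 143386303439, OS waits 634292093",
    "RW-shared spins 1224197048, OS waits 38281423; RW-excl spins 2432347936, OS waits 35271423",
]
for out_line in relativize(sample):
    print(out_line)
```

Note that each line is compared against the most recent line with the same pattern, so with three or more samples you get the change between consecutive samples rather than the change since the first one.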


Great idea!
One suggestion: Since the real output doesn’t have the bold emphasis to identify what it relativized, what about prefixing the new numbers with a plus sign? (Or, obviously, a minus sign if the difference is negative.)
Anyway, just a thought. Either way this’ll definitely come in handy for me on occasion.
Ben
2 Sep 09 at 4:29 am
Nice utility, but it does assume the numbers are positive integers. I tried using it for the output from “mysqladmin status” and quickly found that “Queries per second” was treated as two different numbers, one on each side of the decimal point.
Mitch Wright
2 Sep 09 at 2:24 pm
I would also add to the script in this case an indication of the time interval between the two samples.
While that means the output changes, if you are looking at the 4 lines in isolation in your example, you don’t know if that’s a minute, hour, or day of processing.
Ronald Bradford
2 Sep 09 at 5:28 pm
@Ronald:
Interesting, maybe it could be made smart enough to recognize timestamps at the beginning of lines, and/or if the output of a command is being piped to it in real time it can time the arrival of each line.
That said, in the scenario described in the post, it’s almost certainly just reading from a static text file, so there’s no available source of timing information. And that’s probably the most common use case for this tool.
Ben
2 Sep 09 at 6:24 pm
Mitch, right I had the same thought — we need to recognize floating-point numbers too.
Xaprb
3 Sep 09 at 3:51 pm
@Ben
SHOW INNODB STATUS includes a date/time stamp, so if comparing two complete files, and then looking at this subset of lines, a time comparison is possible (but not trivial with date/time in human format).
I’ve modified all my logging these days for key scripts to always include epoch_secs for this exact reason.
Ronald Bradford
3 Sep 09 at 7:36 pm
[...] Schwartz, had a script snippet to relative-ize numbers embedded in text to [...]
Log Buffer #160: a Carnival of the Vanities for DBAs | Pythian Group Blog
4 Sep 09 at 12:59 pm