Seeking input on a badness score for query execution
Suppose that you’re writing a new Maatkit tool (just a random example, really) and its job is to measure the difference in execution of queries. The simplest metric is execution time.
Now suppose that you’re trying to figure out a metric of badness. The query executes in a second on machine 1 and 1000 seconds on machine 2. That’s a pretty bad change. How do you quantify this?
Now you’ve got a query that executes in 1ms on machine 1, and 10ms on machine 2. It’s a tenfold change. Is it a bad change? Maybe it’s just the difference in which files were cached in memory, or network latency because someone flooded the TCP pipe and the packets had to be backed off and retried, or something like that. Is this significant? How should it contribute to the badness score?
Let’s think of another example too. Later in this mythical tool’s life, we’ll be examining EXPLAIN and looking at row estimates. There are important differences between estimates of 1 row, 2 rows, 20 rows and 2000 rows. But from 2 to 3 rows, or from 100 to 125 rows — is that a significant change? How should it contribute to the badness score? What about this: a full table scan vs. an index scan, how should that contribute?
The general problem that I’m gesturing at here is a kind of generic badness score, which can be an accumulation of lots (dozens) of factors. From my thinking on it so far, it’s a very complex problem, because you want to avoid false positives, you want to really capture a badness score in a way that’s quantifiable and sortable (this query is badder than that one, all things considered) and you don’t want to miss small things in the noise (these queries are the same in 23 of the 24 metrics, and that 24th metric is enough to trigger the alarm).
The other thing that’s worked its way into my small brain is this: it’s got to be a solved problem (unless it’s really intractable). But I can’t think of the right combination of things to point me to the right Computer Science literature.
Help?



In light of Michael Jackson’s death, I suggest you rethink your ‘badness’ title lest some think that’s a good thing.
Doug
26 Jun 09 at 1:36 pm
@Doug, Oh… that was bad.
Richard Ayotte
26 Jun 09 at 2:20 pm
I think a good start would be something like
table row returned/table row read * some multiplier if there is filesort or if the query is an index scan.
I deal with this issue a lot and I think the first thing one must recognize is that nothing happens in a vacuum. How long does it take the average query to in the processlist to run (yes, this will skew towards long running queries) and many instances of a particular query are there?
Robert Wultsch
26 Jun 09 at 3:03 pm
Baron, right now we tend to use some info from the queryplan as provided by the microslow patch. Mem/disk temp tables, mem/disk sorts, as well as # rows examined/returned. Rather than timing, it looks more at the flow of the query through the server, given the current dataset. It does of course also give time, so you can take it into account as well.
A “badness’ score could be a good extra for mk-query-digest.
Arjen Lentz
26 Jun 09 at 6:32 pm
It seems the reason why going from 100rows to 125rows (a 25% change) is not really likely to be significant is because it probably doesn’t cost any I/O. Leaving aside what’s possible, it would seem like you really want to measure the I/O cost of the query, and perhaps combine it with a CPU cost of the query. (Alternatively, have two separate badness measures.)
rich
26 Jun 09 at 8:17 pm
I think you want to get in touch with a statistician. They have tools to determine what constitutes a “significant” difference.
Tim McCormack
26 Jun 09 at 8:42 pm
Baron,
Joining Tim’s advice.
But also, back from my Math/CS studies, in order to give more weight to larger numbers, you usually use the log() function.
So for example, to see how far apart two numbers a,b are, you’d use something like:
log(|a|)*|a-b|/|a|
|a-b|/|a| gives you the percent in growth from a to b;
multiplying by log(|a|) gives very little weight to “small” numbers (whatever that means) and more weight to “large” numbers.
Shlomi Noach
28 Jun 09 at 1:29 am
Baron:
The advices on inspecting the number of rows returned are good. Since you’re asking for pointers in CS literature, I’d suggest anything in Big ‘O’ notation, since what you need is more of a theoretical and architecture agnostic way of determining how bad a query is.
Together with the number of rows, I’d compute the data type length of each column, this should give you a better idea of how much info the server will have to process and transfer back to get the results to the client.
I’d also take into account if columns can take NULLs or not, since this puts a performance penalty on MySQL, but you know a lot more about this than me :P
Either way, my ideal algorithm would use these information plus the output from explain, to take into account what stages MySQL goes through to bring you the data.
But maybe I divert.
Fernando Ipar
9 Jul 09 at 10:10 am