Xaprb

Stay curious!

Archive for the ‘mysql’ tag

Seeking input on a badness score for query execution

with 7 comments

Suppose that you’re writing a new Maatkit tool (just a random example, really) and its job is to measure the difference in execution of queries. The simplest metric is execution time.

Now suppose that you’re trying to figure out a metric of badness. The query executes in a second on machine 1 and 1000 seconds on machine 2.  That’s a pretty bad change.  How do you quantify this?

Now you’ve got a query that executes in 1ms on machine 1, and 10ms on machine 2.  It’s a tenfold change.  Is it a bad change?  Maybe it’s just the difference in which files were cached in memory, or network latency because someone flooded the TCP pipe and the packets had to be backed off and retried, or something like that.  Is this significant?  How should it contribute to the badness score?

Let’s think of another example too.  Later in this mythical tool’s life, we’ll be examining EXPLAIN and looking at row estimates.  There are important differences between estimates of 1 row, 2 rows, 20 rows and 2000 rows.  But from 2 to 3 rows, or from 100 to 125 rows — is that a significant change? How should it contribute to the badness score?  What about this: a full table scan vs. an index scan, how should that contribute?

The general problem that I’m gesturing at here is a kind of generic badness score, which can be an accumulation of lots (dozens) of factors.  From my thinking on it so far, it’s a very complex problem, because you want to avoid false positives, you want to really capture a badness score in a way that’s quantifiable and sortable (this query is badder than that one, all things considered) and you don’t want to miss small things in the noise (these queries are the same in 23 of the 24 metrics, and that 24th metric is enough to trigger the alarm).

The other thing that’s worked its way into my small brain is this: it’s got to be a solved problem (unless it’s really intractable).  But I can’t think of the right combination of things to point me to the right Computer Science literature.

Help?

Written by Xaprb

June 26th, 2009 at 12:06 pm

Posted in Maatkit, SQL

Tagged with

MyISAM quote of the day

with 6 comments

Seen in #maatkit on Freenode:

I never realized just how terrible recovering MyISAM from a crash can be

Sad but true — it can be pretty painful. This is one of the reasons I pretty much recommend InnoDB (okay, okay, XtraDB) for most data unless it’s read-only.

Written by Xaprb

June 18th, 2009 at 5:03 pm

Posted in SQL

Tagged with , , ,

Extended covering indexes

with 7 comments

As you can probably guess, I’m catching up on reading my blogs. I’ve just read with interest about TokuDB’s multiple clustering indexes. It’s kind of an obvious thought, once someone has pointed it out to you. I’ve only been around products that insist there can be Only One clustered index (and then there’s ScaleDB, who say “think differently already”).

Anyway, we already know that there are quite a few database products that use clustered indexes and to avoid update overhead, require every non-clustered index to store the clustered key as the “pointer” for row lookups. Thus there are “hidden columns” which are present at the leaf nodes, but not the non-leaf nodes, of secondary indexes. Why not take that idea and run with it a little? Here’s what I mean:

create table test (
  a int,
  b int,
  c int,
  primary key(a),
  key(b) plus(c)
);

This would index column b, which because of the clustered primary key would contain column a at the leaf nodes; and additionally we’ve requested for it to store column c. And then we would be able to do this:

explain select c from test where b = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test
         type: ref
possible_keys: b
          key: b
      key_len: 5
          ref: const
         rows: 1
        Extra: Using index

The “Using index” is the key to note there. (Yes, I invented that EXPLAIN result; it is not possible to get with current MySQL and current storage engines.) This strikes me as an improvement over TokuDB, which apparently says you can have all or none. Why not let people say which columns they want?

Written by Xaprb

June 7th, 2009 at 3:15 pm