Comments on: MySQL profiling case study, part 2 (http://www.xaprb.com/blog/2006/10/17/mysql-profiling-case-study-part-2/)

By: Gisbert, Tue, 14 Nov 2006 17:01:54 +0000 (http://www.xaprb.com/blog/2006/10/17/mysql-profiling-case-study-part-2/#comment-2456)

MySQL has now acknowledged the strange preference for certain indexes as a bug.

By: Xaprb, Tue, 14 Nov 2006 16:25:19 +0000 (http://www.xaprb.com/blog/2006/10/17/mysql-profiling-case-study-part-2/#comment-2454)

Hi Gisbert, thanks for writing in. You are right about the (day,ad) versus (ad,day) change; I may not have called that difference out clearly. It was crucial to getting better performance, and I’ve written a lot about that in the past.

MySQL will choose a full table scan in certain cases even when another index would be more selective. The reason? It’s cheaper to scan the data itself than to probe an index and look up the row, probe and look up, over and over. With InnoDB, secondary indexes are quite expensive compared to the primary key: once you find an entry in the secondary index, all you have is the row’s primary key value, which you then have to use to navigate the primary key to find the actual row.
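
The double lookup can be sketched as a toy Python model. This is an illustration under stated assumptions, not InnoDB's actual storage code: the clustered index is modeled as a dict from primary key to row, the secondary index as a sorted list of (secondary key, primary key) pairs, and the helper names `build_indexes` and `lookup_via_secondary` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def build_indexes(rows, pk_col, sec_col):
    """Build a toy clustered index and secondary index (illustration only).

    clustered: primary key -> full row.
    secondary: sorted (secondary key, primary key) pairs; each entry holds
    only the primary key value, not the row itself.
    """
    clustered = {row[pk_col]: row for row in rows}
    secondary = sorted((row[sec_col], row[pk_col]) for row in rows)
    return clustered, secondary


def lookup_via_secondary(clustered, secondary, value):
    """Return the matching rows and how many primary-key lookups were needed."""
    # First probe: locate the run of secondary-index entries for `value`.
    lo = bisect_left(secondary, (value,))
    hi = bisect_right(secondary, (value, float("inf")))
    rows = []
    for _, pk in secondary[lo:hi]:
        # Second probe per match: navigate the primary key to fetch the row.
        rows.append(clustered[pk])
    return rows, hi - lo


table = [{"id": i, "client": i % 3} for i in range(1, 10)]
clustered, by_client = build_indexes(table, "id", "client")
matches, pk_lookups = lookup_via_secondary(clustered, by_client, 1)
print([r["id"] for r in matches], pk_lookups)  # [1, 4, 7] 3
```

Every match found through the secondary index costs one more trip into the primary key, so the work of an index lookup grows with the number of matching rows, while a scan reads each row once.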

The break-even point is not exactly clear from the documentation, but it is said to be about 20 to 30 percent of the table’s rows. In other words, if you have 100,000 rows and an index will select 35,000 of them, it’s probably cheaper to just scan the whole table, and MySQL will do that instead of using the index. I should look in the source code to find out where that decision is made, and maybe make it clearer what the magic number is (it also depends on index statistics, which are generated from 8 random B-tree dives and so can sometimes be wildly inaccurate).
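
The threshold rule can be written as a few lines of Python. This is a hypothetical sketch, not MySQL's optimizer logic: the 0.30 cutoff is an assumption taken from the 20 to 30 percent figure above, and `choose_access_path` is an invented name.

```python
# Assumed break-even fraction: the comment puts it at "about 20 to 30
# percent"; 0.30 is a placeholder, not a constant from the MySQL source.
SCAN_THRESHOLD = 0.30


def choose_access_path(total_rows, estimated_matches, threshold=SCAN_THRESHOLD):
    """Toy rule: scan the table when the index would select too large a share."""
    if total_rows <= 0:
        raise ValueError("total_rows must be positive")
    fraction = estimated_matches / total_rows
    return "full table scan" if fraction > threshold else "index"


print(choose_access_path(100_000, 35_000))  # full table scan (35% > 30%)
print(choose_access_path(100_000, 5_000))   # index (5% <= 30%)
```

Because `estimated_matches` comes from index statistics, an inaccurate estimate can push the same query to either side of the cutoff.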

By: Gisbert, Mon, 13 Nov 2006 11:19:25 +0000 (http://www.xaprb.com/blog/2006/10/17/mysql-profiling-case-study-part-2/#comment-2442)

I think there are several issues to consider here. First off, note that
the indexes used in step 1 and in step 2 of your previous article are
subtly different, with important consequences:

The table as defined at the start of the article has (amongst others) an
index like so:


key ad (ad, day)

But the table in step 2 uses


primary key (day, ad)

Note the change in column order! This has grave consequences, as
the MySQL manual tells us: “MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index.”

Given that the WHERE clause restricts day but not ad, the index in
step 1 just cannot be used, period. In contrast, the index in step 2 could be used, since day is in fact a leftmost prefix of the primary key.
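
The leftmost-prefix rule can be checked mechanically. Here is a simplified Python sketch applied to the two key definitions above; the function name `usable_prefix` is hypothetical, and MySQL's real range optimizer applies more detailed rules than this.

```python
def usable_prefix(index_columns, restricted_columns):
    """Return the leading index columns that the WHERE clause restricts.

    Simplified leftmost-prefix rule: walk the index columns in order and
    stop at the first one the WHERE clause does not mention. An empty
    result means the index cannot narrow the search at all. (MySQL also
    stops extending a range prefix after a range condition such as
    BETWEEN; that detail is ignored here.)
    """
    prefix = []
    for column in index_columns:
        if column not in restricted_columns:
            break
        prefix.append(column)
    return prefix


where_columns = {"client", "day"}  # client=11 and day between ...
print(usable_prefix(["ad", "day"], where_columns))  # [] : key ad (ad, day) is unusable
print(usable_prefix(["day", "ad"], where_columns))  # ['day'] : primary key (day, ad) is usable
```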

However, it will in fact not be used, as you mention in this article; my
own experience supports this observation. Why this is so is not clear to
me, either, because the manual explicitly says that BETWEEN clauses are
OK for using an index, and it also says that MySQL will “normally” use
the most selective index that finds the smallest number of rows. (In my
case, the client index would select 36286 rows, but the index on (day,ad)
would select 24880, which is substantially less.)

Further experimentation with the parameters of the query yields an
interesting result, though. Your original query used the WHERE clause


where client=11 and day between '2007-01-01' and '2007-01-31'

When I tighten the date range to


where client=11 and day between '2007-01-01' and '2007-01-21'

(note: only 21 instead of 31 days), I still get the same result: the
primary key will not be used. However, when I take a 20-day range


where client=11 and day between '2007-01-01' and '2007-01-20'

then MySQL suddenly starts to use the primary key, as described in your
previous article!

A closer look at the rows considered in these cases shows:

client:                   36286 rows considered
primary for 21 day range: 15954 rows considered
primary for 20 day range: 15305 rows considered

So, strangely, MySQL considers 15305 less than 36286 (sounds plausible
to me…), but 15954 greater than or equal to 36286. Interesting!
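
If the optimizer followed nothing but the manual's "smallest number of rows" rule, both date ranges would pick the primary key. A minimal Python sketch of that naive rule (the function name is hypothetical), fed with the row counts reported above, shows that the observed switch between the 20-day and 21-day ranges cannot come from comparing these row estimates alone.

```python
def most_selective_index(row_estimates):
    """Naive rule from the manual: pick the index with the fewest estimated rows."""
    return min(row_estimates, key=row_estimates.get)


# Row estimates reported in the comment above (client index vs. primary key).
range_21_days = {"client": 36286, "PRIMARY": 15954}
range_20_days = {"client": 36286, "PRIMARY": 15305}

print(most_selective_index(range_21_days))  # PRIMARY
print(most_selective_index(range_20_days))  # PRIMARY
```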

(By the way, since the table is filled with some degree of randomness,
your mileage may vary. But the results should be in comparable ranges.)

What becomes apparent is that the result is heavily influenced by the
optimizer’s idea of the current key distribution. For this reason,
ANALYZE TABLE is always A Good Thing after many inserts and/or deletes
on MyISAM tables. (For InnoDB, this should not be necessary.)

Now, by specifying USE INDEX(primary) as you suggest in this
article (FORCE INDEX is a bit stronger but in fact not needed here), you
make up for some, ahem, non-standard comparison algorithm used by the MySQL query optimizer. However, contrary to what the article suggests, this trick works both on the original table (with the artificial
primary key on id) and on the modified one.
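
The MySQL manual describes USE INDEX as limiting the optimizer to the listed indexes, and FORCE INDEX as doing the same while also treating a table scan as very expensive. The difference can be pictured with a toy Python model; the function name and its return shape are hypothetical, and this is not how the optimizer is implemented.

```python
def candidate_plans(indexes, hint=None, hint_list=()):
    """Toy model of MySQL index hints, following the manual's description.

    Returns (indexes the optimizer may consider, whether a table scan may
    still be chosen on cost grounds).
    - No hint: every index is a candidate and a scan is allowed.
    - USE INDEX: only the listed indexes are candidates; a scan is still allowed.
    - FORCE INDEX: only the listed indexes are candidates; the manual says a
      scan is assumed to be very expensive, modeled here as allowed only
      when no listed index applies.
    """
    if hint is None:
        return list(indexes), True
    if hint not in ("USE", "FORCE"):
        raise ValueError("hint must be None, 'USE' or 'FORCE'")
    allowed = [ix for ix in indexes if ix in hint_list]
    scan_allowed = hint == "USE" or not allowed
    return allowed, scan_allowed


usable = ["client", "PRIMARY"]  # indexes applicable to the query above
print(candidate_plans(usable))                        # (['client', 'PRIMARY'], True)
print(candidate_plans(usable, "USE", ["PRIMARY"]))    # (['PRIMARY'], True)
print(candidate_plans(usable, "FORCE", ["PRIMARY"]))  # (['PRIMARY'], False)
```

With USE INDEX the client index drops out of consideration, leaving only the primary key versus a scan, which is consistent with the observation that FORCE INDEX is not needed here.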

So, the message is not just “define a good primary key” (although that
is always a good idea) or “bad primary keys can prevent use of good
other indexes”, but rather threefold:

  • In multiple-part indexes, take care of the order in which you specify
    the individual fields.
  • Keep your MyISAM tables’ statistics up to date with ANALYZE TABLE.
  • Help the optimizer with explicit hints if it has problems with
    the order on natural numbers.