Xaprb

Stay curious!

Why won’t MySQL use the best index in a join?

with 3 comments

Someone recently asked me why the wrong index was being used for a JOIN, making the query run very slowly. We ran EXPLAIN and saw this abridged output:

explain select [columns] from m
   left join u on m.intcol = u.intcol and m.url = u.url
where u.url is null\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: m
         type: ALL
         rows: 2717
*************************** 2. row ***************************
  select_type: SIMPLE
        table: u
         type: ref
          key: idx_intcol
      key_len: 2
          ref: m.intcol
         rows: 64486
        Extra: Using where; Not exists

The column is a 2-byte unsigned integer. Here is the relevant part of the table definition:

CREATE TABLE `u` (
  `intcol` smallint(5) unsigned NOT NULL,
  `url` varchar(760) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,
  UNIQUE KEY `url` (`url`,`intcol`),
  KEY `idx_intcol` (`intcol`),
) ENGINE=InnoDB

As you can see, the url index looks like it should be a better index for the query, with two columns instead of one. The shortcut I often use to diagnose issues like this is EXPLAIN EXTENDED, followed by SHOW WARNINGS. The resulting warning message is often revealing:

select 1 AS `1` from... on...
  (`m`.`url` = convert(`u`.`url` using utf8))))...

Now it’s easy to see that the index can’t be used because of a character set mismatch. The ‘m’ table indeed has a different character set and collation:

CREATE TABLE `m` (
  `url` varchar(255) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

The easiest solution in this case was to change the ‘m’ table’s definition, because it is a scratch table used as part of an import process.

Written by Xaprb

August 20th, 2012 at 11:06 am

Posted in SQL

3 Responses to 'Why won’t MySQL use the best index in a join?'

Subscribe to comments with RSS

  1. Nice article. Short, simple to understand and to the point. Thanks for this good advice

    Ole

    20 Aug 12 at 11:59 am

  2. Hi,

    one more issue can make idx_intcol more effective.
    the url-int index is very big one (each node need to contain a lot of chars (which i can assume that the beginning is quite the same “http://aaa.bbb.com/blablka”.
    this issue cause to a very “deep” index which require a lot of reading in the index tree.
    the small int is a much smaller index, and the needs for IS NULL (or not exist in the EXPLAIN) makes the search easier since it not need to read the actual data.

    i.e. if it will not find the int – the row matches.
    if it does, it will go to the rows which has the same in and check the url…

    Can you can post the EXPLAIN after the charset change?

    eplaut

    20 Aug 12 at 1:28 pm

  3. Eplaut,

    I know, this is not the ideal index, although in the real production use case it is fine and does not need to be optimized further; the cost and effort would not result in enough improvement to be worth it. The point of this post was simply to show readers how to figure out what the execution plan is doing in some cases.

    Xaprb

    20 Aug 12 at 2:17 pm

Leave a Reply