Why won't MySQL use the best index in a join?

Someone recently asked me why the wrong index was being used for a JOIN, making the query run very slowly. We ran EXPLAIN and saw this abridged output:

explain select [columns] from m
   left join u on m.intcol = u.intcol and m.url = u.url
where u.url is null\G
*************************** 1. row ***************************
  select_type: SIMPLE
        table: m
         type: ALL
         rows: 2717
*************************** 2. row ***************************
  select_type: SIMPLE
        table: u
         type: ref
          key: idx_intcol
      key_len: 2
          ref: m.intcol
         rows: 64486
        Extra: Using where; Not exists

The column is a 2-byte unsigned integer. Here is the relevant part of the table definition:

  `intcol` smallint(5) unsigned NOT NULL,
  `url` varchar(760) CHARACTER SET latin1 COLLATE latin1_general_cs NOT NULL,
  UNIQUE KEY `url` (`url`,`intcol`),
  KEY `idx_intcol` (`intcol`),

As you can see, the url index looks like it should be a better index for the query, with two columns instead of one. The shortcut I often use to diagnose issues like this is EXPLAIN EXTENDED, followed by SHOW WARNINGS. The resulting warning message is often revealing:

select 1 AS `1` from... on...
  (`m`.`url` = convert(`u`.`url` using utf8))))...

Now it’s easy to see that the index can’t be used because of a character set mismatch. The ’m’ table indeed has a different character set and collation:

  `url` varchar(255) DEFAULT NULL

The easiest solution in this case was to change the ’m’ table’s definition, because it is a scratch table used as part of an import process.

I'm Baron Schwartz, the founder and CEO of VividCortex. I am the author of High Performance MySQL and lots of open-source software for performance analysis, monitoring, and system administration. I contribute to various database communities such as Oracle, PostgreSQL, Redis and MongoDB. More about me.