Archive for the ‘explain’ tag
I’ll be attending the 2008 MySQL Conference and Expo again this year, and I’m looking forward to hearing some great sessions, meeting new and old friends, and giving sessions myself. As a proposal reviewer, I looked at and voted on 250+ proposals for sessions and tutorials for this conference. There are going to be some great sessions and tutorials.
If you haven’t come to the conference previously, it’s well worth your time and money, in my opinion.
I (Baron Schwartz) am giving two sessions, both on extremely practical topics: one on the query cache and one on EXPLAIN. Both are the subject of many myths and misunderstandings! My goal is to strip away the programmer-speak and show you how they really work. Once you understand that, you can understand the technical terminology. (But it's very hard to go the other direction.)
I haven’t decided yet which sessions I want to attend, but I know this: I’m not going to miss seeing how Beat Vontobel solves a Su Doku puzzle with only self-joins. His session on views last year was just amazing.
Hopefully there'll be plenty of time to sit down for meals and chats with all the people I correspond with throughout the year but rarely get to see or talk to!
And no, I don't get any kickback for saying nice things about the conference. Even reviewing all those proposals was a volunteer job. And Jay Pipes tricked me into it, the rat! He told me it would take only a few hours. Haha, you can't review 250 proposals in a few hours… I have to say, though, some of them were really rewarding to read. One of them was about holding a cosmic prayer circle or something like that. Without expressing any opinion on my religion/spirituality, I did have to vote NO on that one: sorry, wrong conference.
If you’ve ever wished you could see MySQL’s EXPLAIN output formatted as a tree, now you can. MySQL Visual Explain transforms EXPLAIN output into a hierarchical view of the query plan, which is significantly easier to read and understand.
What it does
MySQL Visual Explain is a command-line tool, not a Graphical User Interface (GUI). You can use it in two ways:
- Give it a query and some connection options, and it will connect and EXPLAIN the query, then show you the result as a tree.
- Give it the output of EXPLAIN in any of several formats, and it will parse it and turn it into a tree.
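To give a feel for the second mode, here is a minimal Python sketch of turning the tabular EXPLAIN format (the +----+ / | id | ... layout the mysql client prints) into one record per row. It is not the tool's own code; the function name parse_explain_table is hypothetical, it handles only this one format, and the sample is the example below abridged to a few columns.

```python
def parse_explain_table(text: str) -> list[dict[str, str]]:
    """Parse tabular EXPLAIN output into a list of dicts keyed by column name.
    Values stay strings; NULL is kept as the literal string 'NULL'."""
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


# Abridged version of the EXPLAIN output for the example query.
sample = """
+----+-------------+------------+-------+--------------------+------+
| id | select_type | table      | type  | key                | rows |
+----+-------------+------------+-------+--------------------+------+
|  1 | PRIMARY     | actor      | index | PRIMARY            |  200 |
|  2 | SUBQUERY    | film       | index | idx_fk_language_id |  951 |
|  2 | SUBQUERY    | film_actor | ref   | idx_fk_film_id     |    2 |
+----+-------------+------------+-------+--------------------+------+
"""

rows = parse_explain_table(sample)
print(rows[2]["key"])  # idx_fk_film_id
```

Keeping every value as a string mirrors what the client prints; a real parser would also have to cope with the vertical \G format and tab-separated output.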
Here’s a simple example. Given the following query,
select actor_id, (select count(film_id) from sakila.film join sakila.film_actor using(film_id)) from sakila.actor;
You get this EXPLAIN output:
+----+-------------+------------+-------+----------------+--------------------+---------+---------------------+------+-------------+
| id | select_type | table      | type  | possible_keys  | key                | key_len | ref                 | rows | Extra       |
+----+-------------+------------+-------+----------------+--------------------+---------+---------------------+------+-------------+
|  1 | PRIMARY     | actor      | index | NULL           | PRIMARY            | 2       | NULL                |  200 | Using index |
|  2 | SUBQUERY    | film       | index | PRIMARY        | idx_fk_language_id | 1       | NULL                |  951 | Using index |
|  2 | SUBQUERY    | film_actor | ref   | idx_fk_film_id | idx_fk_film_id     | 2       | sakila.film.film_id |    2 | Using index |
+----+-------------+------------+-------+----------------+--------------------+---------+---------------------+------+-------------+
MySQL Visual Explain turns this into the following query execution plan:
SUBQUERY
+- JOIN
|  +- Index lookup
|  |  key            film_actor->idx_fk_film_id
|  |  possible_keys  idx_fk_film_id
|  |  key_len        2
|  |  ref            sakila.film.film_id
|  |  rows           2
|  +- Index scan
|     key            film->idx_fk_language_id
|     possible_keys  PRIMARY
|     key_len        1
|     rows           951
+- Index scan
   key            actor->PRIMARY
   key_len        2
   rows           200
You should read this as a depth-first tree traversal. The root of the tree is the output node, the last thing that happens in query execution, and the nodes beneath it are the steps that feed into it.
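Printing a tree in this layout is itself a depth-first (pre-order) traversal: print a node, then its properties, then each child subtree in turn. Here is a minimal Python sketch that reproduces the layout of the example plan, assuming a hypothetical Node type holding a label, a list of (property, value) pairs, and children. It only draws a tree you already have; building the tree from EXPLAIN is the hard part, and this is not the tool's own code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    props: list[tuple[str, str]] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def render(node: Node, prefix: str = "", is_root: bool = True,
           is_last: bool = True) -> list[str]:
    """Pre-order rendering: the node's line, its properties, then each child."""
    if is_root:
        lines = [node.label]
        inner = ""
    else:
        lines = [prefix + "+- " + node.label]
        # Draw a vertical bar under this node only if more siblings follow it.
        inner = prefix + ("   " if is_last else "|  ")
    for key, value in node.props:
        lines.append(f"{inner}{key:<14} {value}")
    for i, child in enumerate(node.children):
        lines.extend(render(child, inner, is_root=False,
                            is_last=(i == len(node.children) - 1)))
    return lines


plan = Node("SUBQUERY", children=[
    Node("JOIN", children=[
        Node("Index lookup", [("key", "film_actor->idx_fk_film_id"),
                              ("possible_keys", "idx_fk_film_id"),
                              ("key_len", "2"),
                              ("ref", "sakila.film.film_id"),
                              ("rows", "2")]),
        Node("Index scan", [("key", "film->idx_fk_language_id"),
                            ("possible_keys", "PRIMARY"),
                            ("key_len", "1"),
                            ("rows", "951")]),
    ]),
    Node("Index scan", [("key", "actor->PRIMARY"),
                        ("key_len", "2"),
                        ("rows", "200")]),
])

print("\n".join(render(plan)))
```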
As I said, this is a simple example. When your queries have many subqueries and/or UNIONs, you quickly get much more complicated EXPLAIN output, which is very hard to understand. It’s much easier to read the tree representation for complex queries. Here are thumbnails of the query execution plans of some real queries from a project I’ve worked on:
The corresponding EXPLAIN output is very hard to understand, even for me, and I've become an expert on EXPLAIN. I can understand the tree view without trouble. No matter how much of an expert someone is, a tree view is always going to be easier to understand.
How it works
MySQL Visual Explain tries to reverse-engineer EXPLAIN by re-ordering the input rows and dividing them into parent/child relationships. Exactly how it does this is complex, though it can be expressed in code fairly succinctly. I wanted to write this tool nearly a year ago, but after studying EXPLAIN for a while, I found I just didn't understand it well enough. Then this spring, at the MySQL Conference and Expo 2007, I saw a talk on the Query Optimizer by Timour Katchaounov, who works on the optimizer team. Timour helped me grasp how MySQL executes queries. Most importantly, he showed a diagram of the execution plan as a left-deep tree. I had always assumed the query execution plan was a bushy tree, but once I saw the left-deep tree I understood more about how MySQL works (for example, I now understand why MySQL doesn't support FULL OUTER JOIN).
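Here is a minimal Python sketch of what "left-deep" means, assuming only that the tables of one SELECT are taken in join order: each new table is joined to the result of everything before it, so the right-hand input of every JOIN is a single table access. The names Access, Join, left_deep, and show are hypothetical and for illustration only; this is not the tool's code.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Access:
    """Leaf: read one table (hypothetical node type)."""
    table: str


@dataclass
class Join:
    """Inner node: everything joined so far (left) joined with one more table (right)."""
    left: "Plan"
    right: Access


Plan = Union[Access, Join]


def left_deep(tables: list[str]) -> Plan:
    """Fold tables, given in join order, into ((t1 JOIN t2) JOIN t3) ..."""
    tree: Plan = Access(tables[0])
    for table in tables[1:]:
        tree = Join(tree, Access(table))
    return tree


def show(plan: Plan) -> str:
    if isinstance(plan, Access):
        return plan.table
    return f"({show(plan.left)} JOIN {plan.right.table})"


# The subquery in the earlier example lists film, then film_actor.
print(show(left_deep(["film", "film_actor"])))  # (film JOIN film_actor)
print(show(left_deep(["t1", "t2", "t3"])))      # ((t1 JOIN t2) JOIN t3)
```

A bushy tree would also allow a JOIN whose right input is another JOIN; the left-deep shape never produces one.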
The crucial bridge between that and understanding EXPLAIN was a slide that showed the nodes of the tree mapped to rows in EXPLAIN. This made me see how to approach the problem. After spending hours reading the MySQL manual and source code, and studying many examples, I slowly understood how to go backwards from EXPLAIN to a tree.
It is not as simple as it sounds! There are many details, such as how to decide which nodes should be children of which other nodes (when there are no subqueries or UNIONs, of course it’s trivial). EXPLAIN’s output can be quite complex, and a moderately complicated query takes me a few minutes to reverse-engineer by hand — and most of the time I get it wrong. Maybe I can write the details in another blog post, or contribute them to MySQL’s documentation or the MySQL Forge wiki.
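One small, well-defined piece of the puzzle: EXPLAIN's id column is the SELECT identifier, so rows that share an id belong to the same SELECT. Grouping them is a plausible first step before deciding which SELECT is the parent of which. The Python sketch below shows only that step; the helper name group_by_select is hypothetical, and the genuinely hard part (placing the groups in parent/child relationships) is not shown.

```python
def group_by_select(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Map each EXPLAIN id to its tables, keeping the order rows appear in.
    A row whose id is printed as NULL (such as a UNION RESULT row) lands
    under the key 'NULL'."""
    groups: dict[str, list[str]] = {}
    for row in rows:
        groups.setdefault(row["id"], []).append(row["table"])
    return groups


# The rows from the earlier example, reduced to the columns used here.
rows = [
    {"id": "1", "select_type": "PRIMARY", "table": "actor"},
    {"id": "2", "select_type": "SUBQUERY", "table": "film"},
    {"id": "2", "select_type": "SUBQUERY", "table": "film_actor"},
]
print(group_by_select(rows))
```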
If you find bugs
I couldn't have written this tool without unit tests, especially since I had to start over twice when I found I was misunderstanding something major (that's a huge plug for test-driven development). If you find a query it transforms incorrectly, please report it via the project's SourceForge bug tracker. Please include the query and its EXPLAIN output, so I can add it to the test suite.
People at MySQL and many community members have sometimes discussed the need for a tree view of EXPLAIN. I sincerely hope they implement that feature and make this little tool obsolete in future versions of MySQL. I also understand MySQL is trying to add more information to EXPLAIN. Currently it's not possible to get a complete query execution plan from EXPLAIN, because it doesn't show you everything the server does while executing the query. If MySQL adds information, I'll update this tool. For example, I have a feature request pending to show when a GROUP BY happens.
I also specifically wrote this tool to be useful as a module, not just a command-line utility. This makes it possible for you to use the module in your own programs. I have a few ideas for this myself, though I may not get time to implement them.
Your support appreciated
Though the end result makes it look easy, this was a seriously hard project that took many evenings and weekends of research, testing, and coding, with many false starts. If you feel inclined, there’s a “support” link in the navigation bar at the top of this page! (For those of you who tried to send me something and it got returned, I’ve fixed that issue).
Is your MySQL server doing an extra index scan on queries that need to check a key for matches or NULL? It’s easy for this to happen accidentally, but it’s also easy to fix, especially in MySQL 5.0 and up. Here’s how.
If you read the manual page for EXPLAIN, you’ll see the ref_or_null “join type” (I think “access type” is really a better term) mentioned. If you see this in EXPLAIN, it means MySQL has to search the index for matches, then search again for NULLs. If there are no NULLs in that column, and MySQL knew that, it could avoid the extra search.
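To make the "extra search" concrete, here is a toy Python model, not how any storage engine actually stores indexes: the index is a sorted list of (key, row id) pairs searched with bisect, ref is one search, and ref_or_null is that search plus a second one for NULLs. The helper names and the sample column values are made up for illustration, and the convention that NULL sorts before every other value is an assumption of the model.

```python
from bisect import bisect_left, bisect_right


def sort_key(value):
    # Toy convention: NULL (None) sorts before every non-NULL value.
    return (0, 0) if value is None else (1, value)


def build_index(column_values):
    """Toy secondary index: a sorted list of (sort_key, row_id) pairs."""
    return sorted((sort_key(v), row_id) for row_id, v in enumerate(column_values))


def ref_lookup(index, value):
    """One index search: the row ids whose indexed value equals `value`."""
    key = sort_key(value)
    lo = bisect_left(index, (key,))                # first entry with this key
    hi = bisect_right(index, (key, float("inf")))  # just past the last one
    return [row_id for _, row_id in index[lo:hi]]


def ref_or_null_lookup(index, value):
    """ref_or_null: search for matches, then do an extra search for NULLs."""
    return ref_lookup(index, value) + ref_lookup(index, None)


original_language_id = [3, None, 1, 3, None, 2]  # made-up column values
index = build_index(original_language_id)
print(ref_or_null_lookup(index, 3))  # [0, 3, 1, 4]
```

If the column contained no NULLs at all, the second search could never find anything; that is the wasted work the server can avoid when the column is declared NOT NULL.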
You might see this in subqueries or when you use elaborate JOIN clauses, or even when you use a simple WHERE clause. For example, here’s a query that will do a ref_or_null access plan on the Sakila sample database:
explain select * from sakila.film where original_language_id = 3 or original_language_id is null\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: film
         type: ref_or_null
possible_keys: idx_fk_original_language_id
          key: idx_fk_original_language_id
      key_len: 2
          ref: const
         rows: 476
        Extra: Using where
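Why does this happen? Simple: the indexed column is defined as NULLable. Here's a query you can use to find such columns. Note that it joins against KEY_COLUMN_USAGE, which lists only columns that are part of PRIMARY KEY, UNIQUE, or FOREIGN KEY constraints, so a NULLable column with a plain non-unique index won't show up; INFORMATION_SCHEMA.STATISTICS covers every index.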
select c.table_schema, c.table_name, c.column_name
from information_schema.columns as c
   inner join information_schema.key_column_usage
      using(table_schema, table_name, column_name)
where c.table_schema = 'sakila' and c.is_nullable = 'yes';
+--------------+------------+----------------------+
| table_schema | table_name | column_name          |
+--------------+------------+----------------------+
| sakila       | film       | original_language_id |
| sakila       | payment    | rental_id            |
+--------------+------------+----------------------+
If the column shouldn't be allowed to be NULL, make sure you specify NOT NULL in the column's definition! In this particular case, the original_language_id column probably should be NULLable, but I commonly see columns defined as NULLable when they shouldn't be. The performance penalty isn't the end of the world, but it's still worth fixing.