Xaprb

Stay curious!

How MySQL really executes a query

with 24 comments

WARNING: nearly every word of this post is intentionally false. This is an April Fool’s joke.

There is so much misinformation out there about how MySQL works. If you’ve talked to someone from PostgreSQL, you’ve surely heard some of it: “MySQL doesn’t even have transactions,” for example. And this from a PostgreSQL user, who uses a database that doesn’t even run on Windows.

But even within the ranks of people who supposedly know MySQL’s workings, lots of people just get things dead wrong. I wanted to set the record straight here, so I thought I would give a little walk-through of how MySQL executes a query.

Let’s see it in action from start to finish.

SELECT SQL_NO_CACHE COUNT(col1) FROM users
WHERE userid IN(
   SELECT userid FROM othertbl
)
ORDER BY userid DESC;

The steps to execute this query are as follows:

  1. The MySQL API sends the query to the server as a series of tokens.
  2. The query is parsed.
  3. MySQL checks privileges. This step can be really complex, because it might involve a call to an LDAP or Active Directory service. So I’ll gloss over this for right now.
  4. The query cache would normally be checked at this stage, but I’ve given the SQL_NO_CACHE instruction. So the query cache isn’t checked. (If it were, it might find a saved result from a query without SQL_NO_CACHE and return it, which would be against the instruction.)
  5. The optimizer executes the subquery and returns a list of userid values that can be substituted into the IN() expression.
  6. The optimizer looks at the COUNT() expression. As you know, all COUNT() queries can be executed instantly on MyISAM tables — an optimization that does not exist for InnoDB, which is really slow at COUNT(). Actually, InnoDB is just slower across the board than MyISAM, which is why you should always use MyISAM.
  7. The parsed and optimized query is compiled into a query plan: native byte code that can be run directly on the CPU.
  8. The optimized query plan is stored for later executions. If a similar query is issued later, the parsing and compilation can be skipped. This is stored in the query cache, oddly enough. (It’s just one of those funny things about the query cache, which as I say, doesn’t work as you expect.)
  9. The query is executed, possibly on many CPUs simultaneously. If this happens, the results from each CPU have to be combined into the final results, which is what the Sort_merge_passes status variable counts. (This is also why you want to configure the sort_buffer_size as large as you can make it; generally I suggest making it 75% of available memory, or even larger in some cases.)
  10. If there is no index on userid, then the ORDER BY can be skipped. But if there’s an index, MySQL’s row-level locking and MVCC (implemented at the server level, not in the storage engines) will force reading of results from the index. In that case, the result has to be put into a file, which is then sorted. This is called a filesort. You can see “Using filesort” in the EXPLAIN output if this happens.
  11. If the result is bigger than the server’s memory, the server also has to use a temporary table on disk. If it’ll have to do this, you’ll see “Using temporary; Using disk table” in EXPLAIN output.
  12. The results are returned to the client via a separate network connection (so that queries can be asynchronous).
  13. After the query finishes, it is logged into the binary logs, which are MyISAM’s transaction logs. If you have these logs, you can undo statements, so you don’t need backups. You can just use the binary logs to roll back the changes. That’s another advantage of MyISAM — it has binary logs.

That’s pretty much the whole thing. I almost posted the lines and file names of the code for each step. But I decided not to. You can research it yourself if you want. It’s actually pretty simple; the MySQL planner is really straightforward and modular, so you should be able to see exactly how all of the above steps work. Just read the code.

If you’re really curious, you can use the following query to test your knowledge of MySQL’s execution plan. Trace this one in your Visual Studio debugger and see how it works:

SELECT 0x417072696C20466F6F6C21;

That query doesn’t access any tables, so it has a special execution plan that should show you whether you understand what’s been explained above. Have fun!

WARNING: nearly every word of this post is intentionally false. This is an April Fool’s joke.

Written by Xaprb

April 1st, 2009 at 4:16 am

Posted in PostgreSQL

Tagged with ,

24 Responses to 'How MySQL really executes a query'

Subscribe to comments with RSS

  1. Good one! My brain started to twitch around #6. Then I took a quick look into ASCII table…

    Mchl

    1 Apr 09 at 4:49 am

  2. Hi! this is great info, really ;)

    BTW: your site does not render well on Eagle Eyes (http://www.smashingmagazine.com/2009/03/31/breaking-internet-explorer-81-eagle-eyes-leaked/)

    Roland Bouman

    1 Apr 09 at 5:37 am

  3. Some of this information is dead wrong!

    A single query cannot be run on multiple CPUs. I don’t have time to find the reference, but I am 100% sure about this, as I have been projecting making the changes required to run one query on multiple CPUs.

    Also the statement that one should always use MyISAM shows the incompetence of the author. MyISAM does not support transactions, which can be a really essential feature.

    David Andersen

    1 Apr 09 at 6:24 am

  4. I thought MySQL does not support LDAP. However, you say in step 3 MySQL can check privileges using an LDAP service. Can you please elaborate on that?

    och

    1 Apr 09 at 6:29 am

  5. I was about to say “WTF?” and actually re-read your post 3 times over, but then got it all. You are right.

    You and Giuseppe are really due to the guillotine.

    Shlomi Noach

    1 Apr 09 at 6:34 am

  6. April fool?

    och

    1 Apr 09 at 6:38 am

  7. @David Andersen,
    You’re right. What a piece of carelessly shallow information!
    The author fails to mention if the byte code produced by MySQL is little endian or big endian, and why this makes queries much faster on a Mac. I also saw a reference to it in the manual, but I can’t remember the page.
    Giuseppe

    Giuseppe Maxia

    1 Apr 09 at 6:58 am

  8. #v+
    The optimized query plan is stored for later executions
    #v-
    hehe I wanted to ask .. then
    SELECT 0x417072696C20466F6F6C21;
    :-)

    erkules

    1 Apr 09 at 7:00 am

  9. @Giuseppe:

    I think the incompetence of the author is the matter at hand. I’ll be sure to put it in the manual in bytecode, for future reference.

    Shlomi Noach

    1 Apr 09 at 7:10 am

  10. “As you know, all COUNT() queries can be executed instantly on MyISAM tables”

    This is clearly not true on several levels:

    1. Nothing can be executed “instantly” – even if it’s constant-time, it’s not “instantly”
    2. MyISAM still needs to count the rows if there is a WHERE clause – the optimisation only exists for COUNT(*) without a WHERE.

    COUNT(*) without a WHERE is really an edge-case – real-world systems rarely need to execute such queries, even if they’re superficially useful to DBAs.

    Mark R

    1 Apr 09 at 7:20 am

  11. Muhahah, I almost thought you were serious for a minute

    Mark R

    1 Apr 09 at 7:23 am

  12. Let’s write a 1st of April version of the whole MySQL manual so people looking for information would just get really confused. That’s about as funny as pouring milk on the floor.

    But, alright, you had me fooled there.

    David Andersen

    1 Apr 09 at 7:36 am

  13. The sad thing is that I converted the hex string to ASCII in my head. I didn’t even know that I could do that until now.

    Antony Curtis

    1 Apr 09 at 9:05 am

  14. Hi,
    Good article.

    I want to tell you all here that,I am using Oracle, MySQL, PostgreSQL.

    PostgreSQL is far better than all these 3 databases.

    MySQL is no where near to PostgreSQL. If you want to know PostgreSQL, use it. Not only refer the features, but also work on it.

    Then only you came to know why PostgreSQL is better?

    Praveen

    1 Apr 09 at 9:06 am

  15. hi,

    PostgreSQL runs in most of the opearating system even in windows.

    I am using PostgreSQL from past 3 years.

    I don’t know why author has made a statement like this?

    Praveen

    1 Apr 09 at 9:07 am

  16. Actually, the fact that this fooled people is freaky….

    But this is also good news in a way, as it means there’s also plenty of potential market for training and other services.

    Arjen Lentz

    1 Apr 09 at 9:10 am

  17. Can’t anybody see that this article is a April 1 joke?

    April

    1 Apr 09 at 11:57 am

  18. A few I know fell for this, amusing. Nice one.

    Mit

    1 Apr 09 at 12:01 pm

  19. Great post! I was buying it until #5. I *wish* there was a way to instruct the optimizer to just look for those few id numbers and substitute them into the the IN query.

    Paul Burney

    1 Apr 09 at 12:18 pm

  20. LMAO.

    Jay Pipes

    1 Apr 09 at 12:23 pm

  21. “Actually, InnoDB is just slower across the board than MyISAM, which is why you should always use MyISAM.”

    unless of course you like to have minor things like relational integrity taken care of for you by the database engine rather than by application code

    getting the correct answer slowly ~always~ trumps getting the wrong answer quickly

    but yeah, nice april 1 article

    ;o)

    rudy

    1 Apr 09 at 12:45 pm

  22. Praveen, PostgreSQL is better than PostgreSQL!?!?

    “This sentence is false.”

    Xaprb

    1 Apr 09 at 1:34 pm

  23. In parts of the world, it is now April 2. So I am going to close comments and (as planned all along, honestly) put a big red warning at the start and end of the post. Thanks for all the fish!

    And for the record, Giuseppe TOTALLY got me. I sat there staring at it, thinking there had been an error; tried again; tried three times. THREE TIMES. Duh.

    Xaprb

    1 Apr 09 at 1:38 pm

  24. Hm, looks like the Journalist theme has a bug. When I close comments, it not only disallows further comments, it doesn’t display the existing ones. So I’ll leave them open.

    Xaprb

    3 Apr 09 at 10:40 am

Leave a Reply