Xaprb

Stay curious!

How to optimize subqueries and joins in MySQL

with 16 comments

I have written before about using joins instead of subqueries, especially for NOT IN queries, which can usually be rewritten as exclusion joins — sometimes with huge efficiency gains. In this article I’ll look more closely at the performance characteristics of a few queries I’ve optimized in MySQL 5.0.3. I’ll also show you some tricks you can use to get MySQL to optimize queries better when you know it’s being inefficient.

Updates in a join

I wrote recently about the theoretical problems caused by UPDATE statements with FROM clauses (Many-to-one problems in SQL). This particular query shows the performance difference between a “correct” query and a “bad, nonstandard” query I wrote recently at work.

The application for this query is to update a table with aggregated clicks per day from a table of click-tracking data for online advertising. I have pre-populated the aggregate table, with one row per ad per day, over the time period I’m interested in. I have a table called tracking that holds non-aggregated click data. Here are the simplified table structures:

create table aggregate(
    day date not null,
    ad int not null,
    clicks int,
    primary key(day,ad)
) type=InnoDB;

create table tracking (
    tracking int not null auto_increment primary key,
    day date not null,
    ad int not null,
    clicktype enum('real','fraud','unknown') not null,
    clicks int not null,
    unique index(day, ad, clicktype)
) type=InnoDB;

Since the tables are InnoDB, the clustered index is the primary key. Notice two things about the tracking table: it has a surrogate key so it’s not clustered on the natural primary key (day, ad, clicktype), and clicks are separated out by clicktype, so I’ll have to aggregate the clicks I want. These are the design constraints I have to work with; I didn’t design the table.

For this article, I populated my tables with some pseudo-random data. First, I filled my aggregate table with some ads and days over which I want to work. I want to aggregate ads 1 through 50 over the time period 2005-01-01 through 2005-01-10, so that’s 500 rows in the aggregate table. The tracking table is moderately large. I filled it with dates from 2003-04-07 to 2006-01-01 (1,000 days) for ads 1 through 1000, with one row for each click type. That’s a total of three million rows. The clicks are random numbers between 1 and 100.

Given a numbers table with at least 1,000 rows, here are scripts to populate the tables. The second query will probably take a while to run, and will create a medium-sized chunk of data, so don’t think something is wrong when it keeps on grinding (15 minutes isn’t unreasonable). By the way, it’s not a good idea to run these queries on a production server!

insert into aggregate(day, ad)        
    select date_add('2006-01-01', interval a.i - 1 day), b.i
    from number as a
        cross join number as b 
    where a.i <= 10 and b.i <= 50;

insert into tracking(day, ad, clicktype, clicks)
    select date_add('2003-04-07', interval a.i - 1 day),
        b.i,
        c.type,
        rand() * 100
    from number as a 
        cross join number as b
        cross join (
            select 'real' as type
            union all select 'fraud'
            union all select 'unknown'
        ) as c
    where a.i <= 1000 and b.i <= 1000;

Now that the tables are set up, I’ll move on to the queries. Since MySQL doesn’t provide really good tools to profile queries, I ran these queries several times, disregarding the first run because it may not have been cached the same way as subsequent runs. The first query is the “bad, evil” way to do it. The second is the official, standards-compliant way.

Notice that the first query is a join, but the second query is a dependent (correlated) subquery, where the subquery refers to values in the enclosing query. They are equivalent, but the query plan is likely to be completely different. My goal here is to measure the performance characteristics of the two methods.

update aggregate as a
    inner join (
        select day, ad, sum(t.clicks) as c
        from tracking as t
        where t.clicktype in ('real', 'unknown')
        group by day, ad
    ) as t on a.ad = t.ad and a.day = t.day
set clicks = c;

update aggregate as a
    set clicks = (
        select sum(t.clicks)
        from tracking as t
        where t.day = a.day
            and t.ad = a.ad
            and t.clicktype in ('real', 'unknown')
    );

The first query runs in 96.4 seconds on my machine, plus or minus .15 seconds. The second runs in 0.02 seconds consistently. That’s a pretty big performance difference. Since these are UPDATE statements, I can’t EXPLAIN them and see exactly what the optimizer is doing differently, but I know one thing: the first query fails entirely when the tracking table has 30 million rows instead of 3 million, because it runs out of space for locks.

I have seen cases where the join query is slightly less efficient on small sets of data, but more efficient on larger sets; depending on the data, the probing strategy might not scale very well as the size of the outer table grows.

The queries also perform differently depending on whether the values in aggregate are 50 ads and 10 days, or 5 ads and 100 days. This is because the indexes on tracking are leftmost-prefixed on day, not ad. Your mileage will vary. For example, when I insert 5,000 rows (10 days, 500 ads) into aggregate instead of 50, the joined query takes 104.45 seconds, but the subquery takes 0.04. When I insert 50,000 rows (1,000 days, 500 ads), the joined query runs in 98.71 seconds, and the subquery in 3.87. The performance depends heavily on the characteristics of the data, the way I’m aggregating, indexes, and so on.

If I were being really scientific, I’d run vmstat and iostat and count I/O and other statistics too, to see how the queries access the tables differently, but I leave that as an exercise for the reader.

In the query at work, I ended up using the join. It’s more efficient on that specific data. I was actually surprised when I wrote the test queries for this article and found the subquery performing so much better. The difference could be due to any number of things — data, indexes, the fact we haven’t optimized the tables at work because they’re too big to touch, different architecture (Xeon vs. my AMD64), server configuration, memory, disk speed… who knows.

Moral of the story: try different ways of doing the same thing!

Badly optimized subqueries

I have mentioned before, without going into specifics, that using joins instead of subqueries can be hugely more efficient. Even though two queries might mean the same thing, and even though you’re supposed to tell the server what to do and let it figure out how, sometimes you just have to tell it how. Otherwise, the query optimizer might choose a really stupid query plan. I ran into an example of this recently. I have a three-level hierarchy of tables: category¸ subcategory, and item. There are a few thousand rows in category, a few hundred thousand in subcategory, and millions in item. You can disregard the category table from now on; I mentioned it for context, but it’s not used in the queries. Here are the table create statements:

create table subcategory (
    id int not null primary key,
    category int not null,
    index(category)
) engine=InnoDB;

create table item(
    id int not null auto_increment primary key,
    subcategory int not null,
    index(subcategory)
) engine=InnoDB;

I’ll fill the tables with some sample data again:

insert into subcategory(id, category)
    select i, i/100 from number
    where i <= 300000;

insert into item(subcategory)
    select id
    from (
        select id, rand() * 20 as num_rows from subcategory
    ) as x
        cross join number
    where i <= num_rows;

create temporary table t as
    select subcategory from item
    group by subcategory
    having count(*) = 19
    limit 100;

insert into item (subcategory)
    select subcategory
    from t
        cross join number
    where i < 2000;

Again, these queries may take a while to complete, and are not suitable for a production machine. The idea is to insert pseudo-random numbers of rows into item, so each subcategory has between 1 and 2018 items. It’s not a completely realistic distribution, but it’ll do.

I want to find all subcategories containing more than 2000 items in a given category. First, I find a subcategory with more than 2000 items, then use its category in the rest of the queries that follow. Here are queries that will do this:

select c.id
from subcategory as c
    inner join item as i on i.subcategory = c.id
group by c.id
having count(*) > 2000;

-- choose one of the results, then
select * from subcategory where id = ????
-- result: category = 14

Now that I have found a suitable value (14), I’ll use it in subsequent queries. Here’s my query to find all subcategories with more than 2000 items in category 14:

select c.id
from subcategory as c
    inner join item as i on i.subcategory = c.id
where c.category = 14
group by c.id
having count(*) > 2000;

In my specific data, there are 10 rows in the results, and the query finishes in a couple of seconds. EXPLAIN shows good index usage; it’s not a bad query at all, considering the table size. The query plan is to just run through ranges on some indexes and count the entries. So far, so good.

Now suppose I wanted to grab all columns from subcategory. I could join the results of the above query as a subquery, or select MAX or some such (since I know the values will be unique over the grouping set), but I could also do the following, right?

select * from subcategory
where id in (
    select c.id
    from subcategory as c
        inner join item as i on i.subcategory = c.id
    where c.category = 14
    group by c.id
    having count(*) > 2000
);

This query will finish about the time the Sun turns into a brown dwarf and swallows the Earth. I don’t know how long it will take because I’ve never felt like letting it run indefinitely. You’d think, from looking at the query, that it would a) evaluate the inner query and find the 10 values, b) go find those 10 rows in subcategory, which should be really fast to look up by their primary keys. Nope. Here’s the query plan:

*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: subcategory
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300783
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: c
         type: ref
possible_keys: PRIMARY,category
          key: category
      key_len: 4
          ref: const
         rows: 100
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: i
         type: ref
possible_keys: subcategory
          key: subcategory
      key_len: 4
          ref: c.id
         rows: 28
        Extra: Using index

If you’re not familiar with analyzing MySQL query plans, here’s the synopsis: MySQL has decided to run the query from the outside in, not the inside out. I’ll look at each part of the query in turn.

The outer query simply becomes select * from subcategory. Even though there’s a limitation on subcategory in the inner query (WHERE category = 14), MySQL isn’t applying that filter to the outer query for some reason. I don’t know why. All I know is it’s doing a table scan (that’s what type: ALL means), and not using any indexes. That’s a table scan over several hundred thousand rows.

For each row in the outer query, it’s performing the inner query, even though there are no references in the inner query to values in the enclosing scope, because it has “optimized” the inner query by rewriting it to refer to the outer query. At this point, the query plan becomes nested loops. For each loop in the outer query, the query probes into the inner query. Here is the query plan, after the optimizer rewrites it:

select * from subcategory as s
where <in_optimizer>(
   s.id,<exists>(
   select c.id
   from subcategory as c
      join item as i
   where ((i.subcategory = c.id) and (c.category = 14))
   group by c.id
   having ((count(0) > 2000)
      and (<cache>(s.id) = <ref_null_helper>(c.id))))
)

You can get the optimized query from EXPLAIN EXTENDED followed by SHOW WARNINGS. Notice the reference to the outer scope in the HAVING clause.

I’m not bringing this up to bash MySQL’s optimization strategy. It’s pretty common knowledge that MySQL doesn’t yet optimize subqueries very well in some cases, and this particular problem is widely reported. I’m just pointing out that it is up to the programmer to check queries and make sure they aren’t badly optimized. In most cases, it’s safer just to stay away from subqueries if they’re not needed — especially WHERE... IN() or WHERE... NOT IN() queries.

My new rule for myself is “when in doubt, EXPLAIN the query.” If it’s a big table, I’m automatically doubtful.

How to force the inner query to execute first

The query in the preceding section suffers because MySQL executes it from the outside in as a correlated subquery, instead of from the inside out without correlation. It’s possible to get MySQL to execute the inner query first, materialized as a temporary table, and avoid the huge performance penalty.

MySQL materializes subqueries in the FROM clause (commonly, and somewhat misleadingly, known as derived tables). This means MySQL executes the inner query first and saves the results in a temporary table, then uses it in the rest of the table. This is exactly the behavior I wanted to happen when I wrote that query! Here’s the modified query:

select * from subcategory
where id in (
    select id from (
        select c.id
        from subcategory as c
            inner join item as i on i.subcategory = c.id
        where c.category = 14
        group by c.id
        having count(*) > 2000
    ) as x
);

All I did was wrap the subquery in another subquery. MySQL thinks there’s a dependent subquery, but now it’s only the wrapper query, which is probing into a temporary table with only a few rows, so it is fast anyway. At this point this is a silly optimization; it would probably be better to just rewrite the query as a join. Among other things, that would avoid the danger of someone noticing the wrapper subquery is superfluous and “cleaning up the code.”

This optimization can be used in a number of ways, for example to prevent MySQL from complaining about a subquery selecting data from a table being modified elsewhere in the query. Unfortunately, it doesn’t work to get around the restriction about temporary tables only appearing once in a query.

Written by Xaprb

April 30th, 2006 at 9:02 pm

Posted in SQL

16 Responses to 'How to optimize subqueries and joins in MySQL'

Subscribe to comments with RSS or TrackBack to 'How to optimize subqueries and joins in MySQL'.

  1. The bad optimization of sub-queries by MySQL is something that they don’t plan on fixing until MySQL 5.2, which is scheduled to be released as production code in 2007. Personally, I feel that this issue should be resolved much earlier than that. If you agree, add your comments to bug 9090.

    voiperster

    8 May 06 at 6:51 am

  2. I was having this problem too, but my problem went deeper, as I needed two tables joined, and I need the data out of them - I couldn’t see how to make your code work in that scenario.

    My SQL looked like this:

    select col1, col2, col3 from t1
     join (select col as col2  from t2 where name='data1') t2
     on t1.id=t2.id
     join (select col as col3 from t2 where name='data1') t3
     order by col2, col3 limit 5

    This recursively broke the mysql engine, as I have 100000 records in both tables (I’m trying to build a table from a name-value pairs table BTW). All the keys in the world didn’t optimize the subqueries.

    Finally after much trial and error I found creating a view was the answer. A view describing the two subqueries now has an execution plan with only one table scan, and additional columns can be added without much performance hit. Full SQL now looks like:

    CREATE VIEW v_col2 AS
      SELECT * FROM t1 where name='Data2';
    
    CREATE VIEW v_col3 AS
      SELECT * FROM t2 where name='Data1';
    
    SELECT col1,t2.data as  col2,t3.data as col3
     from t1 join v_col2 t2 on t1.id=t2.id
     join v_col3 t3 order by col2, col3 limit 5

    Interestingly temporary tables didn’t work quite as well (still wanted a full row scan - but I may have missed a key), but I didn’t like them anyway as they are not dynamically updated.

    Hope that helps someone!

    Anonymous

    18 Jun 06 at 6:02 am

  3. Thank you for the walk through. I am in the process of creating an image database much like your sub-categories and items description. So far I have avoided sub queries, but I think I will be using them soon. I did not even know that MySQL would do it any other way than outside in. I am going to print this out to use as a reference as I move forward. If you have found any other discussion on the optimization of sub-queries I would be very interested. Thanks!

    Peter

    29 Nov 06 at 1:07 pm

  4. Hey, I’m really enjoying your blog and have had success using this particular trick to optimize the subquery execution. Now I’m having a related issue, although not identical. Your feedback would be greatly appreciated.

    I’m in MySQL 4.1.12a

    There are three tables: interval_span, impression, and ad_unit. The idea is that the impression table has records that need to be counted based primarily on their network_id and ad_unit_type_id (which is retrieved via a join to ad_unit), and secondarily on the view_time being within the applicable date range.

    Some table structure details:

    mysql> desc interval_span;
    +------------------+----------+------+-----+---------------------+----------------+
    | Field            | Type     | Null | Key | Default             | Extra          |
    +------------------+----------+------+-----+---------------------+----------------+
    | id               | int(11)  |      | PRI | NULL                | auto_increment |
    | ad_unit_type_id  | int(11)  |      | PRI | 0                   |                |
    | network_id       | int(11)  |      | MUL | 0                   |                |
    | begin_date       | datetime |      | PRI | 0000-00-00 00:00:00 |                |
    | end_date         | datetime |      | PRI | 0000-00-00 00:00:00 |                |
    | roadblock_id     | int(11)  | YES  | MUL | NULL                |                |
    | expected_traffic | int(11)  | YES  |     | NULL                |                |
    | impression_count | int(11)  | YES  |     | NULL                |                |
    +------------------+----------+------+-----+---------------------+----------------+
    8 rows in set (0.00 sec)
    
    mysql> show index from interval_span;
    +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table         | Non_unique | Key_name            | Seq_in_index | Column_name     | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
    | interval_span |          0 | PRIMARY             |            1 | id              | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          0 | PRIMARY             |            2 | ad_unit_type_id | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          0 | PRIMARY             |            3 | begin_date      | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          0 | PRIMARY             |            4 | end_date        | A         |      106396 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          1 | interval_span_FI_1  |            1 | ad_unit_type_id | A         |          13 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          1 | interval_span_FI_2  |            1 | network_id      | A         |           5 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          1 | interval_span_FI_3  |            1 | roadblock_id    | A         |      106396 |     NULL | NULL   | YES  | BTREE      |         |
    | interval_span |          1 | interval_span_idx_1 |            1 | begin_date      | A         |        1899 |     NULL | NULL   |      | BTREE      |         |
    | interval_span |          1 | interval_span_idx_2 |            1 | end_date        | A         |        1899 |     NULL | NULL   |      | BTREE      |         |
    +---------------+------------+---------------------+--------------+-----------------+-----------+-------------+----------+--------+------+------------+---------+
    9 rows in set (0.00 sec)
    
    mysql> select count(*) from interval_span;
    +----------+
    | count(*) |
    +----------+
    |   106396 |
    +----------+
    
    mysql> desc impression;
    +-------------+--------------+------+-----+---------+----------------+
    | Field       | Type         | Null | Key | Default | Extra          |
    +-------------+--------------+------+-----+---------+----------------+
    | id          | int(11)      |      | PRI | NULL    | auto_increment |
    | view_time   | datetime     | YES  | MUL | NULL    |                |
    | ad_unit_id  | int(11)      |      | PRI | 0       |                |
    | network_id  | int(11)      | YES  | MUL | NULL    |                |
    | remote_addr | varchar(255) | YES  |     | NULL    |                |
    +-------------+--------------+------+-----+---------+----------------+
    5 rows in set (0.00 sec)
    
    mysql> show index from impression;
    +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | Table      | Non_unique | Key_name         | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
    +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    | impression |          0 | PRIMARY          |            1 | id          | A         |     2842354 |     NULL | NULL   |      | BTREE      |         |
    | impression |          0 | PRIMARY          |            2 | ad_unit_id  | A         |     2842354 |     NULL | NULL   |      | BTREE      |         |
    | impression |          1 | impression_FI_1  |            1 | ad_unit_id  | A         |          50 |     NULL | NULL   |      | BTREE      |         |
    | impression |          1 | impression_FI_2  |            1 | network_id  | A         |           1 |     NULL | NULL   | YES  | BTREE      |         |
    | impression |          1 | impression_idx_1 |            1 | view_time   | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
    | impression |          1 | impression_idx_2 |            1 | view_time   | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
    | impression |          1 | impression_idx_2 |            2 | network_id  | A         |       94745 |     NULL | NULL   | YES  | BTREE      |         |
    +------------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
    7 rows in set (0.00 sec)
    
    mysql> select count(*) from impression;
    +----------+
    | count(*) |
    +----------+
    |  2842354 |
    +----------+

    Now, to the meat of it… Here’s the update query we’re attempting:

    UPDATE interval_span span
       SET span.impression_count = (
    			SELECT count(imp.id)
    			  FROM impression imp
    						 INNER JOIN ad_unit au ON imp.ad_unit_id = au.id
    			 WHERE imp.view_time >= span.begin_date
    			   AND imp.view_time < span.end_date
    			   AND imp.network_id = span.network_id
    			   AND au.ad_unit_type_id = span.ad_unit_type_id
    )
    WHERE span.begin_date <= '2006-12-03 04:40:00'
      AND span.begin_date >= '2006-12-03 04:00:00';

    This is the smallest date range I can set up (interval_span begin/end date ranges are 30 minutes long). This update query returns: Query OK, 3 rows affected (2 min 13.50 sec)
    Rows matched: 70 Changed: 3 Warnings: 0

    I need to get this time WAY down. Here are some notes I took while testing:

    1. Running inner select count query with constant values takes 2.2 sec
    2. Running whole update without inner join on ad_unit takes much longer
    3. Creating the impression_idx_2 index on impression(view_time, network_id) reduced execution by 10%

    Ok, here’s some more notes, that hopefully will lend more insight. In order to try to use mysql’s explain tool, I re-wrote the query as a select, it runs in almost identical time, so I think I’ve got it right, but haven’t figured out how to optimize it further:

    SELECT span.id, count(imp.id)
      FROM interval_span span
    	     INNER JOIN impression imp ON imp.network_id = span.network_id
    			 INNER JOIN ad_unit au ON imp.ad_unit_id = au.id
     WHERE span.begin_date <= '2006-12-03 04:40:00'
       AND span.begin_date >= '2006-12-03 04:00:00'
       AND imp.view_time >= span.begin_date
       AND imp.view_time < span.end_date
       AND au.ad_unit_type_id = span.ad_unit_type_id
    GROUP BY span.id;
    
    +--------+---------------+
    | id     | count(imp.id) |
    +--------+---------------+
    | 219136 |         19013 |
    | 219146 |         58026 |
    | 219156 |         35805 |
    +--------+---------------+
    3 rows in set (2 min 21.20 sec)

    Any insight you could lend would be GREATLY appreciated! I’ve been working on this for quite some time. Please feel free to publish this note and your response, I’ve found your responses to others very helpful, I think this could help others as well.

    Thanks,

    Lon

    Lon B.

    6 Dec 06 at 6:17 pm

  5. Hi. Im trying to run your example,
    but when I populate the tables with the inserts, i get this error:

    ERROR 1146 (42S02): Table ‘vy_test.number’ doesn’t exist

    I tried in Mysql 4.1.20 and 5.0.27-standard.

    any help appreciated!

    Tim

    27 Feb 07 at 5:42 am

  6. I should have referred you to the numbers table in the article’s text, but apparently I forgot to link to it.

    Xaprb

    27 Feb 07 at 8:32 am

  7. Awesome! this is the best mysql blog ive seen. thanks for taking the time to write.

    Tim

    27 Feb 07 at 3:33 pm

  8. [...] Google tells me that this issue has been reported since 2004 in early version of MySQL like 4.x. The problem was not focused in any of the updates afterward. After that, I read the article on the topic How to optimize subqueries and joins in MySQL [...]

  9. Hello, just wanted to say thanks for this article (btw, currently #10 googling “dependent subquery” ;-)!
    The SQL programming is a new field to me (being a programmer for two decades in other fields), and I just happened to get into the problem explained here yesterday.
    Even if you say the problem ‘MySQL doesn’t yet optimize subqueries very well’ is common knowledge, nobody cares to state the problem along with good solutions, except your article which is also well readable to someone not being SQL guru.
    Here is my query with the problem (find the most recent items’ details for each ‘cid’ value in a table):

    SELECT cid,txt FROM gpoints WHERE id IN
    (SELECT MAX(id) FROM gpoints GROUP BY cid)

    With 13800 entries in the table, and 13 results for the inner subquery, this query ran for over 15 minutes.
    EXPLAIN showed me the cause, but I found no other way around, so I separated this into two queries (inner query first, then execute the outer query with the IN() list populated with values).
    Now your solution makes my query run in 0.05sec:

    SELECT cid,txt FROM gpoints WHERE id IN
    ( SELECT mi FROM
    (SELECT MAX(id) AS mi FROM gpoints GROUP BY cid)
    AS x)

    (As an outsider, I just cannot understand why MySQL doesn’t have a way to specify evaluation order for subqueries like this when the optimizer cannot recognize the 10000-fold time difference…)

    BB

    10 Jul 07 at 3:27 am

  10. Good Day,

    I have one question i’m eager to know.
    How can we set a field of a table to be a clustered index using MYSQL?

    Thx

    stephen

    31 Jul 07 at 1:11 am

  11. The only storage engine that offers clustered indexes at this time is InnoDB. The primary key is automatically clustered. You cannot change which index is clustered; you cannot make a primary key without making it clustered; and if you don’t create a primary key on an InnoDB table, it creates an invisible 6-byte primary key and clusters on it.

    Xaprb

    31 Jul 07 at 7:26 am

  12. Just found this article and your explanation has turned my 45 second query into a 0.1 second query. :)

    DaveO

    20 Sep 07 at 10:44 am

  13. Was given a script today that took approximately 40 minutes to run. It now runs in under a minute. Thanks to this form of query optimization. Many more issues to hammer out but this was a huge one. Thanks a ton!

    Nolan Hurlburt

    28 Sep 07 at 1:36 pm

  14. [...] en glimrende artikel om et lille MySQL [...]

  15. This is ENORMOUSLY helpful. I was wondering why my subqueries seemed to be re-executing the subquery for each row of the outer query, even though there were no dependencies. Thanks!

    Chris Rogers

    17 Nov 08 at 11:43 pm

Leave a Reply