Xaprb

Stay curious!

How to simulate FULL OUTER JOIN in MySQL

with 26 comments

In this article I’ll show several ways to emulate a FULL OUTER join on a RDBMS that doesn’t support it, as is the case with even the most recent versions of MySQL. This useful query is surprisingly tricky to get right.

Introduction

A standard SQL FULL OUTER join is like a LEFT or RIGHT join, except that it includes all rows from both tables, matching them where possible and filling in with NULLs where there is no match. I’ll illustrate that for clarity. Here are two of my favorite tables, apples and oranges:

I’ll join them on price. Here is the left join:

select * from apples as a
    left outer join oranges as o on a.price = o.price

And the right join:

select * from apples as a
    right outer join oranges as o on a.price = o.price

The FULL OUTER JOIN of these two tables, on price, should give the following result:

That’s the result I’ll be working toward in this article. Here is a script to create and populate the example tables, so you can follow along:

create table apples (variety char(10) not null primary key, price int not null);
create table oranges (variety char(10) not null primary key, price int not null);
insert into apples(variety, price) values('Fuji',5),('Gala',6);
insert into oranges(variety, price) values('Valencia',4),('Navel',5);

Method 1: two JOINs and a UNION

One method to simulate a full join is to take the union of two outer joins, for example,

select * from apples as a
    left outer join oranges as o on a.price = o.price
union
select * from apples as a
    right outer join oranges as o on a.price = o.price

This gives the desired results in this case, but it isn’t correct for all cases. Suppose there are duplicate records in the tables (remove the primary key and insert twice to create this situation). UNION eliminates duplicates, which a full join doesn’t do. UNION ALL isn’t the right answer either, because it will cause spurious duplicates. In fact, UNION generates two independent result sets and then combines them, so there is no way to get around this, because the two result sets need to “know about each other” to produce the right results.

There are legitimate cases where duplicate results are expected and correct. For instance, even when the rows are unique, selecting only certain columns, in which there are duplicates, could cause this situation. This doesn’t apply in relational theory, because a set never has duplicates no matter what, but it does in SQL.

Method 2: UNION ALL and an exclusion join

One way to make UNION include only the duplicates I want is to use an exclusion join to eliminate anything from the second result that is already included in the first, like this:

select * from apples as a
   left outer join oranges as o on a.price = o.price
union all
select * from apples as a
   right outer join oranges as o on a.price = o.price
where a.price is null;

This handles duplicate rows correctly and doesn’t include anything it shouldn’t. It’s necessary to use UNION ALL instead of plain UNION, which would eliminate the duplicates I want to keep. This may be significantly more efficient on large result sets, since there’s no need to sort and remove duplicates.

Method 3: use a mutex table

There’s a case where UNION won’t work: older versions of MySQL don’t support it. All is not lost, though.

I’ve written several articles explaining how to start with a set of mutually exclusive numbers (which I informally call a “mutex table“), then use the mutual exclusivity of the numbers to join things together in interesting ways. This lets me simulate subqueries and unions on earlier versions of MySQL, for example. This approach seems like it might work well here, too. For the following queries I’ll assume my mutex table has the values 0 and 1. Here’s a baseline mutex query for these two tables:

select * from mutex
    left outer join apples as a on i = 0
    left outer join oranges as o on i = 1;

Of course this isn’t a full join. If I want to full join on price, naturally, I need to include price in the join criteria somewhere, and the query above doesn’t even mention the price. But it’s a starting point for tinkering.

The mutex values in the leftmost column, combined with the join criteria, ensure that every row in the two tables gets included on its own row. The mutual exclusivity causes the Navel row not to be matched to the Fuji row, even though they have the same price. The correct behavior of a full join on price is to “fill in” the NULL values where the prices are equal. This modification to the join criteria will fill it in:

select * from mutex
    left outer join apples as a on i = 0
    left outer join oranges as o on i = 1 or a.price = o.price;

The or a.price = o.price relaxes the mutual exclusivity, telling the join to keep the rows separated unless they have the same price:

That’s getting closer. There is a spurious row, though. The Navel row at the bottom of the result set shouldn’t be there; it has already been matched to the Fuji row earlier, so there’s no need to include it with all those NULLs as though there were no matching row in apples. Can I eliminate the Navel row without eliminating the Valencia row?

That turns out to be harder to do. I stared at it for a while, thinking I could include a WHERE clause that would eliminate spurious rows based on the value of i, but after a bit I got a reality check: the row has already been included above, and WHERE clauses work a row at a time, so there’s no way to assert something about one row while applying the WHERE clause to another row. This simple fact is all I needed to realize there’s no way to eliminate the Navel row with the given information.

What I can do, though, is stack another copy of the apples table onto the right-hand side of the results thus far, matching them to the oranges values and confining them to rows with mutex value 1 instead of 0. Now I can write a WHERE clause to see if a row in the i = 1 part of the result set matches a row in the i = 0 part. I’ll write it without the WHERE clause to start:

select * from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price;

Now there is a way to tell between the Navel row, which I don’t want, and the Valencia, which I do: the Navel has non-NULL values in the rightmost copy of apples, but the Valencia doesn’t. All I have to do is eliminate rows that have matching values:

select * from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price
where o.price is null or a2.price is null

If the WHERE clause is hard to understand, perhaps it’s easier to think of it this way: where not(o.price is not null and a2.price is not null). Both clauses are identical; all I did was apply some boolean identities. Here is the result:

That result has the correct rows, but it has some extra columns, which I don’t need. Here’s the final query:

select a.*, o.* from mutex
   left outer join apples as a on i = 0
   left outer join oranges as o on i = 1 or a.price = o.price
   left outer join apples as a2 on i = 1 and a2.price = o.price
where o.price is null or a2.price is null

Remember, this can’t be done without the mutex table, because I need something to provide non-NULL values for every row; otherwise the joins would not include values where the leftmost table has no rows to contribute.

This technique works if there are duplicate rows, and works on older versions of MySQL, but is probably the least efficient of the three I’ve demonstrated here. As usual, which query is appropriate depends on circumstances.

Written by Xaprb

May 26th, 2006 at 9:45 pm

Posted in SQL

26 Responses to 'How to simulate FULL OUTER JOIN in MySQL'

Subscribe to comments with RSS or TrackBack to 'How to simulate FULL OUTER JOIN in MySQL'.

  1. I tried a UNION and a UNION all as you have described here, but those just gave more records with no way, that I could see, to disambiguate and get down to the “net change” in history records.

    My problem is as follows …

    I have been wrestling with this query for a couple of days now. Any help is much appreciated. I have a history table that logs all fields created for a record and every time the field is updated. This lets me track any change to the record and who touched it. The problem is I’d like to take the raw change data and distill it down to records like: FIELD changed_to_info changed_from_info BUT I keep getting duplicate records in my query. THANK-YOU! THANK-YOU!

    Here is a small test that I have distilled the problem into as I have tried to get it working …

    # test table representing logging to a "change history" table
    # NOTE: the FK to the record being updated not shown for simplicity
    create table test (
       pk 		 int(3) primary key not null auto_increment,
       `by`   varchar(10) not null,
       `when` datetime not null,
       `field` varchar(10) not null,
       revision int not null default 0,
       type varchar(5) not null,
       val varchar(20) not null
       );

    [ed: snipped some stuff that got mangled]

    Joe Simone

    16 Aug 06 at 2:07 pm

  2. Hi Joe,

    A lot of your comment got mangled, so I’ll do my best. Here’s the table as I got it from the INSERTs in your comment:

    +----+-------+------------+-------+----------+------+----------+
    | pk | by    | when       | field | revision | type | val      |
    +----+-------+------------+-------+----------+------+----------+
    |  1 | bob   | 2006-08-01 | food  |        1 | add  | pizza    |
    |  2 | bob   | 2006-08-01 | sport |        1 | add  | baseball |
    |  3 | bob   | 2006-08-01 | drink |        1 | add  | beer     |
    |  4 | bob   | 2006-08-01 | lang  |        1 | add  | java     |
    |  5 | bob   | 2006-08-01 | state |        1 | add  | fla      |
    |  6 | admin | 2006-08-05 | drink |        2 | upd  | cola     |
    |  7 | admin | 2006-08-05 | lang  |        2 | upd  | c++      |
    |  8 | admin | 2006-08-05 | food  |        2 | upd  | tacos    |
    |  9 | bob   | 2006-08-15 | drink |        3 | upd  | wine     |
    | 10 | bob   | 2006-08-15 | lang  |        3 | upd  | ruby     |
    | 11 | bob   | 2006-08-15 | state |        3 | upd  | ny       |
    | 12 | admin | 2006-08-16 | drink |        4 | upd  | beer     |
    +----+-------+------------+-------+----------+------+----------+

    You want to see (for example) that bob changed drink between revision 3 and 2, but not include a line showing changes between revision 3 and 1, right? Does this do what you need?

    select a.field, a.by, a.revision, a.val, b.by, b.revision, b.val
    from test as a
       left outer join test as b on a.field = b.field
          and b.revision < a.revision
       left outer join test as c on a.field = c.field
          and c.revision < a.revision and c.revision > b.revision
    where b.field is not null and c.field is null;
    
    +-------+-------+----------+-------+-------+----------+-------+
    | field | by    | revision | val   | by    | revision | val   |
    +-------+-------+----------+-------+-------+----------+-------+
    | drink | admin |        2 | cola  | bob   |        1 | beer  |
    | lang  | admin |        2 | c++   | bob   |        1 | java  |
    | food  | admin |        2 | tacos | bob   |        1 | pizza |
    | drink | bob   |        3 | wine  | admin |        2 | cola  |
    | lang  | bob   |        3 | ruby  | admin |        2 | c++   |
    | state | bob   |        3 | ny    | bob   |        1 | fla   |
    | drink | admin |        4 | beer  | bob   |        3 | wine  |
    +-------+-------+----------+-------+-------+----------+-------+

    This is essentially a 3-way cross join, so it’s an n^3 algorithm. You would be well advised to add an index on (field, revision) or it may take all day on a large data set :-) Also, it looks like you don’t need DATETIME, just DATE.

    Xaprb

    16 Aug 06 at 4:11 pm

  3. The small defination is

    LEFT OUTER JOIN – This returns all the matching rows and the unmatched rows of the left table of the SQL code.

    RIGHT OUTER JOIN – This returns all the matching rows and the unmatched rows of the right table of the SQL.

    FULL OUTER JOIN – This returns all the matching and unmatched rows from both the tables.

    Raj Shekhar

    23 Oct 06 at 5:26 am

  4. For anyone curious, method #1 CAN be extended to handle more than 2 tables… It just gets ugly. Basically, for each table LEFT JOIN all other tables to it, Then union each of those together.

    CREATE TABLE `table1` (
      `f1` int(11) default NULL,
      `gid` int(11) default NULL
    ) ENGINE=MyISAM;
    
    CREATE TABLE `table2`...
    
    CREATE TABLE `table3`...
    
    INSERT INTO table1 VALUES(1,1), (4,1), (7,1), (10,3), (13,4), (16,4), (19,4), (22,6);
    INSERT INTO table2 VALUES(2,1), (5,2), (8,2), (11,3), (14,4), (17,5), (20,5), (23,6);
    INSERT INTO table3 VALUES(3,2), (6,2), (9,3), (12,3), (15,5), (18,5), (21,6), (24,6);

    Here I’m trying to get the SUM of each group of numbers within a single table and display them together in the same result so I need a subquery here… i.e. table1, group 1 is 1+4+7 = 12; table3, group 2 is 3+6 = 9 ….

    (SELECT t1.gid, IFNULL(t1.sum1, 0) AS sum1, IFNULL(t2.sum2, 0) AS sum2, IFNULL(t3.sum3, 0) AS sum3
      FROM (SELECT sum(f1) AS sum1, gid FROM table1 GROUP BY gid) AS t1
      LEFT JOIN (SELECT sum(f1) AS sum2, gid FROM table2 GROUP BY gid) AS t2 ON (t1.gid = t2.gid)
      LEFT JOIN (SELECT sum(f1) AS sum3, gid FROM table3 GROUP BY gid) AS t3 ON (t1.gid = t3.gid)
    ) UNION (
    SELECT t2.gid, IFNULL(t1.sum1, 0) AS sum1, IFNULL(t2.sum2, 0) AS sum2, IFNULL(t3.sum3, 0) AS sum3
      FROM (SELECT sum(f1) AS sum2, gid FROM table2 GROUP BY gid) AS t2
      LEFT JOIN (SELECT sum(f1) AS sum1, gid FROM table1 GROUP BY gid) AS t1 ON (t2.gid = t1.gid)
      LEFT JOIN (SELECT sum(f1) AS sum3, gid FROM table3 GROUP BY gid) AS t3 ON (t2.gid = t3.gid)
    ) UNION (
    SELECT t3.gid, IFNULL(t1.sum1, 0) AS sum1, IFNULL(t2.sum2, 0) AS sum2, IFNULL(t3.sum3, 0) AS sum3
      FROM (SELECT sum(f1) AS sum3, gid FROM table3 GROUP BY gid) AS t3
      LEFT JOIN (SELECT sum(f1) AS sum1, gid FROM table1 GROUP BY gid) AS t1 ON (t3.gid = t1.gid)
      LEFT JOIN (SELECT sum(f1) AS sum2, gid FROM table2 GROUP BY gid) AS t2 ON (t3.gid = t2.gid)
    ) ORDER BY gid;
    +------+------+------+------+
    | gid  | sum1 | sum2 | sum3 |
    +------+------+------+------+
    |    1 |   12 |    2 |    0 |
    |    2 |    0 |   13 |    9 |
    |    3 |   10 |   11 |   21 |
    |    4 |   48 |   14 |    0 |
    |    5 |    0 |   37 |   33 |
    |    6 |   22 |   23 |   45 |
    +------+------+------+------+

    Matthew Montgomery

    25 Jan 07 at 1:05 pm

  5. Hi,

    Using your method, how can I use an ORDER BY sentence to order the rows by the common attribute (in your case, price)?

    kelmer

    9 May 07 at 4:23 pm

  6. You can just add ORDER BY as normal at the end of the query.

    Xaprb

    9 May 07 at 10:25 pm

  7. I tried this:

    (select * from apples as a
       left outer join oranges as o on a.price= o.price)
    union all
    (select * from applesas a
       right outer join oranges as o on a.price= o.price
    where a.price is null)
    order by a.price

    and I get an error saying: “#1250 – Table ‘a’ from one of the SELECTs cannot be used in global ORDER clause “

    If I use this instead

    select * from apples as a
       left outer join oranges as o on a.price = o.price
    union all
    select * from apples as a
       right outer join oranges as o on a.price = o.price
    where a.price is null
    order by a.price

    I get this:

    “#1109 – Unknown table ‘a’ in order clause “

    Where and how should I puit the ORDER BY sentence?

    kelmer

    10 May 07 at 2:58 am

  8. The MySQL manual explains what you need to do in the section on UNION syntax.

    Xaprb

    10 May 07 at 8:03 am

  9. Awesome article, helped me big time.

    -Dan

    Dan

    11 Jan 08 at 8:01 pm

  10. Its Wonderful! Explanation is best ……… even newbies like me can understand this better.
    Thanks a lot!

    Garry

    22 Feb 08 at 10:29 am

  11. Thanks for a useful post. Am I right that a full outer join implementation is not possible using MySQL’s NATURAL joins? I’m struggling to get a result where the column joined on does not appear twice. NATURAL joins achieve this but return the columns in different orders… any ideas?

    Mark

    12 Mar 08 at 7:41 pm

  12. It turns out that FULL OUTER JOIN isn’t possible no matter what. The reasons are kind of technical.

    You should be able to get the columns in the order you want if you specify them explicitly.

    Xaprb

    12 Mar 08 at 7:53 pm

  13. Great article… However i failed to join my table to give me what i want.
    Table: person
    CREATE TABLE `person` (
    `region` varchar(20) collate latin1_general_ci default NULL,
    `name` varchar(20) collate latin1_general_ci default NULL,
    `gender` char(2) collate latin1_general_ci default NULL,
    );

    insert into person (‘UK’,'Steve’,'m’);
    insert into person (‘UK’,'Marie’,'f’);
    insert into person (‘UK’,'Rob’,'m’);
    insert into person (‘US’,'Bob’,'m’);
    insert into person (‘US’,'Michele’,'f’);
    insert into person (‘NL’,'Jim’,'m’);

    Now i want to issue a query which will display in one row the number of male and female per region
    That should be
    Region|M_Count|F_Count
    UK|2|1
    US|1|1
    NL|1|0

    Kili

    15 Sep 08 at 8:05 am

  14. Hi Kili,

    As far as I know we can not take two counts from single query, so we need to use sub-queries so you can try this following query:

    select a.region, (a.Malecount b.Malecount) Males, (a.Femalecount b.Femalecount) Females
    from
    ( select region, count(*) MaleCount, 0 FemaleCount from person where gender=’M'group by region)a,
    ( select region, 0 MaleCount, count(*) FemaleCount from person where gender=’F'group by region)b

    I think this should work.
    where a.region = b.region

    Garry

    15 Sep 08 at 8:39 am

  15. Sorry! please use last line of the last comment as part of query.
    Sorry for typo err.

    Garry

    15 Sep 08 at 8:41 am

  16. Thanks Garry,

    It refuse to accept the (a.Malecount b.Malecount) Males synthax.

    Any other idea

    Kili

    Kili

    15 Sep 08 at 3:59 pm

  17. I tried
    select a.region, (a.Malecount b.Malecount) Males, (a.Femalecount b.Femalecount) Females
    from
    ( select region, count(*) MaleCount, 0 FemaleCount from person where gender=’M’group by region)a,
    ( select region, 0 MaleCount, count(*) FemaleCount from person where gender=’F’group by region)b where a.region = b.region

    which return correct result but for only one region.

    Kili

    Kili

    15 Sep 08 at 4:13 pm

  18. Hey Kili,

    I’m pretty late to the game here, but you can do the following to find the number of people of each gender for each region:

    select
    region,
    sum(if(gender = ‘m’, 1, 0)) as males,
    sum(if(gender = ‘f’, 1, 0)) as females
    from person
    group by region;

    See

    http://dev.mysql.com/tech-resources/articles/wizard/print_version.html

    for more info.

    Mike

    Mike Gibson

    17 Nov 08 at 8:07 pm

  19. 好帖,感谢博主分享!

    内事绝不问白肚,
    外事不决问狗狗,
    房事不决问网易。
    注意第一句是“绝不”。
    —————————–
    内事绝不问百度,
    外事不决问笔搜,
    房事不决问网易。
    注意第二句是“笔搜”。

    笔锋侠

    28 Nov 08 at 3:57 am

  20. Thanks Mike for your great tip.. That exactly what i was looking for.

    笔锋侠: Thanks for your post as well though am not sure what who has the decryption key.

    Kili

    Kili

    2 Dec 08 at 10:30 am

  21. i believe there is one more way to do it, something like this:

    SELECT *
    FROM apples AS a
    LEFT OUTER JOIN
    (
    SELECT o2.*
    FROM apples AS a
    RIGHT OUTER JOIN oranges AS o2
    ON a.price = o2.price
    ) AS o ON a.price = o.price

    mauricio

    19 Jan 09 at 1:46 pm

  22. [...] correspondencia con la otra tabla. Sin embargo no está implementada en MySQL, aunque podemos simularla con UNION ALL (UNION se utiliza para combinar los resultados de varios SELECT; la palabra clave ALL tiene el [...]

    MySQL

    20 Feb 09 at 11:59 am

  23. I got another solution

    (in spanish)
    http://disemq21.blogspot.com/2009/08/another-mysql-full-join.html

    regards,
    edward

    edward

    10 Aug 09 at 3:36 pm

  24. [...] MySQL does not support Full Outer Joins natively (there are some work-arounds).  However, it is roughly like the union of Left Outer Joins and Right Outer Joins according to what I have read.  See http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/ [...]

  25. [...] Dann gibt’s hier einige Lösungen. http://www.xaprb.com/blog/2006/05/26/how-to-write-full-outer-join-in-mysql/ [...]

Leave a Reply