Xaprb

Stay curious!

Finding queries with duplicate columns

with 3 comments

A while ago I wrote about a tool to help make upgrades safer. Since then, we have gotten several people to help sponsor development on this tool, and a few of our customers are using it to help find problems before they upgrade their systems.

I can’t think of a single one of the Maatkit tools that didn’t grow out of the need for deeper insight into some part of the system. This tool is no exception. And as always, these tools are like flashlights. When you crouch down near the floor, and shine your flashlight under the refrigerator, you should expect to find a few things that make you cringe.

The other day, one of our customers was using this tool and we started getting an error. The error was caused by the part of the tool that verifies that result sets are the same. Our thought on how to do this was to checksum the results of a query. You can read the specification of exactly how we plan to do this if you want, but I’ll just give you the short version here: use a subquery. But some queries cannot be put into a subquery without causing errors.

Here’s an example:

mysql> select a, a from t;
+------+------+
| a    | a    |
+------+------+
|    1 |    1 | 
+------+------+

If you wrap this query into a subquery, you will get an error:

mysql> select * from (select a, a from t) as a;
ERROR 1060 (42S21): Duplicate column name 'a'

Of course the problem is that the innermost query is actually invalid in a relational sense. Once again, it comes back to the fact that SQL doesn’t keep you from doing things that make no sense.

The exact query that we were seeing on our client was a little bit more subtle, but it’s still a fairly common pattern that I have seen in the real world:

select a.*, [other stuff], a.column1 from ...

This type of query should be catchable by mk-query-audit, when we write it. Oooh — another flashlight to find hairy things with wobbly eyes under your fridge!

Written by Xaprb

August 7th, 2009 at 5:39 am

Posted in Maatkit,SQL

3 Responses to 'Finding queries with duplicate columns'

Subscribe to comments with RSS

  1. Baron,

    The problem you describe can also rise from normal SQL queries, for example (using ‘world’ database):

    mysql> select * from (select * from Country join City on (Country.code = City.CountryCode)) as sel1;
    ERROR 1060 (42S21): Duplicate column name ‘Name’

    It just happens that both tables have a ‘Name’ column. This is still fine, since with SQL I can always say ‘SELECT City.Name’ explicitly.

    So I’m not sure this necessarily falls under “things that make no sense”. Rather, perhaps the “SELECT * FROM (…inner query…) is the problematic part, whereas it cannot be guaranteed to succeed.

    It is questionable if every single column should be aliased; but even then collisions may occur.

    Regards,
    Shlomi

    Shlomi Noach

    8 Aug 09 at 9:02 am

  2. I said it makes no sense because you can’t tell one from the other. What happens if you fetch a row from the result into a Perl hash or a Python dictionary…? What if you wanted to get at the data in one of the Name columns but not the other (assuming their contents differ)?

    The relational model is saner here IMO: everything is sets, and sets don’t allow duplicates. “If something is true, saying it twice doesn’t make it more true.”

    Xaprb

    8 Aug 09 at 1:44 pm

  3. But in many cases I don’t fetch results automatically into some hash or dictionary, but rather do rs.getString(“Country.Name”); or rs.getString(“City.Name”); (e.g. in Java).
    ORM Frameworks also solve this problem.

    Therefore, while I’m using fully qualified column names, there are no duplicates. The relational model stays sane.

    What I’m saying is that it’s a very common case. How many tables just have “id” for AUTO_INCREMENT? It’s so common and I’m not sure it doesn’t make sense.

    Shlomi Noach

    8 Aug 09 at 2:08 pm

Leave a Reply