Xaprb

Stay curious!

My personal SQL coding standards

with 14 comments

Coding standards are a Good Thing. Unfortunately, most organizations don’t have much in the way of SQL coding standards, and there aren’t many automated tools to help. As a result, I believe SQL is often less consistently written than many other languages. Here are my suggestions.

The following are only my opinions.

Indentation

I use indentation as with any other programming language. I prefer to indent the beginning of each clause in a statement equally, because I consider them all equally important; in other words, I don’t consider the ORDER BY clause somehow “inferior to” the WHERE clause. The result looks like this:

select column ...
from ...
where ...
group by ...
order by ...

Some programmers prefer to indent each part of the statement in turn, which leads to code that looks like this:

select column ...
    from ...
        where ...
            group by ...
                order by ...

I find this visually confusing, and it is unmaneagable in larger queries. Another indentation style I really dislike is right-left-justification, thusly:

  select column ...
    from table ...
   where criterion ...
group by groupingclause ...
  having havingclause ...
order by orderingclause ...

I think it’s very hard to read, but more importantly, it’s beastly to maintain. I have better things to do than spend a bunch of time placing my cursor at the beginning of the line, adding spaces to right-justify something, then deleting internal whitespace in the line to left-justify whatever is on the right-hand side of the justification boundary. And as soon as I delete or add another part to the query, I probably have to adjust everything all over again. The following style is almost as hard to maintain:

select      column,
            column2,
            column3
from        table ...
where       criterion ...

It looks great until you need to join tables or have subqueries. Then it’s a disaster. Sometimes it’s not only hard to maintain, it’s hard to read, too. Take a look at this blog entry about modelling hierarchies in SQL, and notice how hard it is to figure out which tables are joined to which others, where the join criteria are, and so forth.

When I need to break lines, I indent one level, like so:

select
    column1,
    column2,
    column3,
    column4,
from ...
where ...
group by ...
order by ...

When I write a clause that is part of another clause, I indent it as well. For example, JOINs are part of the FROM clause, so they should be indented after the first line:

select column ...
from ...
    inner join ...
    inner join ...
    left outer join ...
where ...
group by ...
order by ...

When I write subqueries, I treat the opening and closing parentheses just as opening and closing braces in C, which I do not place on a line by themselves:

select column ...
from ...
    inner join (
        select column ...
        from table ...
    ) as x ...
    inner join ...
    left outer join ...
where ...
group by ...
order by ...

When I write INSERT ... SELECT statements, I indent everything but the first clause:

insert into ...
    select ...

Keep lines short

I try to keep lines short enough that I don’t have to scroll horizontally in a non-wrapping editor. This typically means no more than 80 characters wide. When I have to break a line, I break before operators (logical operators, such as AND, or mathematical operators like +). I indent each broken line one level from the first line of the broken clause, as I explained above.

Commas are not operators, and in my opinion belong at the end of the line. I think the following is very hard to read and maintain:

select
     col1
    ,col2
    ,col3 ...

The reasons why operators, but not commas, should go at the beginning of the line are well-discussed in the literature of other programming languages. The goals are immediate comprehension when reading code, and ease of maintenance.

Uppercase or lowercase?

SQL is a case-insensitive language. There are two parts of the language to worry about: keywords and object names. I lowercase all keywords in queries, but I usually uppercase them in my writing, such as when I mention a SELECT statement in a sentence. Why? Simple: uppercasing them in normal writing makes them stand out, and since they’re not surrounded by uppercased text, the shouting is short-lived. I lowercase keywords in queries, because

  • it is much easier to read mixed-case letters with ascenders and descenders (this has been proven time and time again in the laboratory)
  • it’s a lot easier to type and time is precious, as are my tendons
  • any decent SQL editor highlights keywords anyway, making them stand out visually and obviating the need to uppercase them

As for object names, I type them in the correct case. If the object is named SomeTable, I think it’s sloppy to type it as sometable or SOMETABLE. Uniformity is important for readability and maintainability, and typing an object name in its proper case is following the obvious leader. Some platforms, such as MySQL, are case-sensitive in this area anyway. This lack of uniformity is why I detest case-insensitive languages, and try to program as though they’re case-sensitive anyway.

Object naming standards

Tables, columns, views, stored procedures, indexes, and so forth can really benefit from naming standards. Here are my suggestions:

  • Tables should have singular names. In my experience, it makes names of related objects much easier to manage. You might not think so at first, but I’ve seen more than one organization change their mind after it gets really messy and switch from plural to singular. This point, however, is probably the biggest single invitation to a flame war in coding standards.
  • Choose a naming convention for indexes and foreign keys, such as FK_child_parent and IX_table_col.
  • Don’t run words together in names. Either use MixedCapitalization to separate them, or use underscores_to_separate_words. All-lowercase words mashed together are hard to both read and type, and make it too easy to make typos. Here are some examples of names (in this case, column names) that are hard to read: emailwhenrowsexceed, clientsteamleader, apiratesheet.
  • Name things what they are, not what they do. I once redesigned a schema with a table naming standard that encouraged naming tables in a “ThisToThat” fashion. For example, one of the table names was something like “ProdGroupToSuperProdGroupAssociationToProdMenuCategoryAssociation.” Not only was that a super-long, confusing, meaningless name, it turned out the table had only one column not involved in relationships to other tables, called “CategoryOrder.” This column recorded the preferred ordering of a ProdGroup’s Category records. There was a default ordering, and this table had records only when the default ordering needed to be overriden. Therefore, the table recorded category ordering, and I renamed it to something more sensible, like CategoryOrdering.
  • Long is not good, but short is even worse. Don’t kid yourself — when you come back 6 months later to maintain the code you thought was crystal-clear, you’ll no longer have any idea what srdvpls4tp is (standard_deviation_plus_4_time_period).
  • Abbreviations are only good when they’re so overwhelmingly common they’re part of everyday English, in my opinion. At the risk of picking on my current employer, costrecon is a poor name because “recon” is a more common abbreviation for “reconnaissance” than “reconciliation” (the intended meaning). I can never remember what the “other recon word” is. It is literally an effort for me to think of “reconciliation.”
  • I hate Hungarian naming schemes, and all variations thereof. They just make everything harder to read and write, and they are like comments — they’re just a maintenance hazard.

I see I’m starting to go on longer than I want to. Much has been written about naming in other languages, and it all applies to SQL, so I’ll leave it at that. Good naming is good naming.

Write out the full syntax

It’s much clearer to write the full syntax of the language than to use shortcuts. For example, I always explicitly state whether a JOIN is a FULL, LEFT, or INNER join.

Likewise, I always use the AS keyword in aliases. Including it draws attention to the fact that something is being renamed, and omitting it can cause some subtle bugs. If it’s omitted, the difference between an alias and the next element in a list is a comma — a small, easy-to-miss character. For instance,

select a b, c, d, e
from table1 table2

That’s just hard to read. Look again — it’s not doing what you might think. Did you find both bugs?

Always insert into named columns

You’ll see me do it in this blog for brevity, but in the real world I never do blind (anonymous) inserts. It’s hard to figure out what data goes where, and if a table structure changes, blind inserts will break — possibly badly, but even worse, possibly not badly, in which case inserts will succeed and corrupt the data data, instead of failing and at least preserving data integrity.

This is an anonymous insert:

insert into t1 values (1, 5);

This is better, because it uses named columns:

insert into t1(c2, c5) values (1, 5)

Joins

Old-style joins, where the tables are named in the FROM clause but no join clauses are given, are hard to read, write and maintain. Since the join criteria end up in the WHERE clause, it’s hard to tell what’s a join criterion and what is for eliminating unwanted rows. I never use old-style joins.

I never use RIGHT OUTER joins. They can always be rewritten as LEFT OUTER joins, which people expect and can understand more readily.

It really helps improve readability if the order of the ON statements in join clauses is consistent. For example,

...
from table1
    inner join table2 on table1.a = table2.a
    inner join table3 on table1.b = table3.b
...

This is a lot easier to understand than

...
from table1
    inner join table2 on table1.a = table2.a
    inner join table3 on table3.b = table1.b
...

I think it’s important to fully qualify tables in joins too, even if the columns aren’t ambiguous. It lets me know where the data is coming from without forcing me to go back and forth from the query to the table structure. I try to qualify everywhere I refer to a column, not just in one section of a query, when more than one table is involved in the query.

Fully qualifying also future-proofs the join. If a column name is unambiguous because it only appears in one table, but then someone adds a column with the same name to another table, the query will cause an error. And it may be hard to figure out how to fix the query, if you don’t know which table had the column originally and which it just got added to.

USING clauses can greatly simplify and clarify joins, but may not behave the same as standard ON clauses on a given platform, even between different versions (as in MySQL 4 and 5). I think they’re good to use when they help, but only when the statement is fully qualified, so there’s no ambiguity. And I avoid NATURAL JOIN, because it’s not future-proof if someone adds columns to the tables involved. I think a NATURAL JOIN is the JOIN equivalent to a blind insert, which I discussed above.

Grouping and ordering

SQL allows referring to columns by number, but then you have to do a bunch of mental cross-references between the GROUP BY and ORDER BY clauses to figure out how the data is sorted and ordered:

select c1, c2, sum(c3), max(c4)
from t1
group by 1, 2
order by 2, 4

is much less readable and maintainable than

select c1, c2, sum(c3), max(c4)
from t1
group by c1, c2
order by c2, max(c4)

I continue to refer people to this post on how to group data correctly in SQL. It is a weak point for many programmers.

Tools

A lot of my opinions on coding standards are informed by great books like Perl Best Practices.

Other programming languages have great tools to help maintain consistent coding style (such as perltidy or indent), but I can’t find the equivalent in SQL. There is one online SQL formatter, but I don’t think it does a very good job. If I’m faced with catastrophically ugly code, I suppose it’s a place to start, but I wouldn’t use it to enforce a coding standard.

Written by Xaprb

April 26th, 2006 at 10:00 pm

Posted in SQL

14 Responses to 'My personal SQL coding standards'

Subscribe to comments with RSS or TrackBack to 'My personal SQL coding standards'.

  1. I agree with your style. Except I tend to use UPPER_CASE for my object names since that would jive with what the DBA sees. I still use lowercase for my keywords like “select” though.

    I try to avoid this sort of hassle by using Hibernate nowadays though.

  2. Yu Tang has translated this article into Japanese and published it on his website.

    Xaprb

    9 Aug 06 at 9:53 am

  3. Very much like, and agree with, your thinking on this (other than the fact that I usually capitalise keywords since, in my world, chunks of SQL can often get embedded in a string whereby the editor won’t colour them).

    It’s odd that a SQL coding standard seems to be such an unknown concept, whereas in fact the looseness of the language (case insensitivity etc) demands it all the more.

    I put together my own thoughts on this a while ago (at http://www.cslacey.co.uk/TSQLCodingStandards.pdf ). Probably don’t cover as much ground as you, but of those things I do mention, I agree with you almost entirely.

    Chris Lacey

    12 Sep 06 at 5:38 am

  4. Hi to all of you guys!

    I definitely agree on some of your ideas..

    btw, this are my ideas on some of your topics..

    select
         col1
         ,col2
         ,col3 …
    

    i can say that this coding isn’t hard to maintain and and read.

    the columns can be easily spotted and the great thing about

    this is if you need to delete a column(specially the last of the select list),

    you don’t need to look for the comma b4 that column.

    “… from table1 inner join table2 on table1.a = table2.a
    inner join table3 on table3.b = table1.b …” 

    and for this one, if i am correct, this coding is aimed for performance..

    if the left table has the foreign key and the right table has the primary

    key, the matching will be more faster than the way around because the
    value that will be used for equality is a primary key. well, that was
    all i had in my mind. if i am wrong or whatsoever, i’ll be happy to be
    corrected. anyway, great thanks to you, Xaprb, for providing this
    great topic! :)

    Airoso

    8 Jan 07 at 10:06 pm

  5. Hi,
    I tried to implement your ideas in this free online (and desktop) SQL Formatter / Beautifier: http://www.sqlinform.com
    Hope this is is of interest
    GuidoMArcel

    GuidoMArcel

    30 Aug 07 at 5:02 am

  6. Thanks for the ideas!

    Anthony Altemara

    30 Mar 08 at 3:53 pm

  7. Hi,

    I’m just preparing to write a SQL style guide for my company so I came across your article. I like most of it though of course there are details that differ from my style.

    But I think the main point here is: No matter how your style guide looks in detail, the important thing is that you have one at all!

    Even if you’re coding alone at home and private it is sometimes hard to remember your style (how did I do this last time…), but if your coding in a company and there are different people working on the same statements it’s just essential.

    Finally, almost 10 years after “inventing” it, I will for the first time ever publicly share my ultra special kind of where clause *tataa!*

    As I definetly HATE this one-row-is-different-from-every-other-row-issue I eventually started writing something like this:

    select
    id,
    firstname,
    lastname
    from
    employees
    where 1=1
    and id > 100
    and department = ‘Development’

    (And normally I would also put the “>” in the same column as the “=”. I like building blocks, in my opinion it increases readability tremendously.)

    I would be interested to hear what you think about this one.

    And if anyone has a good idea to handle the same problem in the select clause (for a time I used to just add a ‘x’ for the last column, but I don’t really like it, because it changes the result) I would also be very interested.

    Best regards!

    Marcus

    12 Jan 09 at 12:49 pm

  8. Ok, HTML cropped my spaces. Originally, the statement should look like this

    select
    ….id,
    ….firstname,
    ….lastname
    from
    ….employees
    where 1=1
    ….and id > 100
    ….and department = ‘Development’

    Marcus

    12 Jan 09 at 12:52 pm

  9. Marcus – yep I definitely like (and have been using for many years) the WHERE 1=1 idea, so that all following conditions can start with AND.

    I don’t tend to use it when I have a fixed query that never changes (just because it looks a bit silly I suppose), but it’s proved invaluable when building up dynamic queries based on multiple filters that the user may have switched on or switched off.

    Chris Lacey

    14 Jan 09 at 12:41 pm

  10. Hey, nice syntax. I’m currently building a page for my band but I wanted my code to be nice and easy to read, so I started writing a “Coding standards” document (though I’m the only one develpoing the site, LOL). I like the style you use for SQL. I’m gonna put you and this page in the credits ;)

    David Medina

    30 May 09 at 11:26 pm

  11. One highlighting approach I never sow in SQL editors is to mark the extent and scope of a sub-query.
    It would be nice to immediatly visualize where a query starts and where it ends and what is it alias.
    With a convoluted and long querry, replate with joins and three or more levels of sub-queries, it is not that easy to visualize…

    Meir

    8 Oct 09 at 7:16 am

  12. Meir, that’s a great idea. I would appreciate that too.

    Xaprb

    8 Oct 09 at 8:25 am

  13. Hi Meir,

    that’s indeed an excellent idea, in my career I’ve also had to face “some” statements that where a few hundred lines and had a lot of subqueries in every flavour…

    If any SQL editor developer reads this: do it!

    And if you’re alread implementing this, what about nested function calls? There were numerous times in my life when I spend endless minutes analyzing the structure of some DECODE … SUBSTRING … INSTRING … LENGTH … -1 … SUBSTRING SUBSTRING SUBSTRING constructs that were just driving me crazy.

    Code folding would also be nice there…

    ;-)

    Marcus

    8 Oct 09 at 2:54 pm

  14. Thanks everyone for your encouragement!
    I wonder if any of you are using the MySQL Workbench?
    (I read its documentation but setting it up and importing the DB schema intimidated me so I passed. The GUI Query Browser worked so far fine for me, thank you! Do you think however that should I try it?)
    Anyway, does the WB query editor highlight it so? If not, anyone who is a WB user, please post a request for this type of highlight. I personally don’t think I should if I am not at least a newbie user.
    Thanks!
    Meir

    Meir

    8 Oct 09 at 5:19 pm

Leave a Reply