The power of a good SQL naming convention
At my previous employer, one of the early decisions that had huge payoffs later was the SQL naming conventions. A good naming convention is more than just a nicety. It lets you write programs that don’t need to be told about the relationships among tables and columns.
There are many ways to do this, I think. But in practice, I think I’ve seen only a few customer systems that have a completely consistent, logical naming convention. And there are levels of convenience; some systems have a couple extra rules that make a big difference.
In this post I’ll explain the components of my ideal naming conventions.
- Consistent letter case. I prefer all lowercase for readability and type-ability. Regardless, you need the same case for both table and column names, unless your chosen programming language is case-insensitive (and if it is, you should be flogged for using it).
- This isn’t mandatory, but word separators (underscores) are pretty nice. If you run things together, you can’t tell the difference between “API rate sheet” and “a pirate sheet.” Yeah, that’s a real example.
- If a table has a single-column primary key, such as an auto-increment column, then that column is named the same thing as the table. For example, the
usertable’s primary key is calleduser. Naming itidoruser_idor anything else may be logical and consistent too, but in my experience it leads to a lot more code. - If a column expresses a relationship among tables, name it the same as the related table. For example, a table of blog posts should have a column called
userwhich is a foreign key to theusertable. - Singular. Both table and column names are singular. Plurals add a ton of complexity, and defeat the niceness of naming columns and tables the same thing. Fooling around with conversions between plural and singular (goose/geese, moose/moose, cow/cattle) is a waste of synapses and code.
Sakila’s convention
That’s not a lot of rules, is it? Let’s see how the Sakila sample database would fare if these rules were applied to it. Two of the core tables are actor and film, with the “acted in” relationship expressed in the film_actor table. The tables look like this (simplified):
CREATE TABLE actor ( actor_id smallint unsigned NOT NULL auto_increment, first_name varchar(45) NOT NULL, last_name varchar(45) NOT NULL, PRIMARY KEY (actor_id) ); CREATE TABLE film ( film_id smallint unsigned NOT NULL auto_increment, title varchar(255) NOT NULL, description text, ... other columns ... PRIMARY KEY(film_id) ); CREATE TABLE film_actor ( actor_id smallint unsigned NOT NULL, film_id smallint unsigned NOT NULL, PRIMARY KEY (actor_id,film_id) );
What’s right about this
This is already a pretty nice convention. For example, tables are singular, and the columns that have the same meaning have the same name everywhere. This means you can write
select actor.first_name from actor inner join film_actor using(actor_id) inner join film using(film_id);
The ability to use the USING keyword in a join is one way to test whether your naming convention makes sense. If you had gone with the “every primary key is named id, and foreign keys are named [table]_id” convention that’s pretty common, you’d have to write
from actor inner join film_actor on actor.id = film_actor.actor_id inner join film on film.id = film_actor.film_id;
This is not nearly as elegant. So Sakila’s naming convention is pretty nice already.
What I’d change about Sakila
If I had designed Sakila, I’d have done this:
CREATE TABLE actor ( actor smallint unsigned NOT NULL auto_increment, first_name varchar(45) NOT NULL, last_name varchar(45) NOT NULL, PRIMARY KEY (actor) ); CREATE TABLE film ( film smallint unsigned NOT NULL auto_increment, title varchar(255) NOT NULL, description text, ... other columns ... PRIMARY KEY(film) ); CREATE TABLE cast ( actor smallint unsigned NOT NULL, film smallint unsigned NOT NULL, PRIMARY KEY (actor, film) );
It’s not a dramatic change in this case, and it doesn’t really simplify the example queries a lot, but consider what happens when you write an ORM on top of this simplified naming convention.
As an example, suppose your database has accounts that belong to clients, each of which is managed by a single employee. Look at the following code snippet:
$acc = new Account($account_no);
$email = new Email();
$email->to($acc->client->employee->email);
$email->body("Account $acc for client $acc->client is expired");
$email->send();
If the table and column names match, such an ORM is really easy to build. If they don’t, there’s more code to write.
It’s hard to estimate the reduction in lines of code, tests, and mistakes, but I think it’s pretty significant; some five-line programs I’ve written might have needed thousands of lines of code without the naming conventions, and I’m sure a lot of code would have needed supporting meta-data tables to define the mappings between different types of data.
Summary
Here are a couple of concrete ideas. With the conventions I’ve shown, it’s easy to write a simple recursive program that can examine your entire database for data consistency, based only on naming conventions. And you can easily write a program to dump an account and all its related data (client, employee, and so on) for such purposes as migrating a client to a new shard or creating a dataset for a test suite.
There are many good ways to do this, and your favorite method probably has lots to recommend it. But after having worked with lots of such systems myself (including one company who mandated that column names had to be globally unique, which was horrible), I still haven’t seen anything better than the simple conventions I’ve described above. It’s kind of a reductionistic “let’s make this absolutely as simple as possible” philosophy, and it really pays off.

Hi Baron,
nice post. I agree completely w/r/t lower case, separators and singulars - all from a practical perspective.
I am less enthused about your choice to call single fk columns after the table name and your preference for the USING syntax in join conditions.
My gripe with it is that it does not deal with things like film.language_id and film.original_language_id. That is, we can write one join with USING, but we have to resort to ON anyway for the other case, and I feel similar problems should always be solved in a similar way, meaning I won’t use USING if I have to use ON anyway.
So, I guess I’m saying, what is your recipe for cases like this (that is, table references same parent table more than once over different relationships)?
kind regards,
Roland Bouman
Roland Bouman
26 Oct 08 at 5:34 pm
I’d like to take a turn at nitpicking this list as well :-)
First, wrt to renaming the film_actor table as cast; while this is an interesting idea, I’ve always found that naming these mapping tables explicitly after the columns *in order* has eliminated a lot of confusion for people working with the table, especially when your columns don’t contain real information (ie. id numbers). So in my revised sakila database, I’d have called it “actor_film”, and developers could just *know* the actor_id came first and film_id was second.
The other thing I generally recommend is to never name database objects after SQL Keywords. I know some people try to avoid reserved words, but since words seem to jump between the reserved and non-reserved, I just try and avoid them altogether. Actually this is a corollary to another rule of mine, which is if you have to quote your object name, you’ve named it incorrectly.
Robert Treat
27 Oct 08 at 1:25 am
Hi Roland. Your point is good. There are always cases like that. I prefer to optimize for the common case. In the systems I have experience with, coding a few exceptions into a table or other globally accessible mapping worked well, and then 95% of queries were easy to write with USING(). I think it’s easiest to memorize the exceptions.
Robert, yeah, I kind of threw in the “cast” thing but didn’t explain why. I forgot it’s a keyword. My point here is that it’s best to name things after the concept they represent, when possible. I’ve seriously seen tables named things like “actor_to_film_actor_to_film_film_actor_….” you get the point. I think some of those table names basically used an entire line in the text editor, and of course it was impossible to tell what they really meant. Names like “attendee” instead of “meeting_person” can clarify things a lot, and these “x_y” tables often reflect a lack of thought. In this case I actually agree with you, more than I agree with myself. film_actor (or actor_film) is fine.
Xaprb
27 Oct 08 at 4:37 am
I think the example is limited in scope. While it works fine for a two table database, there’s still some “holes” in understanding the full concept here, which thankfully can be better explained through an example.
Can we see what the layout looks like if we add the following tables: producer, location, director.
I ask because I’m a bit confused as to what the nature of the last table is: cast. How would an insert statement work if you add an actor to a film? You have to insert into two tables?
I’m not new to databases, just a bit new to this idea. Thanks!
Zorg
27 Oct 08 at 10:59 am
I agree with most of the above. Personally, however I think surrogate keys should always be called ‘id’, and foreign keys ‘_id’.
Ie ‘user.id’ and ‘group.user_id’.
Constructs like ‘user.user’ feel clumsy, and uninformative - i find ‘user.id’ much friendlier on the eye.
It gives an obvious seperation of primary/foreign keys, and also allows things like ‘user_id=user.id’, which i think is clearer than ‘user.user=group.user’.
It’s also very easy to create join clauses using copy+paste, which is not to be underestimated when you have table names like ‘user_transaction_attribute_value_lookup_archive_20070901′!!!
(also, imagine a join clause on the above table where pkey column name is the same as the table name!)
There is so much personal taste in this sort of thing - FWIW, this is mine.
GBA
27 Oct 08 at 12:42 pm
What are your thoughts on using camelCase?
I used to separate with underscores, but have now changed to camelCase. I prefer them. Maybe it’s because you can see slightly more at a time with camelCase as opposed to using_underscores.
Do you think one is better, or is it just up to the developer as long as they stick with the convention?
alex
29 Oct 08 at 1:08 am
I am using camelCase instad of underscores, they seem to be friendlier than using underscores. Futher Some of my Module wise tables are being named separately as Conf_WhatEverTableName…HR_WhateverTableName,
Can anyone suggest whether any alternative for inculding module names in the tables…
Jag
31 Oct 08 at 1:36 am
[...] Baron has a great post on naming conventions for your schema. [...]
Log Buffer #121: a Carnival of the Vanities for DBAs
31 Oct 08 at 12:01 pm
[...] in Santa Clara, CA, USA has extended its deadline for topic proposals. Baron has a great post on naming conventions for your schema. In the how-to department, Falko Timme has a good tutorial on how to set up GreenSQL to protect [...]
Diamond Notes » Log Buffer #121: a Carnival of the Vanities for DBAs
31 Oct 08 at 1:51 pm
safe article man
totally set me straight on how to name my tables/columns
mate
4 Nov 08 at 10:18 pm
[...] para realizar una buena convención de nombres para los objetos de una base de datos, en Xaprb (en [...]
ArtÃculos destacados de Octubre | cambrico.net
7 Nov 08 at 5:30 am
Spaces, underscores, camelCase, are all nice but you quickly run into problems if non native speakers have to use these. There is always the doubt where these delimiters have or don’t have to go. Though the example of possible confusion leaving them out is valid, the chance of running in to it when naming tables is low more so since the names are not arbitrary but in context.
naming pk and fk basically the same is interesting. (we use id as pk and tablenameid as fk) but it seems to me that it is the argument of the amount of code to be written that is driving your naming convention and that consistency is only your second concern. With the right tools, the amount of code should not be a concern and automated code generation systems should be insensitive to the naming scheme or absence thereof.
esger
7 Nov 08 at 10:18 am
Seriously, table.id leads to “a lot more code” ? Not only is it a lot less code, it actually makes sense (DRY with meaning). Likewise, FKs should be table.foreign_id . Also, keeping everything lowcase (e.g. table, sequence, pk) is good for maximizing ORM compatibility.
alan
9 Nov 08 at 11:55 pm