How to write flexible INSERT and UPDATE statements in MySQL

MySQL provides several variations on INSERT and UPDATE to allow inserting and updating exactly the desired data. These features provide a lot of power and flexibility, making MySQL significantly more capable than it otherwise might be. In this article I’ll give an overview of each feature, help you understand how to choose among them, and point out some things to watch out for.

Setup

I am using MySQL 4.1.15 to create my examples. I assume MyISAM tables without support for transactions, with the following sample data:

create table t1 (
    a int not null primary key,
    b int not null,
    c int not null
) type=MyISAM;

create table t2 (
    d int not null primary key,
    e int not null,
    f int not null
) type=MyISAM;

insert into t1 (a, b, c) values
    (1, 2, 3),
    (2, 4, 6),
    (3, 6, 9);

insert into t2 (d, e, f) values
    (1, 1, 1),
    (4, 4, 4),
    (5, 5, 5);

Overview

Suppose I wish to insert the data from t2 into t1. This data would violate the primary key (a row exists where column a is 1) so the insert will fail: ERROR 1062 (23000): Duplicate entry '1' for key 1. Recall that in MySQL, a primary key is simply a unique index named PRIMARY. Any data that violates any unique index will cause the same problem.

This situation occurs frequently. For example, I might export some data to a spreadsheet, send it to a client, and the client might update or add some data and return the spreadsheet to me. That’s a terrible way to update data, but for various reasons, I’m sure many readers have found themselves in a similar situation. It happens a lot when I’m working with a client who has multiple versions of data in different spreadsheets, and I’m tasked with tidying it all up, standardizing formatting and importing it into a relational database. I have to start with one spreadsheet, then insert and/or update the differences from the others.

What I want to do is either insert only the new rows, or insert the new rows and update the changed rows (depending on the scenario). There are several ways to accomplish both tasks.

Inserting only new rows

If I want to insert only the rows that will not violate the unique index, I can:

Inserting new rows and updating existing rows

Now suppose I want to insert new rows and update existing rows. Again, there are several ways to do it, in one or two steps:

Complexities, incompatibilities, and bugs

If the queries above look like perfect solutions, don’t be fooled. I chose my data and table structures to demonstrate successful scenarios. There are many ways things can fail or be confusing:

Choosing a technique

Which method to use largely depends on requirements. If the software must support multiple database backends or versions, perhaps the generic, standard queries are the best bet. If performance is the goal and the queries don’t need to be portable, I see no reason not to use the solution that performs best. If the software must support multiple database backends and performance is critical, there’s probably no way to avoid writing different queries for each supported backend.

I belive fully portable or “platform-independent” SQL is mostly a myth. Writing generic “standard” SQL to the lowest common denominator almost certainly results in under-utilizing the RDBMS’s abilities. Getting the most from my software is more important than dreaming of “platform-independent” queries.

Summary

Sometimes a proprietary extension to standards provides something unavailable by any other means. In this article I have discussed several ways to use such non-standard extensions in MySQL for performance and convenience. Divergence from standards is a double-edged sword. Not only does it potentially make code non-portable, it can encourage mediocrity by teaching bad habits instead of teaching people the “right” way to do things. For example, updating multiple tables in a single statement, or inserting and updating at the same time, are definitely strange and ugly things to do. As in life, the most important thing is to find a good balance and determine which criteria really matter.

If this article was useful to you, you should subscribe to stay current with my upcoming articles. It’s free and convenient.

Edit 2006-02-26 I forgot to cross-reference INSERT IF NOT EXISTS queries in MySQL, a related post where I explain some variations on a particular scenario – for example, where there is no unique index on the column you want to avoid duplicating, or you want to allow only n duplicates.


Comments