How to write flexible INSERT and UPDATE statements in MySQL
MySQL provides several variations on INSERT and UPDATE to allow inserting and updating exactly the desired data. These features provide a lot of power and flexibility, making MySQL significantly more capable than it otherwise might be. In this article I’ll give an overview of each feature, help you understand how to choose among them, and point out some things to watch out for.
Setup
I am using MySQL 4.1.15 to create my examples. I assume MyISAM tables without support for transactions, with the following sample data:
create table t1 (
a int not null primary key,
b int not null,
c int not null
) type=MyISAM;
create table t2 (
d int not null primary key,
e int not null,
f int not null
) type=MyISAM;
insert into t1 (a, b, c) values
(1, 2, 3),
(2, 4, 6),
(3, 6, 9);
insert into t2 (d, e, f) values
(1, 1, 1),
(4, 4, 4),
(5, 5, 5);
Overview
Suppose I wish to insert the data from t2 into t1. This data would violate the primary key (a row exists where column a is 1) so the insert will fail: ERROR 1062 (23000): Duplicate entry '1' for key 1. Recall that in MySQL, a primary key is simply a unique index named PRIMARY. Any data that violates any unique index will cause the same problem.
This situation occurs frequently. For example, I might export some data to a spreadsheet, send it to a client, and the client might update or add some data and return the spreadsheet to me. That’s a terrible way to update data, but for various reasons, I’m sure many readers have found themselves in a similar situation. It happens a lot when I’m working with a client who has multiple versions of data in different spreadsheets, and I’m tasked with tidying it all up, standardizing formatting and importing it into a relational database. I have to start with one spreadsheet, then insert and/or update the differences from the others.
What I want to do is either insert only the new rows, or insert the new rows and update the changed rows (depending on the scenario). There are several ways to accomplish both tasks.
Inserting only new rows
If I want to insert only the rows that will not violate the unique index, I can:
-
Delete duplicate rows from
t2and insert everything that remains:delete t2 from t2 inner join t1 on a = d; insert into t1 select * from t2;
The first statement deletes the first row from t2; the second inserts the remaining two. The disadvantage of this approach is that it’s not transactional, since the tables are MyISAM and there are two statements. This may not be an issue if nothing else is altering either table at the same time. Another disadvantage is that I just deleted some data I might want in subsequent queries.
-
Use an exclusion join. This query should work on any RDBMS that supports
LEFT OUTER JOIN:insert into t1 (a, b, c) select l.d, l.e, l.f from t2 as l left outer join t1 as r on l.d = r.a where r.a is null;The downside to this method may be lower efficiency, depending on the data and how complex the join needs to be. The join is done up front, which can be a lot of work on large datasets, especially when only a few duplicate rows might exist.
-
Use
INSERT IGNOREto ignore the duplicate rows:insert ignore into t1 select * from t2;
The duplicate rows are ignored. Note that MySQL does not do a generic “duplicate-check” to see if the rows contain duplicate values when determining if a row is a duplicate and should be ignored. It only ignores rows that violate a unique index. If I have no unique index on a column,
IGNOREhas no effect.When using
IGNORE, MySQL prints information about duplicates:Query OK, 2 rows affected (0.02 sec) Records: 3 Duplicates: 1 Warnings: 0
This method is probably the fastest of all, especially if very few duplicate keys exist in
t2. MySQL simply tries to insert every row and keeps going when one fails. The disadvantage is thatIGNOREis a proprietary, non-standard extension.
Inserting new rows and updating existing rows
Now suppose I want to insert new rows and update existing rows. Again, there are several ways to do it, in one or two steps:
-
Use standard SQL in a two-step process to insert new rows and update existing rows. The following is one way to do it, but it’s not the best way:
insert into t1 (a, b, c) select l.d, l.e, l.f from t2 as l left outer join t1 as r on l.d = r.a where r.a is null; update t1 as l inner join t2 as r on l.a = r.d set l.b = r.e, l.c = r.f;The benefit to this approach is standards compliance. This should work on a wide variety of database platforms.
The downside is poor efficiency. Imagine the datasets are huge and there are only a few duplicate rows. The first statement inserts the (huge number of) new rows by joining the two huge datasets together. The next statement joins them together again, except this time the join is even bigger because of all the new rows in
t1! And worse yet, it updates the rows that just got inserted, which is certainly not needed. It is far better to do the update first, which should only affect a few rows, then insert the new rows:update t1 as l inner join t2 as r on l.a = r.d set l.b = r.e, l.c = r.f; insert into t1 (a, b, c) select l.d, l.e, l.f from t2 as l left outer join t1 as r on l.d = r.a where r.a is null;This is far more efficient, but it still might be very bad. It could lock the tables for a long time with large datasets, and like all two-step processes, it is not transactional.
-
Use non-standard MySQL extensions to make the two-step process more efficient. MySQL allows multiple-table updates, which can be used to mark which rows are duplicates during the
UPDATE, eliminating the need for an exclusion join in theINSERT. To accomplish this,t2needs a new column to record its “status,” which I will calldone.alter table t2 add done tinyint null; update t1 inner join t2 on t1.a = t2.d set t1.b = t2.e, t1.c = t2.f, t2.done = 1; insert into t1 (a, b, c) select d, e, f from t2 where done is null;This can be significantly more efficient on the large datasets I’ve been imagining. The downside to this approach is non-portability to other database platforms.
-
Use MySQL’s non-standard
ON DUPLICATE KEY UPDATEextension to accomplish the insert and update in a single step. As with the non-standardINSERT IGNOREabove, this is probably the fastest method:insert into t1(a, b, c) select d, e, f from t2 on duplicate key update b = e, c = f;There are other ways to write this statement, for example using the
VALUESfunction, which can help simplify complex queries by referring to the value which would have been inserted into the given column:insert into t1(a, b, c) select d, e, f from t2 on duplicate key update b = values(b), c = values(c);The disadvantage to this approach is lack of portability, of course. Inserting and updating in a single statement is highly non-standard.
Complexities, incompatibilities, and bugs
If the queries above look like perfect solutions, don’t be fooled. I chose my data and table structures to demonstrate successful scenarios. There are many ways things can fail or be confusing:
- Some versions of MySQL simply don’t support the features I discussed above. Check the MySQL documentation for details about which features are available on any given version.
- Some versions of MySQL have bugs that involve the above types of queries. For example, version 4.1.7 doesn’t like queries of the form
INSERT... SELECT... ON DUPLICATE KEY UPDATEat all. It will complain about a syntax error. It allows inserting literal values, but not the results of aSELECTstatement. In these buggy versions, the MySQL features may be available but not usable. Some versions of MySQL get confused by the
VALUESsyntax I demonstrated above. If your source and destination tables have similar column names, you may have this problem. For example, in MySQL 5.0.15-log with tables that have the same column names,insert into t1 (a, b, c) select a, b, c from t2 on duplicate key update b = values(b); ERROR 1052 (23000): Column 'b' in field list is ambiguous- Multi-table updates can be tricky if there is ambiguity in column names. Using an alias, such as ‘r’ for ‘right’ and ‘l’ for ‘left’ as I did above, can help.
- MySQL reports values for the number of rows affected. When an operation affects rows in multiple tables, or when a duplicate row causes an update to existing values, the rows-affected statistics change in odd ways. The MySQL manual explains how this works, so I don’t want to go into it; I just want to point out that you should expect odd values.
- The above techniques all assume
t2contains no duplicate values ofd, which is enforced in my examples by the primary key on that column. If this is not the case, it becomes significantly more difficult to write the queries. It may be best to clean upt2and create a primary key on the appropriate column(s), to avoid these problems. - I have caused crashes and binary log corruption in MySQL 4.1.7 with a combination of temporary tables, subqueries, and
ON DUPLICATE KEY UPDATE. I think this is a bug with this specific version, but I suggest testing everything on a non-production database server, even if it seems harmless. The statements that caused crashes for me seemed very innocuous.
Choosing a technique
Which method to use largely depends on requirements. If the software must support multiple database backends or versions, perhaps the generic, standard queries are the best bet. If performance is the goal and the queries don’t need to be portable, I see no reason not to use the solution that performs best. If the software must support multiple database backends and performance is critical, there’s probably no way to avoid writing different queries for each supported backend.
I belive fully portable or “platform-independent” SQL is mostly a myth. Writing generic “standard” SQL to the lowest common denominator almost certainly results in under-utilizing the RDBMS’s abilities. Getting the most from my software is more important than dreaming of “platform-independent” queries.
Summary
Sometimes a proprietary extension to standards provides something unavailable by any other means. In this article I have discussed several ways to use such non-standard extensions in MySQL for performance and convenience. Divergence from standards is a double-edged sword. Not only does it potentially make code non-portable, it can encourage mediocrity by teaching bad habits instead of teaching people the “right” way to do things. For example, updating multiple tables in a single statement, or inserting and updating at the same time, are definitely strange and ugly things to do. As in life, the most important thing is to find a good balance and determine which criteria really matter.
If this article was useful to you, you should subscribe to stay current with my upcoming articles. It’s free and convenient.
Edit 2006-02-26 I forgot to cross-reference INSERT IF NOT EXISTS queries in MySQL, a related post where I explain some variations on a particular scenario — for example, where there is no unique index on the column you want to avoid duplicating, or you want to allow only n duplicates.

How would you ensure that an insert statement does not violate the unique index of the table you are inserting in, PLUS is not already contained another table?
I am inserting multiple urls into a table called URL using INSERT IGNORE (multiple urls are being inserted with the one INSERT statement) but I also want to check that url’s don’t already exist in a table called LINKS.
Matt
11 Mar 06 at 3:50 pm
Matt,
It’s hard to say for sure without looking at your exact table schema, but I think exclusion joins might be the only way. Perhaps something like the following would work, assuming you are selecting from a table called SOURCE:
insert into URL (url) select s.url from SOURCE as s left outer join URL as u on s.url = u.url left outer join LINKS as l on s.url = l.url where u.url is null and l.url is null;That’s just a wild guess though.
Xaprb
12 Mar 06 at 11:42 am
I have a special circumstance where a table with over 150 fields needs to be updated periodically. Any of the 150 fields may change in each update, as well as new records created and there is only one primary key for the table.
Inserting the new records is easy, obviously. But I haven’t found a simple way to update all 150 fields with any method available.
What I’d like to be able to do is to update the whole record if the primary key is the same. I don’t need to update only the changed fields (although that would be nice) and I’d rather not list all 150 fields in any of the sections.
The problem so far is that I haven’t found any way to be able to replace the whole field that doesn’t, first, delete the fields from the table. As these tables are huge this could introduce errors if someone is editing the field at the time.
Can you think of any ideas to help here?
Eduo
23 Mar 06 at 8:56 am
Hi Eduo,
Without more details it’s hard to say for sure. I’m confused about a few things. For one, “only one primary key” — by definition there can only be one primary key per table. Also, “… delete the fields from the table…” Are you dropping columns, or did you mean to say delete rows?
If your table is huge (a relative term… gigabytes?) you probably don’t want to do things in bulk. You should probably write a program which updates or inserts one row at a time. Otherwise you’re locking a very large chunk of data for a long time. If you’re doing this in a transaction, that will also keep your log files small; a huge transaction that has to be rolled back may be very resource-hungry.
It is the DB server’s job to coordinate multiple simultaneous requests to read and modify data.
If I know only some columns in a row need to be updated, I’d consider write a program that dynamically generates an UPDATE statement for those columns only, even if I have to do it one row at a time.
Xaprb
23 Mar 06 at 2:24 pm
I need to write a SQL statement which says, add column if it does not exist. Is there a way to do that in MySQL? Something like
alter table x add column yz if not exist...Regards,
Mahesh
Mahesh
6 Apr 06 at 9:13 pm
Not that I know of. The ALTER TABLE syntax doesn’t allow such syntax.
Within a routine, the IF statement is allowed. So, for example,
create table test.t(c1 int); delimiter // create procedure addcol() begin if not exists( select * from information_schema.COLUMNS where COLUMN_NAME='c2' and TABLE_NAME='t' and TABLE_SCHEMA='test') then alter table t add c2 int; end if; end; // delimiter ; call addcol();Xaprb
6 Apr 06 at 9:41 pm
You mention but I am interested in the case where it does.
I have a situation where I have a table, historical_user_prefs, with duplicate records for an email address and an associated user’s preferences. The records represent changes made to the user’s preferences over time. Each record is timestamped. I would like to update another table, user_prefs, containing the latest values of the user’s preferences with information from the from the historical table. I have found that when I try an update using a multi-table join, only data from the first record found in historical_user_prefs is used to update the corresponding record in user_prefs. Could you describe how someone could update t1 in the case where t2 has duplicates?
Jeff
4 May 06 at 6:50 pm
Hi Jeff,
I assume you want to update the destination table from only the newest record in the source table. You’ve run into the problem with multi-table updates: the results are mathematically undefined. There’s more about this in my post on many-to-one problems in SQL. The good news is, you can do what you need.
The post I linked above explains more correct syntaxes, but here’s one way to do it from a multi-table update. You didn’t say exactly how the tables are structured, so I’ll guess:
update user_prefs as u inner join historical_user_prefs as h on u.id = h.user inner join ( select user, max(timestamp) as ts from historical_user_prefs group by user ) as max_ts on h.user = max_ts.user and h.timestamp = max_ts.ts set u.col1 = h.col1, u.col2 = ...In a nutshell, the subquery selects the most recent row for each user, then constrains the historical_user_prefs table to that row. This doesn’t have to work — you might have two rows for the same user with the same timestamp, though I’m guessing you don’t. If you have an AUTO_INCREMENT column on historical_user_prefs, use that instead.
If you don’t have version 4.1 or greater, subqueries won’t work. You could use a temporary table instead:
I hope that helps!
Xaprb
4 May 06 at 9:01 pm
Just a note. The aliasing facility for table names in
UPDATEdoesn’t work in MySQL 3.23.x or earlier, even though it works inSELECTand the claim in the documentation I’ve read online is that theUPDATEsyntax is the same.I had to upgrade to 4.1 on my RHEL 3 server to get the following syntax to work:
UPDATE notified AS link SET link.emailed = NOW( ) WHERE link.emailedCefn Hoile
26 May 06 at 2:01 pm
Here is a stored procedure that accepts a numeric value, which when appended to a string, represents a table to be updated. The table is not predetermined until a numeric value is passed.
CREATE PROCEDURE testproc (IN id int(14)) BEGIN declare vTbl varchar(100); set vTbl = concat('table_',id); update vTbl set ... where ...; ENDmysql does not seem to interpret
vTblas a defined variable, but instead reports: “table vTbl doesn’t exist”.Any suggestions on how I can trick it to accept the variable in the statement? Thanks
Jose
30 Jun 06 at 3:42 pm
Jose, a variable and an identifier are not interchangeable. One way you might accomplish your goal is to use prepared statements. MySQL 5.0 provides support for server-side prepared statements, and they can be used in 5.0.13 for stored procedures, but not stored functions or triggers.
Xaprb
1 Jul 06 at 9:26 am
I have a query with insert ignore into in mysql which I want in sql in a query only but not in pl/sql. Can you please give its answer as soon as possible.
thanks,
Megha
Megha
16 Oct 06 at 8:22 am
Megha, I’m sorry but I don’t understand your question at all. It sounds like you’re asking an Oracle-specific question. This is a MySQL-specific article.
Xaprb
16 Oct 06 at 8:49 am
Hi Xaprb
Great article mate – just got a few questions though:
Im looking to merge two tables in my db (or more specifically, insert one’s values into the other). This post is pretty helpful for how to set up the merge without duplicate records, but what if the column i dont want to duplicate isnt the index?
Let me explain in more detail – in my setup, table A contains spots which may be interesting for travellers in a given city. table B has the same data, but from a different source. (may be worth pointing out that the sources are varied, mostly coming from screen scrapes of other, local directories – which means the structure/primary key of table B can be changed, but not table A) Both will have the same columns (id, name, subtitle, cat_id, loc_id, address, added, by), but its the data in the ‘name’ column that might be identical, and which i dont want repeated.
So if the name of the listing is the same as one thats already in the original table, how can i make it so it isnt inserted? I realise INSERT IGNORE leaves out any records which have a matching primary key, but the current structure of the tables is such that the ‘id’ field is the primary key, not the name- and i dont want to change that because the directory script references the id as the default key
If it helps, i already have a script which uses fuzzy string matching (written in php) using levenstein and metaphone methods. i dont know if its practical at all, but is there maybe a way we can use that to check each item in table B against the records already in table A before its inserted?
look forward to hearing your ideas
PJ
PJ
23 Oct 06 at 10:15 pm
Hi PJ,
This is a pretty common problem. If you can add a UNIQUE index on the listing’s name in table A, you can use the techniques I’ve explained without modification. It sounds like you do want a UNIQUE index on that column to enforce your business rule of “no duplicate names.”
You may not be able to do that. If that’s the case, there are several possibilities. Possibility 1: If you only want to make sure that multiple values from table B don’t get inserted, you can do
INSERT INTO A ... SELECT FROM B GROUP BY B.nameto group the values from table B so there is only a single row per name. I realize this isn’t exactly what you want to do, since you also don’t want to insert anything that’s duplicated in table A.Possibility 2: do it a row at a time, but don’t do two queries (check, insert) per row. Instead, simulate
INSERT IF NOT EXISTS. Then you can write a single query that will insert only if nothing exists already. I wrote an article on how to simulateINSERT IF NOT EXISTSin MySQL.Possibility 3: Do it in an
INSERT... SELECTwith an exclusion self-join (see my article on exclusion joins) to exclude anything that already exists in table A. The structure of this query would look likeYou may also need a
GROUP BYto keep from inserting any rows from B which duplicate other rows in B. In that case, you need a subquery:insert into A(columns) select columns from ( select columns from B left outer join A on B.name = A.name where A.name is null group by B.name ) as subREPLACE,IGNOREand friends are 95% solutions to the common need of “don’t violate indexes”. You’re encountering the other 5%, where your indexes don’t automatically define which rows to insert and which to leave out.Xaprb
23 Oct 06 at 10:42 pm
I have a problem that should have an easy solution, I think, but being very new to Mysql (and SQL in general) the answer escapes me.
I’d like to load data into a table with a schema like:
auto-inc prim key, auto-datestamp, header1, header2, vardata1, vardata2
Loading/inserting is easy. After the initial load, basic general update would be OK – It looks like inserting only non-duplicated rows is easy. What I’d like to do is 1) insert any “new” rows where header1 and/or header2 do not exist yet, 2) insert any rows where header1 and header2 *already* exist, but the data in the vardata fields has changed, and 3) somehow keep track of any new rows, or rows that have changed.
End result is : I’d like to be able to find, in one query or a series of queries, what rows changed between any given dates, as well as be able to create a “current status” showing all the latest rows.
So I don’t want to replace rows that have changed – I wan’t to keep a history of all row changes, in other words.
Am I being clear? Is there a straighforward way to do this?
Thanks for an info!
mcomfort
30 Oct 06 at 6:35 pm
You may want to take a look at the properties of the TIMESTAMP data type, which can auto-update when the row is changed.
Keeping a history of all row changes is non-trivial, depending on what you really need, but my first reaction is “don’t do that, it indicates a design problem.” I say that without knowing anything about what you really need, so I could be wrong. But keeping track of all data changes is almost certainly a recipe for data bloat.
Two suggestions I’d make about it: don’t keep history records in the same table as the rows they record. Keep audit tables that have history on past versions of rows. That way the table that contains the “real” data (i.e. not history, which is meta-data) can have proper indexes and remain small, have good performance, and you can assure data integrity. The second suggestion is, be sure to spend a lot of time learning about data integrity and indexing, especially primary keys and unique indexes.
Xaprb
31 Oct 06 at 9:05 am
The concept of “audit tables” – excellent idea, that had not occurred to me…. (As I think you can tell, I’m new to some of the very basic concepts here.)
I just did a quick google on both audit tables and “unique indexes” & I think I already see how I need to do this – thx for pointing me in the right direction xaprb. I think I had started down the road of stupidity there.
mcomfort
31 Oct 06 at 12:00 pm
No problem :-) One of the best resources available is the MySQL manual itself — it is free and has better information about MySQL and databases in general than most books you can buy.
Xaprb
31 Oct 06 at 12:10 pm
I understand there are a lot of developers working with MySQL, and tips and tricks can be useful to anyone but your article contains some horrible misconceptions and perpetuates some failings I find common among people who work with MySQL and only MySQL.
A perspective on how data integrity works and how other database systems handle it can really be a good thing you know?
1 – Transactions / MyISAM
MyISAM is not transactional period. If you want, you can pretend that doing everything in one query is transactional but it shows a misunderstanding of transactions. Using MyISAM in this day and age is simply stupid.
2- INSERT IGNORE
You state regarding using INSERT IGNORE: “This method is probably the fastest of all, especially if very few duplicate keys exist in t2. MySQL simply tries to insert every row and keeps going when one fails. The disadvantage is that IGNORE is a proprietary, non-standard extension.”
The problem of course is that MySQL doesn’t state which insert has failed so you’re still going to have to manually look for the missing rows (unless of course you have no regard for your client’s data). Moreover in an automated setting the insert will work and data will be lost. I mean if you do this to get a speed advantage why insert anything at all? Inserting nothing has to be faster and data will be just as lost.
Quote: “The disadvantage is that IGNORE is a proprietary, non-standard extension.” I think you have missed the problem.
3 – Multi table updates
Any worthwhile database system can do multi table updates, the fact that MySQL is slowly catching up and still doing it it’s own way is NOT a very good promotion.
The bottom line is MySQL is perfectly useable (I would have said differently 1 or 2 years ago) but in order to be able to use it responsibly you HAVE to know about how things SHOULd work and how MySQL handles it so you can avoid it’s pitfalls.
Anyway, my 2 cents are spent. Cheers all and good luck
Martin
10 Nov 06 at 3:01 pm
I have a table named eis_summary which has a column tran_count and ptd_count. The primary_keys are start_date & summary_code. I’m trying to update the table by assigning the accumulated total of tran_count to the ptd_count column ie ptd_count = tran_count for the first row while ptd_count of second row = tran_count + ptd_count of first row and so on.
I’m using the sql below but it only updates the first row successfully while successive rows have ptd_count = tran_count.
Note that I’m working on Ingres database but any comment from any other database perspective will be appreciated.
Thanks for your assistance.
UPDATE eis_summary a FROM eis_summary b SET ptd_count = a.tran_count + (SELECT b.ptd_count) WHERE b.start_date = (SELECT max(start_date) FROM eis_summary WHERE start_date < a.start_date ) AND a.start_date <= date('today')Tevita
4 Jan 07 at 9:53 pm
Since this is a MySQL article, I’ll give a MySQL answer: in MySQL, you could do this with user-variables to maintain running totals. See my articles on advanced user-variable techniques for some ideas. I don’t know about Ingres; I’ve never used it.
Xaprb
5 Jan 07 at 11:42 am
The use of a single “insert ignore” combined with a single “update” is a powerful combo to write incremental record from highly concurrent and racing sources.
In my example, all sources try to increment a column, like set x = x +1 where pk=…
But the update won’t work if the record isn’t there. So all sources are posting an insert ignore (x) values (0). If the record isn’t there, it will be before the update in the batch.
If it is there, only the update changes the value. Sure all your sources are doing a stupid insert ignore, but we have indexes and the bandwidth.
I just don’t know how to solve that in pure SQL (without using “insert ignore”).
Any hints?
dd
1 Mar 07 at 7:03 pm
dd, see my article on how to write INSERT IF NOT EXISTS queries in standard SQL. I think that addresses your question.
Xaprb
1 Mar 07 at 7:09 pm
Hi,
I want to insert a record or update if exist.
The existence of the record is evaluted not by the primary key. I want to use other arbitrary field (of course UNIQUE field).
Then the IGNORE option don’t work for me because the duplicity is evaluted with the primary key field.
Thanks for your info.
Leonardo
Leonard
27 Mar 07 at 9:58 pm
Hi Leonard, see my reply to dd, just above your comment.
Xaprb
28 Mar 07 at 8:04 am
I am sure glad I found your page. Thank you very much for your hard work. I have been dealing with this issue all week.
I have an application with a CSV import feature that I am working on now. My boss said doing an insert or delete per row while reading the CSV is expensive. So I am trying to digest what you talk about here.
Before reading the CSV, I set up a temp table:
// LETS CREATE A TEMPORARY TABLE TO HOLD DATA BEFORE COMPARISON WITH REAL TABLE $SQL = "CREATE TEMPORARY TABLE $tmptable LIKE $tablename"; $mysql = mysql_query($SQL);I should note that the CSV has the proper headers in the right order as the fields in the database:
// LETS LOOK AT THE FIRST ROW AND GET HEADERS IN AN ARRAY USING THIS FUNCTION: $csv_headers = getFirstRow($csvfile, ",");Here is the description of my real table:
What I am curious about is which of the methods you described would you use in this case to do insert or updates?
I had started out doing insert or update per row based on a query that comparted domains.Id to CSV.Id and domains.domain to CSV.domain. If match, do update, if not, do insert. My boss said this was expensive with big files.
I am also wondering about the fact that the CSV file will not have Ids filled in for new domains so the logic above wouldn’t work. Although checking for domain name would as it is unique – but it is not the primary key.
Sorry for the length here – hope the detail helps though.
And thanks for this article! I will be rereading it again soon.
matt
Matt
29 Mar 07 at 9:09 pm
I’d probably do a LOAD DATA INFILE into the temporary table and then do an INSERT… SELECT.
Xaprb
30 Mar 07 at 9:21 am
Wickedly good article. Thank you. One error I noticed:
alter table t2 add done tinyint null;
update t1
inner join t2 on t1.a = t2.d
set t1.b = t2.e,
t1.c = t2.f,
t2.done = 1;
insert into t1 (a, b, c)
select d, e, f from t1
where done is null;
the last select statement should read:
select d, e, f from t2
but this helped me out in a jam. Thank you very much.
Tim
3 Jun 07 at 4:18 am
Thanks for catching that. I’ve fixed it.
Xaprb
3 Jun 07 at 9:29 am
very useful!
ronald
19 Jul 07 at 7:18 am
Hi,
I have a query to ask. I want to fetch a row using select statement and insert the datas into another table at the same time i need to update one more table for a variable for the same fetch, will that be possible.
insert into table x
(select ….
….
from table a,b,c)
in the same query i need to update one variable for table a for the same fetch. will that be possible.
Vin
22 Aug 07 at 8:30 am
No, there is no INSERTSELECTUPDATE keyword in SQL. You need to use multiple statements and probably a transaction.
Xaprb
22 Aug 07 at 9:36 am
Thanks for the advice! I was going round in circles with the INSERT IGNORE statement. I hadn’t added a unique index to my table, so the request to IGNORE was being IGNORED!
Microformats
6 Oct 07 at 7:04 pm
“Insert into account(CustNo) values (Select MAX(CustNo) from customer)”
I am working in NetBeans 5 and MySQL 4.1 and this would work in Oracle SQL but i cannot seem to find a way in MySQL NetBeans to get this working, any advice from anyone, would be greatly appreciated.
Clos01
3 Nov 07 at 10:11 am
Interesting stuff here. May I ask a question?
I’m building a data warehouse with MySQL 5 – still fairly new to using more than a SELECT statement, but getting better all the time. It’s working very well, I do the ETL transfers overnight when it’s quiet and pulling data into staging tables where I have a large number of updates to generate the surrogate keys prior to export into the main FACT table. At present I only do between 70 – 300 rows a night – it’s only one small part of what it will become, but is there a more efficient way than having a stored procedure full of update statements? There are a couple od Date rollovers where the update is dependant on the value of a Frequency field in the same table, so 1 = daily, 2 = weekly and so on. I’d like to use a CASE statement but can’t get it to work. I load the dimension tables into memory first, run the updates and then convert them back to MyISAM. It’s quick – takes less than a couple of seconds to load, process, export and load into the fact table but I’m concious that I will be handling much much more later on.
Matt
7 Nov 07 at 8:12 am
Take a look at Kettle — it’s an ETL system for MySQL. Also MySQL Archiver from the MySQL Toolkit.
Xaprb
7 Nov 07 at 9:09 am
Looked at Kettle a while back – couldn’t get it to work properly. I’ve got Talend running – too complex for me. I’m using scripts at the moment but will have another look. Downloaded the Toolkit and will have a play. Thnks for the advice.
Matt
7 Nov 07 at 9:42 am
Thanks for great examples. You really helped me a lot.
Serkan
10 Dec 07 at 5:48 am
Hi Xarpb,
This article is excellence. However my situtation is slightly different. I have 2 databases with the same table structure and so on. Now I want to combine the data in 2 databases into 1,i.e merging them. Now if I want to merge two table that have a unique key (auto-increment) if duplicate key is found instead of updating the old records with the new one, I would like to insert the new one with the unique being auto incremented. How would you do that ?
Many thanks,
Minh Hoang
Minh Hoang
18 Dec 07 at 7:24 pm
Hi Xaprb,
Good article. I have a situation here where i have in a table 2 rows like—>
row1
Branch Pub_Cod Bud_alloted W_I_P Am_Paid Old_budget
abc 411 10000 0 0 0
row2
Branch Pub_Cod Bud_alloted W_I_P Am_Paid Old_budget
xyz 422 0 0 0 0
Whe way i want the query to work is, i want to swap the Bud_alloted value i.e 1lakh from branch abc into 2nd row’s bud_alloted which is 0 now. And after swapping the value to 2nd row’s bud_alloted, in the 1st row the value 1lakh should be set in the old_budget column i.e something like this after swapping–>
row1
Branch Pub_Cod Bud_alloted W_I_P Am_Paid Old_budget
abc 411 0 0 0 100000
row2
Branch Pub_Cod Bud_alloted W_I_P Am_Paid Old_budget
xyz 422 100000 0 0 0
forgot to add 1 last bit here my branch,pub_cod is primary whereas am_paid and pub_cod are index key.
can this be achieved in mysql?
Thanks
regards,
ajos:)
ajos
9 Jan 08 at 9:21 am
How do I know if I got a warning?
x
9 Mar 08 at 6:54 pm
If you’re using the MySQL monitor, you can see it in the status output after the query. The warning count is passed back via the client protocol, but whether the tool you’re using pays attention, and whether it displays anything if there are warnings, varies.
Xaprb
9 Mar 08 at 9:01 pm
I want to insert…update into multiple tables how do i do that?
I have 3 tables e.g. table a table b table c.
table a (a_id)
table b( b_id, a_id) //a_id-foreign key referencing table a
table c (c_id, b_id) //b_id-foreign key referencing table b
Thanks :)
M
13 Apr 08 at 11:01 am
Wow, greate post!
Krzysztof Bieleńko
29 Apr 08 at 8:12 am
Thanks for posting this article. I’ve been looking for atomic upserts, and the INSERT INTO … ON DUPLICATE fits the bill nicely. I would not have know about the syntax if I hadn’t found this article on google. Thanks for posting!
Ben
18 Jul 08 at 1:42 pm
Good article
keep doing the quality work
surya
28 Jan 09 at 3:18 am
[...] How to write flexible INSERT and UPDATE statements in MySQL [...]
Shared Items - August 2, 2009 « Jeetu’s Shared Memory
2 Aug 09 at 11:38 pm