Tag Archive for 'transactions'

Using BASE instead of ACID for scalability

My editor Andy Oram recently sent me an ACM article on BASE, a technique for improving scalability by being willing to give up some other properties of traditional transactional systems.

It’s a really good read. In many ways it is the same religion everyone who’s successfully scaled a system Really Really Big has advocated. But this is different: it’s a very clear article, with a great writing style that really cuts out the fat and teaches the principles without being specific to any environment or sounding egotistical.

He mentions a lot of current thinking in the field, including the CAP principle, which Robert Hodges of Continuent first turned me onto a couple months ago. It has been proven formally, though I have not read the proof myself.

One of the most important concepts he advances is giving up the illusion of control. As programmers and DBAs, I think we may tend to like control too much. Foreign keys are a perfect example. I think the point here is that these things make you feel safe, but they don’t really make you safe. Just as with so many things in life, recognizing our inability to really control the systems we build is key to working with their strengths — instead of trying to bind them with iron bands.

Another great point is idempotency. This is a great way to help avoid problems with MySQL replication, by the way. I’ll leave the “why” as an exercise for the reader, but let me just point out that the file MySQL uses to remember its current position in replication is not synced to disk, so it will almost certainly get out of whack if MySQL dies ungracefully. (Google has solved this problem.)

A highly recommended read — worth more than most case studies about how specific companies have scaled their specific systems.

Technorati Tags:, , , , , , , , , , ,

You might also like:

  1. Get a free sample chapter of High Performance MySQL Second Edition

MySQL Toolkit version 989 released

Download MySQL Toolkit

This release of MySQL Toolkit fixes some minor bugs, and adds major new functionality to MySQL Parallel Dump.

Big News: MySQL Parallel Dump

I wrote a lot more tests and cleaned up MySQL Parallel Dump a lot (fixed bugs with failed dumps not being reported, for instance) but the really big news is I added chunking functionality to it. Now you can say

mysql-parallel-dump --chunksize 100000

and it will try to divide each table into chunks with 100,000 rows each. It can do the chunks in parallel, so it can actually be running several dumps from one table at the same time. The chunking is fuzzy: it’s a hard problem, and I adapted (and improved) the code from MySQL Table Checksum to do it. If you can improve it, please contribute your fixes (the Sourceforge project page has several ways for you to do that).

You can also dump by size, which is probably more useful for most people. To do 10MB per chunk (approximately), use this command:

mysql-parallel-dump --chunksize 10M

This is a big deal not just because it lets you parallelize dumps from a single table, but because having the dump split up makes it easier to restore in small chunks, which as readers have pointed out is a big help on transactional storage engines.

The parallel restore tool is in incubation. In the meantime, please put this tool through its paces. Clearly it’s not yet well-tested and I look forward to your bug reports!

Changelog

coffee grinder cuisinart coffee grinder la pavoni coffee grinder black and decker coffee grinder bodum coffee grinder mahlkonig coffee grinder mr coffee coffee grinder hamilton beach coffee grinder bunn coffee grinder jura coffee grinder astra coffee grinder delonghi coffee grinder grindmaster coffee grinder burr coffee grinder brewer coffee grinder bosch coffee grinder melitta coffee grinder electric coffee grinder antique coffee grinder electric skillet presto electric skillet rival electric skillet west bend electric skillet villaware electric skillet toastess electric skillet black and decker electric skillet hamilton beach electric skillet cuisinart electric skillet sunpentown electric skillet aroma electric skillet sunbeam electric skillet saladmaster electric skillet farberware electric skillet oster electric skillet ge electric skillet
Changelog for mysql-find:

2007-10-03: version 0.9.5

   * The --dbregex parameter didn't work right.

Changelog for mysql-heartbeat:

2007-10-03: version 1.0.1

   * --check hung forever.

Changelog for mysql-parallel-dump:

2007-10-03: version 0.9.6

   * Arguments to external program weren't honored.
   * System exit codes were lost, so errors weren't reported.
   * Added chunking.
   * Modularized and tested.
   * Added documentation.
   * Made --locktables negatable.
   * Changed default output to be less verbose and added --verbose option.
   * Added summary output.
Technorati Tags:, , , , ,

You might also like:

  1. MySQL Toolkit version 1254 released
  2. Introducing MySQL Parallel Restore
  3. MySQL Toolkit version 946 released
  4. MySQL Toolkit version 1030 released
  5. How to check and optimize MySQL tables in parallel

Version 1.5.2 of the innotop MySQL monitor released

Download innotop

This release is part of the unstable 1.5 branch. Its features will ultimately go into the stable 1.6 branch. You can download it from the innotop-devel package.

The major change is I’ve ripped out the W (Lock Waits) mode and enabled innotop to discover not only what a transaction is waiting for, but what it holds too. The new mode that replaces W is L (Locks). My last article goes into more detail on this.

Technorati Tags:, , , , ,

You might also like:

  1. How to debug InnoDB lock waits
  2. Version 1.6.0 of the innotop monitor for MySQL released
  3. How to monitor InnoDB lock waits
  4. Version 1.5.1 of the innotop MySQL monitor released
  5. A look at innotop’s new features

How to debug InnoDB lock waits

This article shows you how to use a little-known InnoDB feature to find out what is holding the lock for which an InnoDB transaction is waiting. I then show you how to use an undocumented feature to make this even easier with innotop.

Background

One of the most common complaints I’ve heard from DBAs used to other database servers is “I can’t find out who holds the locks that are blocking all these connections and making them time out.” I feel your pain. Before I helped scale my employer’s systems to deal with larger volumes of data, InnoDB lock contention was a serious issue. And as far as I knew, you couldn’t find out who was holding locks. I knew you could see who was waiting for locks to be granted; that’s easy. You just run SHOW INNODB STATUS and look for the following text:

------------
TRANSACTIONS
------------
Trx id counter 0 4874
Purge done for trx's n:o < 0 4869 undo n:o < 0 0
History list length 21
Total number of lock structs in row lock hash table 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0 4873, ACTIVE 6 sec, process no 7142, OS thread id 1141152064 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 368
MySQL thread id 9, query id 173 localhost root Sending data
select * from t1 for update
——- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4873 lock_mode X waiting
…

That’s fine, but who holds the lock? I thought there was no way to find that out.

InnoDB Lock Monitor

Until I learned about the InnoDB Lock Monitor, that is. You enable it by running the following command:

CREATE TABLE innodb_lock_monitor(a int) ENGINE=INNODB;

It’s quite an ugly hack, but it turns out the table name is actually “magical.” It’s a special table name that tells InnoDB to start the lock monitor. You can stop it by dropping the table again.

This little-noticed feature makes InnoDB print out a slightly modified version of what you see with SHOW INNODB STATUS. The “slight modification” is to print out not only the locks the transaction waits for, but also those it holds. For example, here’s the transaction that holds the locks:

---TRANSACTION 0 4872, ACTIVE 32 sec, process no 7142, OS thread id 1141287232
2 lock struct(s), heap size 368
MySQL thread id 8, query id 164 localhost root
TABLE LOCK table `test/t1` trx id 0 4872 lock mode IX
RECORD LOCKS space id 9 page no 3 n bits 72 index `PRIMARY` of table `test/t1` trx id 0 4872 lock_mode X
Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; compact format; info bits 0
 0: len 8; hex 73757072656d756d; asc supremum;;

Record lock, heap no 2 PHYSICAL RECORD: n_fields 3; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;; 1: len 6; hex 000000000d35; asc      5;; 2: len 7; hex 800000002d0110; asc     -  ;;

That’s fine, but there are, ah, limitations. As the manual says, InnoDB periodically prints out this text — essentially spewing InnoDB’s guts — to its standard output. This gets redirected to the server error log in any sane installation. Who’s looking there? And it gets printed out at long intervals, which seems to be about every 16 seconds on the machines I use.

Plus, if you’ve looked at the result, you’ll understand this is not something you want to search through manually looking for data. The output can be absolutely huge. What DBA wants to pore over thousands of hex-dumped rows from the table just to answer the question “who holds that lock?”

All in all, this is not very convenient (yep, I know that’s an understatement).

Slightly more convenient

What’s a little more convenient than combing through all that text by hand is writing a program to parse InnoDB’s status output. You don’t have to, though. That’s what I wrote innotop to do. And I’ve just released version 1.5.2, which at long last has the ability to watch a log file as well as connecting to server(s).

Here’s how this works: you start innotop, and press the L key to switch to Lock mode. This replaces the old Lock Wait mode, which was only able to monitor the InnoDB lock waits you see in the normal output of SHOW INNODB STATUS.

This mode shows you something like the following:

_____________________________ InnoDB Locks __________________________
CXN   ID  Type    Waiting  Wait   Active  Mode  DB    Table  Index
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  12  TABLE         0  00:10   00:10  IX    test  t1
file  12  RECORD        1  00:10   00:10  X     test  t1     PRIMARY
file  11  TABLE         0  00:00   00:25  IX    test  t1
file  11  RECORD        0  00:00   00:25  X     test  t1     PRIMARY

That’s helpful! I can see the locks held and waited for in a nice tabular format. It’s pretty easy to see connection 11 is blocking connection 12.

This is still pretty inconvenient, though. To get access to the server’s error log, I have to run innotop on the database server machine itself. Is there a better way?

Even better

There is, in fact, but I discovered it completely by accident. It’s not documented, but the extra information doesn’t just get printed to the server log. It also shows up in SHOW INNODB STATUS! Now that’s a nice surprise. It means innotop can get lock information from a normal connection instead of monitoring a log file.

After discovering this, I immediately added some more features to innotop. There are now hot-keys in L mode to enable and disable the lock monitor. Now you can press L, press the ‘a’ key to start the lock monitor, see what’s blocking the waiting transaction, press ‘o’ to stop the lock monitor, and you’re done.

Best yet

I’m sure you InnoDB administrators already recognize what an improvement this is over the options you previously had (essentially, you didn’t have any). There’s still a long way to go, though. Locks could be in the INFORMATION_SCHEMA or in a SHOW LOCKS command. I won’t speculate on why they aren’t already.

Of course, the upcoming Falcon storage engine already has better features for debugging lock contention than this. But my guess is it’ll be a long time before Falcon has the market share InnoDB has. All things considered, InnoDB is a pretty nice piece of software.

Conclusion

Download innotop

The conclusion to this whole article is: use innotop if you use InnoDB. Heck, use it if you use MySQL at all. It makes a lot of things a lot easier, not just debugging InnoDB lock contention. Feedback is welcome — just use the Sourceforge bug tracker, forums, and mailing lists.

Technorati Tags:, , , , ,

You might also like:

  1. How to monitor InnoDB lock waits
  2. How I patched InnoDB to show locks held
  3. How to find out who is locking a table in MySQL
  4. Version 1.5.2 of the innotop MySQL monitor released
  5. A little-known way to cause a database deadlock

Archive strategies for OLTP servers, Part 3

In the first two articles in this series, I discussed archiving basics, relationships and dependencies, and specific archiving techniques for online transaction processing (OLTP) database servers. This article covers how to move the data from the OLTP source to the archive destination, what the archive destination might look like, and how to un-archive data. If you can un-archive easily and reliably, a whole new world of possibilities opens up.

For your reference, here are links to part 2 and part 1, and the original article on efficient SQL queries for archiving, which is the basis for this whole series.

How to move the data

At some point you have to actually take the data from the source and put it into the archive. This is a three-step process:

  1. Find archivable rows
  2. Insert them into the archive
  3. Delete them from the source

I wrote an article on how to find archivable rows efficiently, so I won’t go into it more here. Inserting and deleting are usually straightforward, but there are subtleties and tricks that can lead to nifty solutions.

Transactions

The most important question about actually moving the data is how to do it safely, with or without transactions. Even if the source and archive are on different servers, you can do distributed transactions, either in your application logic or with a two-phase commit supported by your database product. For most purposes, I’ve found it just as reliable and more efficient to handle the transaction in your application logic.

For many of the reasons mentioned in the second article in this series, I would recommend relaxing the consistency requirements between source and archive, so you can keep the archived data out of the source’s transaction. You can do this safely by performing the operations in the order I listed above: insert, delete, commit the insert, then commit the delete. If you are archiving to a file at the same time, you can also write to the file before the insert.

Your archive might also be non-transactional. If you’re using MySQL, you should think about using a faster non-transactional engine that stores the data more compactly, such as MyISAM, for the archived data.

Use replication to unload the OLTP server

One of the most effective ways to archive an OLTP server without impacting it too much is to do the hard work of finding and inserting on a slave server, then performing the delete on the master and letting it flow through replication. Here’s an example from a past employer: we replicated the order table to a “transfer” database on the data warehouse server. A job on the data warehouse server looked for orders that had completed and shipped, and thus could be archived. It copied these in a transaction to the long-term storage, then deleted on the OLTP server. This delete flowed through replication back to the data warehouse, and removed the rows from the transfer database.

The archive server

I’ve already mentioned some ways you might design the archive server, but there are a few other things I want to cover briefly. The first is what happens when you don’t use a different server at all, and just archive to adjacent tables on the OLTP server. This can be a fine way to do things. As long as the data isn’t used, it’s just taking up space on disk. However, it might make backups more difficult.

If you use a different server to hold the archived data, you should probably consider some kind of partitioning scheme. If your server doesn’t support partitioned tables natively, you might want to archive into a different table every so often, building views over all the tables to present them as a single logical table. There are some important advantages to this, especially if you eventually want to purge really old data. It is much easier to drop an entire table when it ages out than to delete rows from a huge table.

This gets into the topic of how to build a large data warehouse, so I’ll just leave it at this: if you forsee the archived data getting large, start planning for it now.

Duplicated data

Unless you use distributed transactions or some clever way to guarantee atomicity, there’s a chance you’ll insert to your archive but fail to delete from the source. Now you have duplicate data. How will you handle this?

First, decide if you can tolerate the situation. (I told you we hadn’t seen the last of the consistency requirements!) I suggest you take it as a given, if at all possible. Design your archiving jobs so they can tolerate existing data in case they get terminated uncleanly or otherwise have errors. If they try to insert rows that exist, you should probably overwrite the existing rows with new ones, which might have changed on the OLTP server. Make sure you don’t lose data from this, one way or another.

If you are archiving summary tables, you might need to be careful. A row that’s built incrementally on the OLTP system might need to be re-aggregated, instead of replaced, if it already exists in the archive.

Duplicated data makes some queries hard to get consistent. For instance, a view that takes the union of archived and un-archived data will tell a lie if a row exists in both places. You need to factor this in when deciding how to do the archiving. Duplicates can also happen during un-archiving.

Un-archiving

Why would you ever want to un-archive?

Here are some reasons you might benefit from being able to un-archive easily:

  • You treat all the data as equally important, but some of it as more likely to be accessed
  • You know there’s unused data but it’ll be inefficient to figure out which rows
  • You can’t get an exact answer on whether rows are archivable

Think of it this way: the ability to un-archive lowers the risk of archiving the wrong data, and allows you to archive things you might otherwise be unable to touch. It takes away or mitigates the downside.

This goes back to my analogy of archiving as flushing from a cache. You probably don’t treat databases as a multi-tier cache, and that’s a good thing. If the data isn’t where you expect, your applications would need to look elsewhere and retrieve it. Unless you write a wrapper around your database access layer that handles it automatically, this is probably infeasible.

However, you can still use the concept of retrieving missing data under certain circumstances. Does the following sound workable?

  1. Make most applications tolerate missing data and just do what they can with the data they have
  2. Identify points of entry where incoming data is a signal to un-archive something
  3. Hook an un-archiving system into the points of entry
  4. Archive freely and let un-archiving bring back anything it needs to

Here’s a concrete example from the advertising system I mentioned previously. This system archives ads eagerly; if they don’t have any traffic in a while, it moves them away. There are limited points of entry for ad-related data: traffic data, and a record of a sale that is attributed to an ad. The systems that load this incoming data can simply check whether all referenced ads exist, and if not, attempt to un-archive any missing ones. This happens as part of the routine checks for bad incoming data. This approach is fairly new for us, and might have some kinks we haven’t yet discovered, but there is virtually no downside. The data isn’t gone, it’s just elsewhere. Now we can archive data we couldn’t before, because it was too hard to get a definite “yes, this can be archived” answer. (It’s often easy to get a “no,” but hard to get a “yes.”)

Un-archiving is non-trivial. In fact, depending on your schema, you might need to be more careful about consistency requirements than you are with your archiving strategy. However, if you’re archiving correctly, un-archives should be few and far between, so you can afford a more expensive process.

In many ways, your options for un-archiving strategies might be similar to archiving strategies. In the systems I’ve worked on, we take the depth-first-search, dependencies-first, all-one-chunk approach I think is too inefficient to use for archiving.

If your archive is non-transactional, be careful to commit the insert into the OLTP system before you delete from the archive. Otherwise your delete will be permanent, but your insert might be rolled back, and the data is lost.

You don’t have to un-archive

If you don’t want to build an un-archiving system, you can build your applications to look in the archive for data they need and can’t find in the OLTP system. If you do this seldom enough, it might work fine. One order-history system I know of does this to find orders that aren’t in the OLTP server.

Archiving miscellany

To round out this series, here is a collection of notes and references I didn’t want to mix in along the way, but I think belong here somewhere.

First, if you’re using MySQL, I’ve written tools that can handle much of the work I’ve described here. The first is MySQL Archiver, which can find and move archivable data very efficiently. It is full-featured and should handle most needs as-is, but it’s also extensible, so you can plug your own code in to handle complex archivability checks, dependency-first archiving, and so forth. Another of my tools, MySQL Find, can monitor and create historical records of table sizes, so you can get a sense of which tables are largest and/or growing the fastest (this is a one-liner that can go into your daily crontab). If you are archiving from InnoDB tables, you might want to record deadlocks with MySQL Deadlock Logger, for obvious reasons.

All these tools are part of the MySQL Toolkit, which is Free Software. Another useful tool is innotop, a real-time interactive MySQL and InnoDB monitor I’ve written.

A couple more notes on MySQL: the choice of storage engines makes MySQL especially attractive for single-server archiving solutions, in my opinion. It’s quite practical to run your intense OLTP workload on InnoDB tables (or another transactional storage engine, such as Paul McCullagh’s PBXT) and store the archive tables side-by-side in MyISAM or ARCHIVE tables. If you’re using MySQL 5.1, you also get partitioned tables; if you’re not that bleeding-edge, you might consider the strategy I suggested at the recent MySQL Conference and Expo: archive to a small InnoDB table for high concurrency, then regularly convert it to MyISAM and place it into a MERGE collection, with a view that unions the InnoDB and MERGE tables (Sheeri Critzer blogged about this also, though I’m not sure how many people are actually doing it).

I don’t really like triggers and foreign key magic, so I relegated this suggestion to here: you can use triggers and/or foreign key constraints with ON UPDATE actions to help with archiving and purging. I don’t like these approaches because I think they’re hidden, dangerous code. In Microsoft SQL Server I usually used stored procedures to archive, but in MySQL these days I use MySQL Archiver (linked above).

MySQL’s Edwin DeSouza wrote me to bring my attention to some of Craig Mullins’ recent articles about archiving. Craig’s insight is valuable if you’re researching archiving.

I think that’s it for the miscellany.

Conclusion

This series has covered what I believe to be the full scope of archiving strategies, from requirements to specific techniques you can use to archive and un-archive data from your OLTP systems. As a reminder, the larger context here is to offer scaling back as an alternative to the usual scale-up-or-scale-out dichotomy. There are always more options than you think! Archiving can be difficult and complex, but the potential gains for some types of data can be large, especially compared to some of the more frequently-discussed scaling tactics. Like other solutions, it doesn’t work for all situations, but if you forsee a huge amount of data coming at you, you should consider archiving along with other scaling techniques.

Technorati Tags:, , , , , , , , , , , , , , ,

You might also like:

  1. Archive strategies for OLTP servers, Part 1
  2. MySQL Archiver can now archive each row to a different table
  3. Archive strategies for OLTP servers, Part 2
  4. How to write a lazy UNION in MySQL
  5. MySQL Archiver 0.9.1 released

Archive strategies for OLTP servers, Part 2

In the first article in this series on archiving strategies for online transaction processing (OLTP) database servers, I covered some basics: why to archive, and what to consider when gathering requirements for the archived data itself. This article is more technical. I want to help you understand how to choose which rows are archivable, and how to deal with complex data relationships and dependencies. In that context, I’ll also discuss a few concrete archiving strategies, their strengths and shortcomings, and how they can satisfy your requirements, especially requirements for data consistency, which as you will see is one of the most difficult problems in archiving.

Remember I’m basing these articles on the nibbling principle I explained in my very first article on archiving strategies. The goal is not to move away tables or take gigantic chunks out of tables manually. If you need to archive, you’ll need to do it frequently, perhaps continuously. That means you need to write automated, incremental jobs that nibble small chunks of unwanted data away without impacting the OLTP work you’re trying to optimize.

It’s a different matter if you’re archiving or purging from an OLAP system such as a data warehouse, of course.

What data should you archive?

You can’t archive until you know what you’re going to archive. You need to prioritize your efforts, and then for each type of data, which typically means each table, you need to know whether each row is archivable.

Prioritize

The first thing to do is determine priorities. You might know some data can be archived, but come back to that later. Focus on what needs to go away to make your transactional processes run more quickly. What gets queried the most? Consider tables “heavily queried” if they have extremes of any type of query — many small queries can cause just as much work as frequent long-running ones. Which tables have the most rows, the most data, the largest indexes? Which tables are growing the most quickly? Think also about access patterns; if you have tables that are frequently scanned partially or wholly for aggregates, those too are candidates.

Identify the worst offenders and think about which of them you need to do first. If you know some frequently-queried tables are growing very fast, I would consider prioritizing them, even if they’re not yet too large. I’ve seen tables reach a point where they become too large to query, and thus difficult to archive.

Another thing you might consider is how easy it will be to archive a row, especially if you are running close to capacity. If you can archive the second-most-important table easily, and it will give the server significant performance headroom, you might want to do that first. You can archive the more work-intensive most-important table later, when your server has some capacity to spare.

Is it archivable?

Now that you’ve identified which tables to focus on, you need to determine which rows are candidates for archiving. Rather than query the table, I would focus on identifying rules, which ideally you should be able to express as a WHERE clause in a query. The simpler the better. Here are some examples.

  • Is the row related to something that’s no longer current? For example, is it a product you don’t sell anymore, or related to a client with whom you no longer work?
  • Is it scratch data that never got used? For example, an abandoned shopping cart, an order that was invalidated and cancelled, or data that was prepared for upload to a business partner but was rejected by their systems?
  • Is it data that used to record a fact, but has now been “cancelled” and just records “zero?” For example, one system I’ve worked on records advertising traffic. For various reasons it can end up with rows that say “on this day, this advertisement was clicked zero times and caused zero sales.” This happened because of artifacts in the rollup processes. If a row records the same thing as the absence of a row, it can go (actually it should probably be purged, not archived).
  • Is it an orphan (its parent record is gone) or a widow[1] (it has no child records)?
  • Does it fall into some time window that makes it no longer likely to be accessed?

This last case is probably one of the most complex, but it’s important because data can often be archived by this criteria when it doesn’t meet any other. One example is summary tables to support decisions on OLTP systems (as opposed to a decision support system running from a data warehouse). If you do many calculations on the most recent data — say, the most recent quarter — you can probably archive previous quarters to slower storage.

Of course, you might not be able to get a definite answer on whether a row can be archived. Often the true answer is too expensive to get, or is in the outside world, perhaps even in the future. Sometimes the answer is “I think it can go, but if the client asks, it has to be there” or “I think it can go, as long as we don’t get some improbable event.” In these cases you can build your systems to cope with cache misses, as it were, and go ahead and archive the data based on probability. This is why I made the comparison with caching in the first article.

An example of this comes from the advertising system I mentioned. It is designed so an advertisement that doesn’t generate any traffic for some time can be archived, but retrieved if it gets traffic in the future. I’ll write more about un-archiving in the next article.

Finally, you might not need to specify criteria for each type of data by itself. OLTP systems often have parent-child relationships among tables, so in addition to orphan or widow checks, you can decide transitively. If a row is archivable when its parent is older than a certain age, then the row’s children can be archived by looking at their “grandparent” or “uncle.” I will call this a “transitive check” from now on.

How to handle data dependencies

Relationships among data, and ensuring consistency with respect to the relationships, usually make archiving itself complex and difficult. Just as you did with archived storage requirements, you need to gather requirements for the instantaneous state of the system during archiving. There are several strategies for meeting different requirements, depending on your data’s relationships.

Types of relationships, consistency guarantees, and efficiency

I already mentioned parent/child relationships. In general, whenever you have some data in one table that’s used to look up data in another table, you have one of several possibilities:

  1. There’s an actual database-enforced foreign key between the tables
  2. Some business logic, whether in queries, stored procedures, constraints or just application code, expects certain data in one table when it sees data in another
  3. Your systems treat it as a pleasant surprise when they find corresponding data in the two tables

My guess is most of you have a fair amount of data that’s described by cases one and two, and much less that falls into case three. Wouldn’t you know it, case three is the easiest to handle!

There are several levels of consistency you might choose to follow, and you can mix and match them depending on the data. These flow from the three kinds of relationships:

  • No orphans. Foreign keys in most systems enforce this rule. If you attempt to move a parent row away, the child will become an orphan, and most foreign keys will raise an error. Your application code might also forbid orphans. In this case you’ll have to archive the children before the parent.
  • Orphans are okay. In this case you can archive the parents first, and the children next. In some systems you can disable database-enforced foreign keys, if they’re the only reason for a no-orphans-allowed situation.
  • No matter. In this case you can archive however you please.

You need to balance your consistency requirements against the need to archive efficiently. If you require 100% consistent database state all the time, you might end up doing a lot of database transactions. Transactions are built to ensure consistency, but as I mentioned in previous articles, your archiving jobs need to be designed so they are the victims if there’s ever a conflict between them and OLTP processes. Huge transactions to ensure consistency may not be practical while archiving!

Strategies for archiving

Once you’ve decided what level of inconsistency you can tolerate, you can choose archive strategies. Just as with everything else, archive strategies are a trade-off. In this case the compromise is usually between efficiency and consistency.

Here are three basic types of archiving strategies. There may be others, but these are the ones I’ve used and/or considered using most of the time:

  1. Archive parents and children with recursion.
  2. Leave orphans, then clean them up later.
  3. Archive at the leaves of the dependency tree and work toward the root.

The first strategy is a classic computer-science depth-first-traversal problem. You typically start with a central table — the root of the tree or the root of a subtree — and for each row you recursively archive its children, only archiving the parent after all the children are gone. This can be difficult, especially since your OLTP schema might not be a tree; it can be a directed acyclic graph, or it can even be a graph with cycles. Satisfying all the dependencies without introducing infinite recursion can be a hard problem. You might also find that not all the dependencies are themselves archivable. You might need to gather a list of all data that is archivable before archiving any of it, which actually means two tree traversals. Finally, you might end up with a chunk of data that has to be archived all at once, which I think is impractical in many OLTP systems. The advantage, if you can pull this off, is that you get a consistent state as long as you do it all in one transaction.

By the way, it’s typically hard to archive tables that have inter-row relationships, such as nested-set or other tree representations, without violating consistency requirements.

If you can get away with it, it’s generally easier to archive the core table and leave orphans. There are a couple advantages to this. First, you can archive in single-row (or some other small amount) chunks. This satisfies the requirement to keep it low-impact. Second, it’s simpler and more efficient to determine whether the child rows can be archived. If they are orphans, they can be. The downside is your applications might not behave well, particularly if they are working with rows in the child table whose parent rows are disappearing.

A final strategy is to archive at the leaves of the dependency tree and work “up.” In this situation, you are sort of doing the first strategy — making sure dependency relationships aren’t broken — but you’re not trying to boil the ocean. Depending on how deep the dependency tree is, you can usually decide whether a row is archivable either transitively, or by checking whether it’s a widow. When you’re working at the leaves, you check whether the root is archivable; when you’re on an internal node you check if the node is a widow. There are a couple potential drawbacks, including efficiency and consistency requirements in the archive storage, which I’ll cover next.

It’s never that simple

Alas, there’s still more complexity to conquer. You have to consider how efficient it is to determine whether a row is archivable, and what sort of consistency you need to guarantee between the unarchived and the archived data.

Consistency, again

I’ll start with the consistency problem first. This is a different sort of consistency, which is why I didn’t mention it before. It would have confused things.

It’s not always enough to consider consistency in the OLTP system. You might have to think about the archived storage too, and even worse, the two of them in combination. Consider this scenario: you have a parent row in one table, some of whose children in another table can be archived. Do they need a parent row in their destination, too? In other words, do you need to copy the row from source.parent to archive.parent before you can insert into archive.child? And if so, does it bother you that the row in source.parent cannot be archived yet because some of its children are not archivable?

If you must have a row in archive.parent before inserting into archive.child, you will probably have to resign yourself to having two copies of the parent row — one in the source, one in the archive. Otherwise, you are enforcing an all-or-nothing constraint on the whole “family” of rows, and this will probably reduce the amount of data you can archive. In practice, I think most people will end up having no foreign keys or other consistency guarantees on the archived data, so child rows can be freely inserted into the archive. If your source and archive are on the same server, don’t foreign key the archive’s child table to the source’s parent table, or you won’t be able to archive from the parent! Along with this, you’ll probably need to treat an archived row as “dirty” or “un-authoritative” if it duplicates a row in the source. Upcoming articles will dig into this more.

Efficient archivability checks

The second potential complication of the strategies above is performance while checking whether a row can be archived. Both transitive checks and widow checks can have high cost.

Transitive checks for archivability can cause a heavy query load on certain tables. Here’s a real example, from the systems I described above. An ad is archivable if it has no traffic in the past 18 months. Tables that depend on ads, and which contain an ad’s ID, can be archived by checking the traffic table for any traffic for that ad in the last 18 months. This means a lot of queries against the traffic table, which is large and heavily queried anyway. It would be more efficient to archive the ad, and then archive its dependent tables by checking whether the ad doesn’t exist anymore. This works best if the ad table can be archived to a small fraction of its former size, which makes index lookups faster.

Another problem with transitive checks is if several relationships must be traversed. In this case you might have to look up values in intermediate tables before checking the final table. This adds more overhead.

The transitive checks are often “upward” checks in the dependency tree, and have a single parent row in a single parent table; or sometimes “up and across” to another table. By contrast, widow checks are downward, and might have to check many child tables for child rows. This can be expensive, particularly in tables that are close to the root of the dependency tree. The ads table is a good example. It has literally scores of dependent tables. Querying each of them before archiving a row would be prohibitively expensive. Even worse, a child table might not have an index on its parent column. If your application doesn’t traverse the relationship from parent to child, such an index wouldn’t be needed, and adding one just to support archiving queries might not be wise.

One compromise I have used occasionally, but not analyzed rigorously, is to build a table of parent row IDs that can be archived, and then access that from child archiving jobs, instead of doing transitive checks. This way at least the “traffic” table, or its equivalent, only takes the hit once.

Conclusion

I’ve covered a lot of ground in this article, starting with “deciding archivability” and “handling relationships.” Both require knowing your data, applications, schema, and consistency requirements. It helps to understand how they interact with various archiving strategies, and in turn how those strategies can violate or support your consistency requirements. Efficiency is a big deal — it’s the point of archiving, after all — so you need to include it when determining what you really require in terms of consistency.

The next article will cover some specific strategies for actually moving the data from the source to the archive, how to do that while fulfilling data consistency requirements (again!), and finally how to un-archive. This last point is crucial for scaling by “scaling back,” because it lets you archive data you’re not sure is archivable, yet pay virtually no penalty.


[1] I realize “widow” is not the opposite of “orphan,” but in computer science it is sometimes used to mean “childless,” especially for typesetting algorithms. Joe Celko also uses the term to mean “childless” in SQL For Smarties.

Technorati Tags:, , , , , , , , ,

You might also like:

  1. Archive strategies for OLTP servers, Part 1
  2. Archive strategies for OLTP servers, Part 3
  3. MySQL Archiver can now archive each row to a different table
  4. MySQL Archiver 0.9.1 released
  5. Why does InnoDB create two indexes per foreign key?