How Percona Toolkit divides tables into chunks
The tools we’ve redesigned in Percona Toolkit recently have moved away from a legacy technique for operating on small numbers of rows at a time, towards a more reliable and predictable method. We call the old version “chunking” and the new version “nibbling.” Many other MySQL tools I’ve seen either operate on entire tables, or use the “chunking” technique and are exposed to the problems it creates. I’ll compare the two briefly to explain the differences.
Chunking attempts to divide a table into ranges of rows of a desired size, such as 1000 rows. It does this by examining the minimum and maximum value of the primary key (or other suitable index), estimating the number of rows in the table, and dividing one by the other to create a list of boundary values. Suppose that the minimum value is 1 and the maximum is 1000000, and there are an estimated 100000 rows in the table. The chunk boundaries will fall on intervals of 10000. We can operate on 1-10000, 10001-20000, and so on.
This has a number of problems that might not be obvious at first. It practically requires a numeric, single-column index[1]. It can (and does, in practice) create oversized chunks when values are sparse in some places and packed tightly in others. It leads to a lot of code complexity and bugs when the table’s data changes (especially if new rows are inserted) as the tool works. It has edge cases related to special values such as 0, NULL, the date ’0000-00-00′, the beginning of the table, and the end of the table. In short, through years of experience we found that it simply doesn’t work well enough. It works in 95% of cases, but not all that well, and in a small fraction of cases it causes serious problems, such as a massively oversized chunk that interferes with other processes on the server. If you’d like more details on these types of problems, there is a lot of information in old bug reports.
The new “nibbling” technique we are using is actually not new at all. It is something I learned before I was even involved in MySQL very much. The context is in some old blog posts I wrote about archiving and purging. The idea is to fetch a row and use that as the lower bound of the first chunk (or “nibble” — we use the terms pretty interchangeably) of rows. Then use a SELECT with a LIMIT to find the upper boundary of the nibble, as well as the lower bound of the next nibble. After processing the nibble, repeat the steps with the lower bound of the next nibble. The disadvantage of this process is that there is quite a bit of code complexity when you get multi-column indexes, with NULL-able columns adding a whole new twist to the game. However, this is long since solved, and we have a reliable and well-tested library of routines for dealing with this. In return we get predictable behavior on practically any table with an index, even if it isn’t a unique index. The only remaining edge case is when a non-unique index contains a range of identical values that is larger than the desired chunk size, but that is easily detected and handled in application-specific ways.
The result of the work we’ve been doing recently, replacing “chunking” with “nibbling” in pt-table-checksum and pt-online-schema-change, is reliable and safe nibbling behavior. This has the further benefit of allowing us to do much more sophisticated techniques, such as dynamically varying the size of each chunk of rows in response to changing conditions such as server load or varying row size and complexity. Our recent tools are designed to target a predictable query time for each nibble, rather than a specific number of rows. I am happy to report that, in extensive real-world usage, they are able to stay extremely close to the target time even as conditions vary dramatically.
[1] Although in theory you can operate on the first column in a multi-column index, real-world experience shows that the first column in such indexes tends to have low cardinality, thus creating enormous chunks. And although it is possible to treat date, timestamp, and some other types as numeric, in practice it is very difficult. What number corresponds to 0000-00-00? Our attempts to create algorithms that would work on non-numeric types such as character-based columns were tremendously difficult and the results were discouraging; again, in the real world it doesn’t work well.



Hi, this is the very same technique used by openark-kit, though it is called “chunking” there. You guarantee a number of rows per chunk by counting up some order, and LIMIT.
I do that using user defined variables. Say you have a two column (a,b) primary key. In which case I do a:
SELECT a,b FROM tbl ORDER BY a ASC, b ASC LIMIT 1 INTO @c1, @c2
To compute end of range, start with @c1, @c2, same ORDER BY, LIMIT 1000 (for example)
Then take the highest value using an enclosing query on opposite order (a DESC, b DESC) LIMIT 1.
Since user defined variables assume the data type provided by query, this makes this method safe to use for any number of columns in your primary (or other UNIQUE) key, and for any type. My code actually does not care the least about the data type, other than recommending the best key to use based on some heuristics.
Trickey to explain in such short comment, but works well. I also found out about this, which makes your code a whole less readable as result.
This is also one of the major piece of code incorporated into Facebook’s Online Schema Change.
BTW, openark-kit does not handle NULLable columns, and requires at least one UNIQUE constraint to choose from.
You may also check out candidate_keys_recommended view in common_schema, which attempts to recommend the best unique key in some table for use as primary key.
It also makes for the best candidate for a chunking key.
Hope this wasn’t too much of a shameless plug. Apologies if so.
Shlomi Noach
6 May 12 at 3:12 pm
“It practically requires a numeric, single-column index[1].” Why? if you use CHUNKS you will add a “LIMIT n,1000″ -clause to queries (iteratively increading n from ZERO with the modulus you use for each iteration).
It works with ORDER BY on any column (whether it has an index or not) and even with no ORDER BY in the query (server will return data in *some* order and apply the LIMIT to that order).
Peter Laursen
6 May 12 at 3:46 pm
BTW: Funny enough Peter Zaitssev just published http://www.mysqlperformanceblog.com/2012/05/06/load-management-mysql/
“For example if I need to delete old data instead of DELETE FROM TBL WHERE ts<"2010-01-01" I’ll do “DELETE FROM TBL WHERE TS<"2010-01-01" LIMIT 1000 in the loop until no more rows need to be deleted."
Isn't that that 'old style' CHUNKing you tell here that Percona has found a better alternative to?
Peter Laursen
6 May 12 at 3:50 pm
Shlomi, yes, the compound operator problem you found is the same thing I solved with pt-archiver (then MySQL Archiver), and although really complex WHERE clauses result, it works well as you note in the linked blog post. I’ve seen it work on indexes with up to 10 columns (really). Of course, the resulting query was basically unreadable by humans. The easiest place to link to illustrate is a Maatkit test suite: http://code.google.com/p/maatkit/source/browse/trunk/common/t/TableNibbler.t#469
We don’t use user variables, though. Instead of LIMIT 1 OFFSET 1000, we actually capture two rows at the offset. The second row automatically becomes the lower boundary of the next chunk, and by comparing the first and second, we know if we’re in a potential infinite-loop scenario. So this has some benefits.
Xaprb
6 May 12 at 4:05 pm
Peter, no – the old-style “chunking” I’m referring to is based on math. None of these techniques uses an ever-increasing offset/limit approach.
Peter Zaitsev’s note is based on the assumption that rows are gone after processing them, and works fine in that scenario. Many of our tools have to keep a “bookmark” and start from there; the rows remain after processing, and we can’t keep scanning from the start and skipping over rows. That is too costly and intrusive. We have to do what I called non-backtracking index scans in the old blog posts I mentioned.
Xaprb
6 May 12 at 4:08 pm