How Percona Toolkit divides tables into chunks
Posted in Databases on May 6, 2012
The tools we’ve redesigned in Percona Toolkit recently have moved away from a legacy technique for operating on small numbers of rows at a time, towards a more reliable and predictable method. We call the old version “chunking” and the new version “nibbling.” Many other MySQL tools I’ve seen either operate on entire tables, or use the “chunking” technique and are exposed to the problems it creates. I’ll compare the two briefly to explain the differences.
Chunking attempts to divide a table into ranges of rows of a desired size, such as 1000 rows. It does this by examining the minimum and maximum value of the primary key (or other suitable index), estimating the number of rows in the table, and dividing one by the other to create a list of boundary values. Suppose that the minimum value is 1 and the maximum is 1000000, and there are an estimated 100000 rows in the table. The chunk boundaries will fall on intervals of 10000. We can operate on 1-10000, 10001-20000, and so on.
This has a number of problems that might not be obvious at first. It practically requires a numeric, single-column index. It can (and does, in practice) create oversized chunks when values are sparse in some places and packed tightly in others. It leads to a lot of code complexity and bugs when the table’s data changes (especially if new rows are inserted) as the tool works. It has edge cases related to special values such as 0, NULL, the date ‘0000-00-00′, the beginning of the table, and the end of the table. In short, through years of experience we found that it simply doesn’t work well enough. It works in 95% of cases, but not all that well, and in a small fraction of cases it causes serious problems, such as a massively oversized chunk that interferes with other processes on the server. If you’d like more details on these types of problems, there is a lot of information in old bug reports.
The new “nibbling” technique we are using is actually not new at all. It is something I learned before I was even involved in MySQL very much. The context is in some old blog posts I wrote about archiving and purging. The idea is to fetch a row and use that as the lower bound of the first chunk (or “nibble” – we use the terms pretty interchangeably) of rows. Then use a SELECT with a LIMIT to find the upper boundary of the nibble, as well as the lower bound of the next nibble. After processing the nibble, repeat the steps with the lower bound of the next nibble. The disadvantage of this process is that there is quite a bit of code complexity when you get multi-column indexes, with NULL-able columns adding a whole new twist to the game. However, this is long since solved, and we have a reliable and well-tested library of routines for dealing with this. In return we get predictable behavior on practically any table with an index, even if it isn’t a unique index. The only remaining edge case is when a non-unique index contains a range of identical values that is larger than the desired chunk size, but that is easily detected and handled in application-specific ways.
The result of the work we’ve been doing recently, replacing “chunking” with “nibbling” in pt-table-checksum and pt-online-schema-change, is reliable and safe nibbling behavior. This has the further benefit of allowing us to do much more sophisticated techniques, such as dynamically varying the size of each chunk of rows in response to changing conditions such as server load or varying row size and complexity. Our recent tools are designed to target a predictable query time for each nibble, rather than a specific number of rows. I am happy to report that, in extensive real-world usage, they are able to stay extremely close to the target time even as conditions vary dramatically.
 Although in theory you can operate on the first column in a multi-column index, real-world experience shows that the first column in such indexes tends to have low cardinality, thus creating enormous chunks. And although it is possible to treat date, timestamp, and some other types as numeric, in practice it is very difficult. What number corresponds to 0000-00-00? Our attempts to create algorithms that would work on non-numeric types such as character-based columns were tremendously difficult and the results were discouraging; again, in the real world it doesn’t work well.