Comments on: How Percona Toolkit divides tables into chunks
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/
Stay curious!
Thu, 02 May 2013 12:36:53 +0000

By: Xaprb, Sun, 06 May 2012 20:08:17 +0000
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/#comment-20047

Peter, no. The old-style “chunking” I’m referring to is based on math. None of these techniques uses an ever-increasing offset/limit approach.

Peter Zaitsev’s note is based on the assumption that rows are gone after processing them, and works fine in that scenario. Many of our tools have to keep a “bookmark” and start from there; the rows remain after processing, and we can’t keep scanning from the start and skipping over rows. That is too costly and intrusive. We have to do what I called non-backtracking index scans in the old blog posts I mentioned.
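
A minimal sketch of the “bookmark” idea, assuming a hypothetical table `items` with an integer primary key `id`, and using Python's built-in SQLite driver as a stand-in for MySQL. It is not the toolkit's code; it only illustrates resuming from the last processed key instead of rescanning from the start.

```python
import sqlite3

def process_in_chunks(conn, chunk_size):
    """Visit every row of the hypothetical `items` table once, in key order."""
    bookmark = None  # last primary-key value already processed
    chunks = []
    while True:
        if bookmark is None:
            rows = conn.execute(
                "SELECT id FROM items ORDER BY id LIMIT ?", (chunk_size,)
            ).fetchall()
        else:
            # Seek straight to the bookmark through the index; rows before
            # it are neither scanned nor skipped.
            rows = conn.execute(
                "SELECT id FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (bookmark, chunk_size),
            ).fetchall()
        if not rows:
            break
        chunks.append([r[0] for r in rows])
        bookmark = rows[-1][0]  # the rows stay in the table; only the bookmark moves
    return chunks

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 11)])
chunks = process_in_chunks(conn, 4)
```

With ten rows and a chunk size of 4, this yields three chunks, the last one holding the two leftover rows.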

By: Xaprb, Sun, 06 May 2012 20:05:03 +0000
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/#comment-20046

Shlomi, yes, the compound operator problem you found is the same thing I solved with pt-archiver (then MySQL Archiver), and although really complex WHERE clauses result, it works well as you note in the linked blog post. I’ve seen it work on indexes with up to 10 columns (really). Of course, the resulting query was basically unreadable by humans. The easiest place to link to illustrate is a Maatkit test suite: http://code.google.com/p/maatkit/source/browse/trunk/common/t/TableNibbler.t#469
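
To give a feel for why these WHERE clauses grow so quickly, here is a rough sketch of how such a clause can be generated for an ascending scan over a multi-column index. The function name `ascending_where` is made up for illustration, the output is not claimed to match what TableNibbler emits, and NULLable columns are ignored.

```python
def ascending_where(columns):
    """Build a WHERE clause selecting rows strictly after a bookmark row.

    For columns (a, b, c) the clause is:
      (a > ?) OR (a = ? AND b > ?) OR (a = ? AND b = ? AND c > ?)
    The second return value lists, for each placeholder in order, the index
    of the column whose bookmark value should be bound to it.
    """
    terms = []
    slots = []
    for i, col in enumerate(columns):
        # All earlier columns must equal the bookmark; this one must exceed it.
        equalities = [f"{c} = ?" for c in columns[:i]]
        terms.append("(" + " AND ".join(equalities + [f"{col} > ?"]) + ")")
        slots.extend(range(i + 1))
    return " OR ".join(terms), slots

clause, slots = ascending_where(["a", "b"])
ten_clause, ten_slots = ascending_where([f"c{i}" for i in range(10)])
```

For a 10-column index this form needs 55 placeholders, which goes some way toward explaining the “unreadable by humans” remark.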

We don’t use user variables, though. Instead of LIMIT 1 OFFSET 1000, we actually capture two rows at the offset. The second row automatically becomes the lower boundary of the next chunk, and by comparing the first and second, we know if we’re in a potential infinite-loop scenario. So this has some benefits.
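
A small sketch of the two-rows-at-the-offset idea, assuming a hypothetical table `items` with a non-unique indexed column `k`, and SQLite standing in for MySQL. The interpretation of the infinite-loop check (both rows carrying the same value) is one plausible reading of the comment above, not a description of the toolkit's actual code.

```python
import sqlite3

def next_boundaries(conn, lower, chunk_size):
    """Return (upper, next_lower) for the chunk that starts at `lower`.

    Either value is None when the table ends before that row is reached.
    """
    rows = conn.execute(
        "SELECT k FROM items WHERE k >= ? ORDER BY k LIMIT 2 OFFSET ?",
        (lower, chunk_size - 1),
    ).fetchall()
    upper = rows[0][0] if len(rows) > 0 else None
    next_lower = rows[1][0] if len(rows) > 1 else None
    return upper, next_lower

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (k INTEGER)")
conn.execute("CREATE INDEX items_k ON items (k)")
conn.executemany(
    "INSERT INTO items (k) VALUES (?)",
    [(k,) for k in [1, 2, 3, 4, 5, 5, 5, 5, 5, 6]],
)

first = next_boundaries(conn, 1, 4)   # upper and next lower boundary differ
second = next_boundaries(conn, 5, 4)  # both rows carry the same value
stuck = second[0] == second[1]        # the next chunk would start where this one ends
```

When the two captured values are equal, the next chunk's lower boundary is no further along than the current chunk's upper boundary, so a caller can detect the problem before looping on it.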

By: Peter Laursen, Sun, 06 May 2012 19:50:31 +0000
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/#comment-20045

BTW: Funnily enough, Peter Zaitsev just published http://www.mysqlperformanceblog.com/2012/05/06/load-management-mysql/

“For example if I need to delete old data instead of DELETE FROM TBL WHERE ts<"2010-01-01" I’ll do DELETE FROM TBL WHERE TS<"2010-01-01" LIMIT 1000 in the loop until no more rows need to be deleted.”

Isn't that the 'old style' chunking that you say here Percona has found a better alternative to?
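
For reference, the loop in the quoted passage might look roughly like this. It is a sketch only: SQLite stands in for MySQL, and because stock SQLite builds do not accept `DELETE ... LIMIT`, the limit is applied through a `rowid` subquery instead. The table and column names follow the quote; everything else is an assumption.

```python
import sqlite3

def delete_in_batches(conn, cutoff, batch_size):
    """Delete rows older than `cutoff` in small batches; return the batch count."""
    batches = 0
    while True:
        # On MySQL this would be: DELETE FROM tbl WHERE ts < ? LIMIT ?
        cur = conn.execute(
            "DELETE FROM tbl WHERE rowid IN "
            "(SELECT rowid FROM tbl WHERE ts < ? LIMIT ?)",
            (cutoff, batch_size),
        )
        conn.commit()  # release locks between batches
        if cur.rowcount == 0:
            break      # nothing left to delete
        batches += 1
    return batches

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ts TEXT)")
conn.executemany(
    "INSERT INTO tbl (ts) VALUES (?)",
    [("2009-06-01",)] * 2500 + [("2011-06-01",)] * 10,
)
batches = delete_in_batches(conn, "2010-01-01", 1000)
remaining = conn.execute("SELECT COUNT(*) FROM tbl").fetchone()[0]
```

Each pass starts from the beginning again, which only stays cheap because the rows it already handled are gone.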

By: Peter Laursen, Sun, 06 May 2012 19:46:02 +0000
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/#comment-20044

“It practically requires a numeric, single-column index[1].” Why? If you use chunks, you add a “LIMIT n,1000” clause to the queries (iteratively increasing n from zero by the step you use for each iteration).

It works with ORDER BY on any column (whether it has an index or not) and even with no ORDER BY in the query (server will return data in *some* order and apply the LIMIT to that order).

By: Shlomi Noach, Sun, 06 May 2012 19:12:19 +0000
http://www.xaprb.com/blog/2012/05/06/how-percona-toolkit-divides-tables-into-chunks/#comment-20043

Hi, this is the very same technique used by openark-kit, though it is called “chunking” there. You guarantee a number of rows per chunk by counting rows in some order and applying LIMIT.

I do that using user-defined variables. Say you have a two-column (a,b) primary key. In that case I run:
SELECT a,b FROM tbl ORDER BY a ASC, b ASC LIMIT 1 INTO @c1, @c2

To compute end of range, start with @c1, @c2, same ORDER BY, LIMIT 1000 (for example)
Then take the highest value using an enclosing query on opposite order (a DESC, b DESC) LIMIT 1.
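
The range computation described above can be sketched as follows. This is an illustration under assumptions, not openark-kit's code: SQLite stands in for MySQL, plain Python variables play the role of the user-defined variables @c1 and @c2, and the helper name `chunk_range` is made up.

```python
import sqlite3

def chunk_range(conn, start, chunk_size):
    """Return (start, end) key pairs bounding a chunk of at most chunk_size rows."""
    a0, b0 = start
    end = conn.execute(
        """
        SELECT a, b FROM (
            SELECT a, b FROM tbl
            WHERE a > ? OR (a = ? AND b >= ?)
            ORDER BY a ASC, b ASC
            LIMIT ?
        ) AS chunk
        ORDER BY a DESC, b DESC   -- opposite order: highest key of the chunk first
        LIMIT 1
        """,
        (a0, a0, b0, chunk_size),
    ).fetchone()
    return start, end

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER, PRIMARY KEY (a, b))")
conn.executemany(
    "INSERT INTO tbl (a, b) VALUES (?, ?)",
    [(a, b) for a in range(1, 4) for b in range(1, 5)],
)

# Start of the first range: the lowest key in primary-key order.
first = conn.execute(
    "SELECT a, b FROM tbl ORDER BY a ASC, b ASC LIMIT 1"
).fetchone()
bounds = chunk_range(conn, first, 5)
```

With 12 rows (a from 1 to 3, b from 1 to 4) and a chunk size of 5, the first chunk runs from key (1, 1) to key (2, 1).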

Since user-defined variables assume the data type provided by the query, this method is safe to use for any number of columns in your primary (or other UNIQUE) key, and for any type. My code actually does not care in the least about the data type, other than recommending the best key to use based on some heuristics.

Tricky to explain in such a short comment, but it works well. I also found out about this, which makes your code a whole lot less readable as a result.

This is also one of the major pieces of code incorporated into Facebook’s Online Schema Change.

BTW, openark-kit does not handle NULLable columns, and requires at least one UNIQUE constraint to choose from.

You may also check out candidate_keys_recommended view in common_schema, which attempts to recommend the best unique key in some table for use as primary key.
It also makes for the best candidate for a chunking key.

Hope this wasn’t too much of a shameless plug. Apologies if so.
