Comments on: How often should you use OPTIMIZE TABLE?
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/
Stay curious! Fri, 10 May 2013 18:25:19 +0000

By: Matt Corgan, Wed, 10 Feb 2010 04:42:13 +0000
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17786

Does anyone know what happens to a table of randomly inserted PKs as it grows over time in InnoDB? I’m not worried about the secondary indexes, just trying to understand the clustered index. Here is one possible interpretation… where does it go awry?

- as pages in the buffer pool fill up, they are split in memory
- split pages are flushed to the WAL and/or doublewrite buffer
- when the buffer pool fills up, some pages are flushed. Which ones?
- existing pages are written back to where they already reside?
- new pages are sorted and written out sequentially in new 1MB “extents”?

Is that even close? Seems like you’d get some extremely fragmented tables, which has actually been my experience. I do actually try to defragment every few months and see orders of magnitude speed-ups on sequential scans.
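
A toy model can make the fill-factor side of this question concrete. The sketch below is an illustration only, not InnoDB's algorithm: the page capacity, the two split rules, and the name `simulate_fill` are all assumptions, and it says nothing about where pages land on disk or when they are flushed.

```python
import bisect
import random


def simulate_fill(keys, capacity=100):
    """Insert keys into a toy list of sorted leaf pages; return average fill.

    Assumed split rules (hypothetical, for illustration):
    - a key appended to the end of the last page starts a fresh page,
    - any other overflow splits the page in half.
    """
    pages = [[]]                # leaf pages, each a sorted list of keys
    lows = [float("-inf")]      # lower bound of each page, same order as pages

    for key in keys:
        idx = bisect.bisect_right(lows, key) - 1
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # Ascending insert at the tail: leave the old page full.
            right = [page.pop()]
        else:
            # Insert in the middle: split the page into two halves.
            mid = len(page) // 2
            right = page[mid:]
            del page[mid:]
        pages.insert(idx + 1, right)
        lows.insert(idx + 1, right[0])

    return sum(len(p) for p in pages) / (len(pages) * capacity)


random.seed(1)
n = 20_000
random_fill = simulate_fill(random.sample(range(10**9), n))
sequential_fill = simulate_fill(range(n))
print(f"random keys:     {random_fill:.2f}")
print(f"sequential keys: {sequential_fill:.2f}")
```

In this model, random keys leave the leaf pages noticeably emptier than ascending keys do. That covers only the logical fill of pages; whether neighbouring pages also end up physically scattered is a separate question the model does not address.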

I’ve always wondered why there’s no background defragmentation process. That seems extremely valuable compared to all the TPS optimizations I see published. Do other databases have that? It would be great if it vacuumed data into 10-100MB sequences when IO capacity is available.

By: Shlomi Noach, Mon, 08 Feb 2010 18:16:11 +0000
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17762

(continuing my last comment)
Of course, the page-split count above relates to only one index; when there are multiple keys there’s more work to be done. I was referring to changes in a single tree structure.

By: Shlomi Noach, Mon, 08 Feb 2010 18:12:51 +0000
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17761

Baron,

“…an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing…”

1. Worst-case splitting involves N page splits, where N is the depth of the tree, so usually up to 3-4 for a numeric primary key. There is no rebalancing of the tree. A B/B+ tree is balanced by design; splitting is one of the mechanisms that keep it balanced.

2. Also, InnoDB leaves 1/16 of the space free, which allows such “last minute changes” to work out without splitting.
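
The depth figure in point 1 can be checked with back-of-the-envelope arithmetic. The sketch below assumes a 16KB page (InnoDB's default), the 1/16 free space from point 2, and hypothetical row, key, and pointer sizes. `btree_depth` is an illustrative helper, not an InnoDB function, and it ignores per-record overhead that real pages carry.

```python
def btree_depth(n_rows, row_bytes, key_bytes,
                page_bytes=16384, pointer_bytes=4, fill=15 / 16):
    """Estimate the number of levels in a B+ tree holding n_rows rows.

    Simplifying assumptions: fixed-size rows and keys, every page filled
    to `fill`, and no page or record header overhead.
    """
    usable = int(page_bytes * fill)
    rows_per_leaf = max(1, usable // row_bytes)
    fanout = max(2, usable // (key_bytes + pointer_bytes))

    pages = -(-n_rows // rows_per_leaf)   # ceiling division: leaf pages
    depth = 1
    while pages > 1:
        pages = -(-pages // fanout)       # pages needed one level up
        depth += 1
    return depth


# Hypothetical table shapes: 200-byte rows, 8-byte numeric primary key.
print(btree_depth(100_000_000, row_bytes=200, key_bytes=8))      # 3
print(btree_depth(10_000_000_000, row_bytes=200, key_bytes=8))   # 4
```

Under these assumed sizes, the estimate lands on the 3-4 levels mentioned above, so a worst-case insert would touch only a handful of pages per index.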

Disclaimer: I did not read the InnoDB B+ Tree implementation source code.

By: Xaprb, Mon, 08 Feb 2010 17:52:48 +0000
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17760

Sheeri, I think you’re taking as a given that

a) OPTIMIZE TABLE results in defragmentation, which might not be the case (there is only one primary key, but there can be N secondary indexes, so the table could be only 1/(N+1)th defragmented, leaving aside the plugin’s features which aren’t hooked into OPTIMIZE TABLE); and

b) a defragmented index is optimal, which might not be even remotely the case — an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing, so maybe the right answer is that the indexes reach an optimal degree of fragmentation on their own and should not be “fixed.”

But I think you agree implicitly that testing is harder than it should be, because in the absence of good instrumentation, the only way to test is to benchmark the server’s actual workload on the server’s actual data.

By: Sheeri K. Cabral, Mon, 08 Feb 2010 17:08:28 +0000
http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17759

Baron,

This is all true. The best advice is “optimize when you will benefit”. The problem is that it involves a lot of testing — if there are frequent deletes or fragmentation-causing updates (not all updates cause fragmentation), you want to test to see what the “sweet spot” is.

Defragmenting every week is definitely excessive for 99.95% of companies out there, even for our clients at Pythian (and your clients at Percona, I’d bet).

As with everything, “how often should I do foo?” should first be changed to “Would I benefit from doing foo?” and then be followed up with testing to see how often is “good”.
