<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How often should you use OPTIMIZE TABLE?</title>
	<atom:link href="http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/</link>
	<description>Stay curious!</description>
	<lastBuildDate>Mon, 06 Sep 2010 10:31:52 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Matt Corgan</title>
		<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17786</link>
		<dc:creator>Matt Corgan</dc:creator>
		<pubDate>Wed, 10 Feb 2010 04:42:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1611#comment-17786</guid>
		<description>Does anyone know what happens to a table of randomly inserted PKs as it grows over time in InnoDB?  I&#039;m not worried about the secondary indexes, just trying to understand the clustered index.  Here is one possible interpretation... where does it go awry?

- as pages in the buffer pool fill up, they are split in memory
- split pages are flushed to the WAL and/or doublewrite buffer
- when the buffer pool fills up, some pages are flushed.  Which ones?
- existing pages are written back to where they already reside?
- new pages are sorted and written out sequentially in new 1MB &quot;extents&quot;?

Is that even close?  Seems like you&#039;d get some extremely fragmented tables, which has actually been my experience.  I do actually try to defragment every few months and see orders of magnitude speed-ups on sequential scans.

I&#039;ve always wondered why there&#039;s no background defragmentation process.  That seems extremely valuable compared to all the TPS optimizations I see published.  Do other databases have that?  It would be great if it vacuumed data into 10-100MB sequences when IO capacity is available.</description>
		<content:encoded><![CDATA[<p>Does anyone know what happens to a table of randomly inserted PKs as it grows over time in InnoDB?  I&#8217;m not worried about the secondary indexes, just trying to understand the clustered index.  Here is one possible interpretation&#8230; where does it go awry?</p>
<p>- as pages in the buffer pool fill up, they are split in memory<br />
- split pages are flushed to the WAL and/or doublewrite buffer<br />
- when the buffer pool fills up, some pages are flushed.  Which ones?<br />
- existing pages are written back to where they already reside?<br />
- new pages are sorted and written out sequentially in new 1MB &#8220;extents&#8221;?</p>
<p>Is that even close?  Seems like you&#8217;d get some extremely fragmented tables, which has actually been my experience.  I do actually try to defragment every few months and see orders of magnitude speed-ups on sequential scans.</p>
<p>I&#8217;ve always wondered why there&#8217;s no background defragmentation process.  That seems extremely valuable compared to all the TPS optimizations I see published.  Do other databases have that?  It would be great if it vacuumed data into 10-100MB sequences when IO capacity is available.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shlomi Noach</title>
		<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17762</link>
		<dc:creator>Shlomi Noach</dc:creator>
		<pubDate>Mon, 08 Feb 2010 18:16:11 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1611#comment-17762</guid>
		<description>(continuing my last comment)
Of course, the page-split count I assumed above applies to only one index; when there are multiple keys there&#039;s more work to be done. I was referring to changes in a single tree structure.</description>
		<content:encoded><![CDATA[<p>(continuing my last comment)<br />
Of course, the page-split count I assumed above applies to only one index; when there are multiple keys there&#8217;s more work to be done. I was referring to changes in a single tree structure.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shlomi Noach</title>
		<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17761</link>
		<dc:creator>Shlomi Noach</dc:creator>
		<pubDate>Mon, 08 Feb 2010 18:12:51 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1611#comment-17761</guid>
		<description>Baron,

&quot;...an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing...&quot;

1. Worst-case tree splitting involves N page splits, where N is the depth of the tree; so that&#039;s usually up to 3-4 on a numeric primary key. There is no re-balancing of the tree. A B/B+ tree is balanced by design; splitting is one of the mechanisms that keep it balanced.

2. Also, InnoDB keeps 1/16 of the space free, which allows such &quot;last minute changes&quot; to work out without splitting.

Disclaimer: I have not read the InnoDB B+ Tree implementation source code.</description>
		<content:encoded><![CDATA[<p>Baron,</p>
<p>&#8220;&#8230;an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing&#8230;&#8221;</p>
<p>1. Worst-case tree splitting involves N page splits, where N is the depth of the tree; so that&#8217;s usually up to 3-4 on a numeric primary key. There is no re-balancing of the tree. A B/B+ tree is balanced by design; splitting is one of the mechanisms that keep it balanced.</p>
<p>2. Also, InnoDB keeps 1/16 of the space free, which allows such &#8220;last minute changes&#8221; to work out without splitting.</p>
<p>Disclaimer: I have not read the InnoDB B+ Tree implementation source code.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17760</link>
		<dc:creator>Xaprb</dc:creator>
		<pubDate>Mon, 08 Feb 2010 17:52:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1611#comment-17760</guid>
		<description>Sheeri, I think you&#039;re taking as a given that

a) OPTIMIZE TABLE results in defragmentation, which might not be the case (there is only one primary key, but there can be N secondary indexes, so the table could be only 1/(N+1)th defragmented, leaving aside the plugin&#039;s features which aren&#039;t hooked into OPTIMIZE TABLE); and

b) a defragmented index is optimal, which might not be even remotely the case -- an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing, so maybe the right answer is that the indexes reach an optimal degree of fragmentation on their own and should not be &quot;fixed.&quot;

But I think you agree implicitly that testing is harder than it should be, because in the absence of good instrumentation, the only way to test is to benchmark the server&#039;s actual workload on the server&#039;s actual data.</description>
		<content:encoded><![CDATA[<p>Sheeri, I think you&#8217;re taking as a given that</p>
<p>a) OPTIMIZE TABLE results in defragmentation, which might not be the case (there is only one primary key, but there can be N secondary indexes, so the table could be only 1/(N+1)th defragmented, leaving aside the plugin&#8217;s features which aren&#8217;t hooked into OPTIMIZE TABLE); and</p>
<p>b) a defragmented index is optimal, which might not be even remotely the case &#8212; an insert into a fully defragmented index could cause worst-case page splitting and tree rebalancing, so maybe the right answer is that the indexes reach an optimal degree of fragmentation on their own and should not be &#8220;fixed.&#8221;</p>
<p>But I think you agree implicitly that testing is harder than it should be, because in the absence of good instrumentation, the only way to test is to benchmark the server&#8217;s actual workload on the server&#8217;s actual data.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Sheeri K. Cabral</title>
		<link>http://www.xaprb.com/blog/2010/02/07/how-often-should-you-use-optimize-table/#comment-17759</link>
		<dc:creator>Sheeri K. Cabral</dc:creator>
		<pubDate>Mon, 08 Feb 2010 17:08:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=1611#comment-17759</guid>
		<description>Baron,

This is all true.  The best advice is &quot;optimize when you will benefit&quot;.  The problem is that it involves a lot of testing -- if there are frequent deletes or fragmentation-causing updates (not all updates cause fragmentation), you want to test to see what the &quot;sweet spot&quot; is.

Defragmenting every week is definitely excessive for 99.95% of companies out there, even for our clients at Pythian (and your clients at Percona, I&#039;d bet).

As with everything, &quot;how often should I do foo?&quot; should first be changed to &quot;Would I benefit from doing foo?&quot; and then be followed up with testing to see how often is &quot;good&quot;.</description>
		<content:encoded><![CDATA[<p>Baron,</p>
<p>This is all true.  The best advice is &#8220;optimize when you will benefit&#8221;.  The problem is that it involves a lot of testing &#8212; if there are frequent deletes or fragmentation-causing updates (not all updates cause fragmentation), you want to test to see what the &#8220;sweet spot&#8221; is.</p>
<p>Defragmenting every week is definitely excessive for 99.95% of companies out there, even for our clients at Pythian (and your clients at Percona, I&#8217;d bet).</p>
<p>As with everything, &#8220;how often should I do foo?&#8221; should first be changed to &#8220;Would I benefit from doing foo?&#8221; and then be followed up with testing to see how often is &#8220;good&#8221;.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
