<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: When to use surrogate keys in InnoDB tables</title>
	<atom:link href="http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/</link>
	<description>Stay curious!</description>
	<pubDate>Tue, 06 Jan 2009 02:40:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: David Phillips</title>
		<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/#comment-14075</link>
		<dc:creator>David Phillips</dc:creator>
		<pubDate>Mon, 10 Dec 2007 22:17:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=128#comment-14075</guid>
		<description>Murray,

As Xaprb said, your second insert is slow because you are inserting in a random order.  Before loading, sort your data in primary key order and the two inserts should run in similar time.</description>
		<content:encoded><![CDATA[<p>Murray,</p>
<p>As Xaprb said, your second insert is slow because you are inserting in a random order.  Before loading, sort your data in primary key order and the two inserts should run in similar time.</p>
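<p>A minimal sketch of that sort-before-load idea, done inside MySQL rather than on the file.  It assumes a hypothetical staging table named <code>impressions_staging</code> (not part of Murray's setup) and the (epoch, id) design as the target table; treat it as an illustration, not a benchmarked recipe.</p>
<pre>-- Hypothetical staging table: same columns, no indexes, so load order does not matter.
CREATE TABLE impressions_staging (
  id int NOT NULL,
  epoch int NOT NULL,
  count int NOT NULL
) ENGINE=InnoDB;

LOAD DATA INFILE 'sample.sql' INTO TABLE impressions_staging;

-- Copy the rows into the clustered table in primary key order (epoch, id),
-- so each new row lands at the end of the clustered index.
INSERT INTO impressions (id, epoch, count)
SELECT id, epoch, count
FROM impressions_staging
ORDER BY epoch, id;</pre>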
]]></content:encoded>
	</item>
	<item>
		<title>By: Marcus Schwartz &#187; Blog Archive &#187; Optimizing MySQL Indexes</title>
		<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/#comment-13731</link>
		<dc:creator>Marcus Schwartz &#187; Blog Archive &#187; Optimizing MySQL Indexes</dc:creator>
		<pubDate>Fri, 30 Nov 2007 18:49:46 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=128#comment-13731</guid>
		<description>[...] older article from the same site provides a bit more [...]</description>
		<content:encoded><![CDATA[<p>[...] older article from the same site provides a bit more [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/#comment-4674</link>
		<dc:creator>Xaprb</dc:creator>
		<pubDate>Thu, 01 Mar 2007 00:32:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=128#comment-4674</guid>
		<description>&lt;p&gt;I would say the choice of (epoch, id) is probably not good for your primary key.  You're inserting random data into epoch, and the table is clustered epoch-first, so you're going to cause lots of page splits and b-tree re-balancings as you insert the data.  That alone might explain the slowness.&lt;/p&gt;

&lt;p&gt;If insert speed is your priority, I'd stick with the first table design.  But now that you've got the data into the second table, test how fast the queries run against it.  You've just artificially unique-ified a non-unique value, and it might be a good trade-off.&lt;/p&gt;

&lt;p&gt;Use SHOW TABLE STATUS to see how big the table and index data is.  Compare the two designs.  The second will probably be quite a bit bigger.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>I would say the choice of (epoch, id) is probably not good for your primary key.  You&#8217;re inserting random data into epoch, and the table is clustered epoch-first, so you&#8217;re going to cause lots of page splits and b-tree re-balancings as you insert the data.  That alone might explain the slowness.</p>
<p>If insert speed is your priority, I&#8217;d stick with the first table design.  But now that you&#8217;ve got the data into the second table, test how fast the queries run against it.  You&#8217;ve just artificially unique-ified a non-unique value, and it might be a good trade-off.</p>
<p>Use SHOW TABLE STATUS to see how big the table and index data is.  Compare the two designs.  The second will probably be quite a bit bigger.</p>
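<p>A rough sketch of that comparison, assuming the two designs were created under the hypothetical names <code>impressions_v1</code> and <code>impressions_v2</code> in the current database (Murray's comment uses <code>impressions</code> for both).  In InnoDB the clustered primary key is stored with the rows, so it is counted in the data size, and the index size covers only the secondary indexes.</p>
<pre>-- Lists both tables; compare the Data_length and Index_length columns.
SHOW TABLE STATUS LIKE 'impressions_v%';

-- The same figures from information_schema (MySQL 5.0 and later), in bytes.
SELECT table_name, data_length, index_length
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name IN ('impressions_v1', 'impressions_v2');</pre>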
]]></content:encoded>
	</item>
	<item>
		<title>By: Murray</title>
		<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/#comment-4670</link>
		<dc:creator>Murray</dc:creator>
		<pubDate>Wed, 28 Feb 2007 21:01:45 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=128#comment-4670</guid>
		<description>&lt;p&gt;This is perhaps just me trying to be too smart. Or missing the obvious.  But...&lt;/p&gt;

&lt;p&gt;I picked up on clustered indexes from your "case study in profiling queries in MySQL".  &lt;/p&gt;

&lt;p&gt;Then I read the manual.  Then I came up with this idea.  Instead of:&lt;/p&gt;

&lt;pre&gt;CREATE TABLE impressions (
  id int NOT NULL auto_increment,
  epoch int NOT NULL,
  count int NOT NULL,
  PRIMARY KEY (id),
  INDEX epoch_idx (epoch)
) ENGINE=InnoDB;&lt;/pre&gt;

&lt;p&gt;I could instead do:&lt;/p&gt;

&lt;pre&gt;CREATE TABLE impressions (
  id int NOT NULL auto_increment,
  epoch int NOT NULL,
  count int NOT NULL,
  PRIMARY KEY (epoch, id),
  UNIQUE INDEX id_idx (id)
) ENGINE=InnoDB;&lt;/pre&gt;

&lt;p&gt;I need the id column to join with other tables, and there can be multiple rows with the same epoch, so epoch cannot be unique.&lt;/p&gt;

&lt;p&gt;The vast majority of my queries are operating on a subset of the impressions table and the vast majority of those use epoch to pick the subset.&lt;/p&gt;

&lt;p&gt;I then created some sample data using the following perl:&lt;/p&gt;

&lt;pre&gt;my $time = time;
for (my $i = 1; $i &lt; 250000 + 1; $i++) {
    my $r1 = $time - int(rand(100000));
    my $r2 = int(rand(10000));
    print "$i\t$r1\t$r2\n";
}&lt;/pre&gt;

&lt;p&gt;And then used:&lt;/p&gt;

&lt;pre&gt;LOAD DATA INFILE 'sample.sql' INTO TABLE impressions;&lt;/pre&gt;

&lt;p&gt;Note that I'm using exactly the same data on both tables.  On my MacBook Pro, the first table loaded in about 6.19 seconds.  The second took 70.16 seconds.  Worse yet, as the number of rows increased, the times for the second table appeared to be growing exponentially!&lt;/p&gt;

&lt;p&gt;My current working hypothesis is that the extra size used for the clustered index in the second table is causing the problems.  Aka your "exceptions to the rule."  But I'm frankly not sure and a little curious.  If I shifted the epoch to be seconds since 1st Jan 2000, would that help?  Or if I carefully managed my insert process so that I appended a dot followed by a counter to multiple rows with the same epoch - would that help?&lt;/p&gt;

&lt;p&gt;How big is too big for the primary key in an InnoDB table?&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>This is perhaps just me trying to be too smart. Or missing the obvious.  But&#8230;</p>
<p>I picked up on clustered indexes from your &#8220;case study in profiling queries in MySQL&#8221;.  </p>
<p>Then I read the manual.  Then I came up with this idea.  Instead of:</p>
<pre>CREATE TABLE impressions (
  id int NOT NULL auto_increment,
  epoch int NOT NULL,
  count int NOT NULL,
  PRIMARY KEY (id),
  INDEX epoch_idx (epoch)
) ENGINE=InnoDB;</pre>
<p>I could instead do:</p>
<pre>CREATE TABLE impressions (
  id int NOT NULL auto_increment,
  epoch int NOT NULL,
  count int NOT NULL,
  PRIMARY KEY (epoch, id),
  UNIQUE INDEX id_idx (id)
) ENGINE=InnoDB;</pre>
<p>I need the id column to join with other tables, and there can be multiple rows with the same epoch, so epoch cannot be unique.</p>
<p>The vast majority of my queries are operating on a subset of the impressions table and the vast majority of those use epoch to pick the subset.</p>
<p>I then created some sample data using the following perl:</p>
<pre>my $time = time;
for (my $i = 1; $i < 250000 + 1; $i++) {
    my $r1 = $time - int(rand(100000));
    my $r2 = int(rand(10000));
    print "$i\t$r1\t$r2\n";
}</pre>
<p>And then used:</p>
<pre>LOAD DATA INFILE 'sample.sql' INTO TABLE impressions;</pre>
<p>Note that I&#8217;m using exactly the same data on both tables.  On my MacBook Pro, the first table loaded in about 6.19 seconds.  The second took 70.16 seconds.  Worse yet, as the number of rows increased, the times for the second table appeared to be growing exponentially!</p>
<p>My current working hypothesis is that the extra size used for the clustered index in the second table is causing the problems.  Aka your &#8220;exceptions to the rule.&#8221;  But I&#8217;m frankly not sure and a little curious.  If I shifted the epoch to be seconds since 1st Jan 2000, would that help?  Or if I carefully managed my insert process so that I appended a dot followed by a counter to multiple rows with the same epoch - would that help?</p>
<p>How big is too big for the primary key in an InnoDB table?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 9rules Featured &#187; Blog Archive &#187; Featured blogger: Xaprb</title>
		<link>http://www.xaprb.com/blog/2006/05/10/when-to-avoid-and-when-to-use-surrogate-keys-in-innodb-tables/#comment-434</link>
		<dc:creator>9rules Featured &#187; Blog Archive &#187; Featured blogger: Xaprb</dc:creator>
		<pubDate>Wed, 17 May 2006 05:28:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.xaprb.com/blog/?p=128#comment-434</guid>
		<description>&lt;p&gt;[...] When to use surrogate keys in InnoDB tables [...]&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>[...] When to use surrogate keys in InnoDB tables [...]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
