<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.2.2" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/">
<channel>
	<title>Comments on: Why large IN clauses are problematic</title>
	<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/</link>
	<description>Stay curious!</description>
	<pubDate>Fri, 29 Aug 2008 04:57:42 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.2.2</generator>

	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14749</link>
		<author>Xaprb</author>
		<pubDate>Mon, 16 Jun 2008 16:05:36 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14749</guid>
		<description>I think I wrote this post when I was seeing a particular problem, and before I'd started to see the problems with temp tables.  So my dogma switched as I learned :-)</description>
		<content:encoded><![CDATA[<p>I think I wrote this post when I was seeing a particular problem, and before I&#8217;d started to see the problems with temp tables.  So my dogma switched as I learned :-)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dewey</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14747</link>
		<author>Dewey</author>
		<pubDate>Mon, 16 Jun 2008 14:36:17 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14747</guid>
		<description>Baron, I thought you were opposed to using temp tables?  I love them (especially memory) as a method to get variable data from the app layer with no schema dependency.   I'd love to see a post from you regarding when you find them risky, and when this approach is OK.</description>
		<content:encoded><![CDATA[<p>Baron, I thought you were opposed to using temp tables?  I love them (especially memory) as a method to get variable data from the app layer with no schema dependency.   I&#8217;d love to see a post from you regarding when you find them risky, and when this approach is OK.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Truitt</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14697</link>
		<author>Ben Truitt</author>
		<pubDate>Wed, 04 Jun 2008 21:57:52 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14697</guid>
		<description>I just wanted to share additional news on this.

 My DBA had a hard time believing that my solution in the first email in this thread should be faster.  So after some digging, he ran ANALYZE on the tables in question.

 ANALYZE refreshes the statistics that the PostgreSQL planner relies on, which helps it choose better query plans.

 It turns out that after ANALYZE, the original query that uses "IN" takes 1ms.

 So we went from:
IN clause: ~4500ms
Inner join: ~300ms
IN clause after ANALYZE: ~1ms

Holy cow.

The lesson is: use EXPLAIN, use ANALYZE, and consult your DBAs. They are your friends.</description>
		<content:encoded><![CDATA[<p>I just wanted to share additional news on this.</p>
<p> My DBA had a hard time believing that my solution in the first email in this thread should be faster.  So after some digging, he ran ANALYZE on the tables in question.</p>
<p> ANALYZE refreshes the statistics that the PostgreSQL planner relies on, which helps it choose better query plans.</p>
<p> It turns out that after ANALYZE, the original query that uses &#8220;IN&#8221; takes 1ms.</p>
<p> So we went from:<br />
IN clause: ~4500ms<br />
Inner join: ~300ms<br />
IN clause after ANALYZE: ~1ms</p>
<p>Holy cow.</p>
<p>The lesson is: use EXPLAIN, use ANALYZE, and consult your DBAs. They are your friends.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ben Truitt</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14696</link>
		<author>Ben Truitt</author>
		<pubDate>Wed, 04 Jun 2008 16:13:42 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-14696</guid>
		<description>In PostgreSQL, the suggestion to use inner joins can result in dramatic improvements, because PostgreSQL does use ORs as described and proved above.

In our case, we went from ~4500ms for the query to ~300ms

An order of magnitude improvement!

Thanks for the great post!!</description>
		<content:encoded><![CDATA[<p>In PostgreSQL, the suggestion to use inner joins can result in dramatic improvements, because PostgreSQL does use ORs as described and proved above.</p>
<p>In our case, we went from ~4500ms for the query to ~300ms</p>
<p>An order of magnitude improvement!</p>
<p>Thanks for the great post!!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: rcolmegna</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-2378</link>
		<author>rcolmegna</author>
		<pubDate>Sun, 05 Nov 2006 09:22:20 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-2378</guid>
		<description>&lt;p&gt;On a PostgreSQL 8.1 database with 1,000,000 records, I got these results:&lt;/p&gt;

&lt;p&gt;Searching for 1000 records via:&lt;/p&gt;

&lt;pre&gt;IN: 270ms
OR: 270ms
JOIN: 200ms (excluding INSERT time)&lt;/pre&gt;

&lt;p&gt;BASH script:&lt;/p&gt;

&lt;pre&gt;#!/bin/bash

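M=1000                       # number of ids to look up
SQLB="SELECT AVG(id) FROM x "
s=$((1000000/M))             # step between ids (1000000 rows / M lookups)
v=0

#
# IN test: one IN() list holding all M ids
#
SQL="$SQLB WHERE id IN("
for((i=1;i&#60;$M;i++)); do
  SQL="$SQL$v,"
  v=$((v+s))
done

SQL="$SQL$v);"
time /usr/bin/psql -d test -c "$SQL"

#
# OR test: the same ids as a chain of id=... OR id=...
#
v=0
SQL="$SQLB WHERE "

for((i=1;i&#60;$M;i++)); do
  SQL="$SQL id=$v OR"
  v=$((v+s))
done

SQL="$SQL id=$v;"
time /usr/bin/psql -d test -c "$SQL"

#
# JOIN table test: load the ids into table tx, then join against it
#
v=0
# Note: DROP TABLE tx raises an error if tx does not exist yet (e.g. on a first run)
SQL="DROP TABLE tx;CREATE TABLE tx (idFK INTEGER);"
SQL=$SQL"BEGIN; "

for((i=1;i&#60;$M;i++)); do
  SQL=$SQL"INSERT INTO tx VALUES($v);"
  v=$((v+s))
done

SQL=$SQL"INSERT INTO tx VALUES($v);COMMIT;"
# Populate tx outside the timed command, so INSERT time is excluded
psql -d test -c "$SQL"
SQL="$SQLB INNER JOIN tx ON tx.idFK=x.id"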
time /usr/bin/psql -d test -c "$SQL"&lt;/pre&gt;

&lt;p&gt;Roberto Colmegna&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>On a PostgreSQL 8.1 database with 1,000,000 records, I got these results:</p>
<p>Searching for 1000 records via:</p>
<pre>IN: 270ms
OR: 270ms
JOIN: 200ms (excluding INSERT time)</pre>
<p>BASH script:</p>
<pre>#!/bin/bash

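M=1000                       # number of ids to look up
SQLB="SELECT AVG(id) FROM x "
s=$((1000000/M))             # step between ids (1000000 rows / M lookups)
v=0

#
# IN test: one IN() list holding all M ids
#
SQL="$SQLB WHERE id IN("
for((i=1;i&lt;$M;i++)); do
  SQL="$SQL$v,"
  v=$((v+s))
done

SQL="$SQL$v);"
time /usr/bin/psql -d test -c "$SQL"

#
# OR test: the same ids as a chain of id=... OR id=...
#
v=0
SQL="$SQLB WHERE "

for((i=1;i&lt;$M;i++)); do
  SQL="$SQL id=$v OR"
  v=$((v+s))
done

SQL="$SQL id=$v;"
time /usr/bin/psql -d test -c "$SQL"

#
# JOIN table test: load the ids into table tx, then join against it
#
v=0
# Note: DROP TABLE tx raises an error if tx does not exist yet (e.g. on a first run)
SQL="DROP TABLE tx;CREATE TABLE tx (idFK INTEGER);"
SQL=$SQL"BEGIN; "

for((i=1;i&lt;$M;i++)); do
  SQL=$SQL"INSERT INTO tx VALUES($v);"
  v=$((v+s))
done

SQL=$SQL"INSERT INTO tx VALUES($v);COMMIT;"
# Populate tx outside the timed command, so INSERT time is excluded
psql -d test -c "$SQL"
SQL="$SQLB INNER JOIN tx ON tx.idFK=x.id"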
time /usr/bin/psql -d test -c "$SQL"</pre>
<p>Roberto Colmegna</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: ephes</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1404</link>
		<author>ephes</author>
		<pubDate>Wed, 09 Aug 2006 10:51:08 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1404</guid>
		<description>&lt;p&gt;Hi, today I tested for myself whether this UNION-SELECT join is actually slower or faster for my use case, and it turned out to be faster. But when I tested with really big IN clauses, I saw this:&lt;/p&gt;

&lt;pre&gt;ERROR 1064 (42000): parser stack overflow near '21053740 ) as aids on a.id = aids.aid
  JOIN kategorie k JOIN hersteller h JOIN ' at line 5334
Segmentation fault&lt;/pre&gt;

&lt;p&gt;Now, I couldn't reproduce the segfault, but the parse error occurred reliably.&lt;/p&gt;

&lt;p&gt;My MySQL version is 5.0.22.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Hi, today I tested for myself whether this UNION-SELECT join is actually slower or faster for my use case, and it turned out to be faster. But when I tested with really big IN clauses, I saw this:</p>
<pre>ERROR 1064 (42000): parser stack overflow near '21053740 ) as aids on a.id = aids.aid
  JOIN kategorie k JOIN hersteller h JOIN ' at line 5334
Segmentation fault</pre>
<p>Now, I couldn&#8217;t reproduce the segfault, but the parse error occurred reliably.</p>
<p>My MySQL version is 5.0.22.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1368</link>
		<author>Xaprb</author>
		<pubDate>Mon, 07 Aug 2006 17:29:22 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1368</guid>
		<description>&lt;p&gt;Yes, after further thought I also realize that the pain that prompted this article is mainly the maintainability issue.  I'm not averse to using IN(), but the ways I see it commonly used (at least the ways I notice it; perhaps I only notice it when I don't like it) are hard to maintain.&lt;/p&gt;

&lt;p&gt;It would be great if you could post an example of the kind of join you mention.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Yes, after further thought I also realize that the pain that prompted this article is mainly the maintainability issue.  I&#8217;m not averse to using IN(), but the ways I see it commonly used (at least the ways I notice it; perhaps I only notice it when I don&#8217;t like it) are hard to maintain.</p>
<p>It would be great if you could post an example of the kind of join you mention.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1365</link>
		<author>Anonymous</author>
		<pubDate>Mon, 07 Aug 2006 14:08:22 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1365</guid>
		<description>&lt;p&gt;Xaprb,&lt;/p&gt;

&lt;p&gt;I love your posts.  However, as others have pointed out, you're off base with respect to INs and MySQL.  I'll go one further and say that I have quite often been able to optimize painful joins away with INs at my consulting clients. However, my other finding is that the gain is inversely proportional to the clue coefficient of the engineers who designed the db schema ;]&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Xaprb,</p>
<p>I love your posts.  However, as others have pointed out, you&#8217;re off base with respect to INs and MySQL.  I&#8217;ll go one further and say that I have quite often been able to optimize painful joins away with INs at my consulting clients. However, my other finding is that the gain is inversely proportional to the clue coefficient of the engineers who designed the db schema ;]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Xaprb</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1147</link>
		<author>Xaprb</author>
		<pubDate>Thu, 20 Jul 2006 01:14:43 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1147</guid>
		<description>&lt;p&gt;Thanks everyone for your insight.  You're helping me learn a lot.  Plus you're encouraging me not to stop at just noticing something, and I'm reading a lot of source these days.  Sorry for the slow moderation -- I was away for a week.&lt;/p&gt;

&lt;p&gt;Some of my experience is with Microsoft SQL Server, which definitely, at least in SQL Server 2000, did NOT optimize IN well.  I also realize now that I didn't write the article clearly enough to distinguish between the different kinds of things that can go into an IN: lists of numbers, a subselect, lists of columns, and maybe more.  Subselects are not that well optimized in MySQL, as discussed ad nauseam elsewhere.  But it looks like lists of numbers are very well optimized, which is great.&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>Thanks everyone for your insight.  You&#8217;re helping me learn a lot.  Plus you&#8217;re encouraging me not to stop at just noticing something, and I&#8217;m reading a lot of source these days.  Sorry for the slow moderation &#8212; I was away for a week.</p>
<p>Some of my experience is with Microsoft SQL Server, which definitely, at least in SQL Server 2000, did NOT optimize IN well.  I also realize now that I didn&#8217;t write the article clearly enough to distinguish between the different kinds of things that can go into an IN: lists of numbers, a subselect, lists of columns, and maybe more.  Subselects are not that well optimized in MySQL, as discussed ad nauseam elsewhere.  But it looks like lists of numbers are very well optimized, which is great.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: MySQL Developer &#187; Blog Archive &#187; At least four unsuccessful bugfixes in MySQL 5.0.23</title>
		<link>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1114</link>
		<author>MySQL Developer &#187; Blog Archive &#187; At least four unsuccessful bugfixes in MySQL 5.0.23</author>
		<pubDate>Sun, 16 Jul 2006 16:18:52 +0000</pubDate>
		<guid>http://www.xaprb.com/blog/2006/06/28/why-large-in-clauses-are-problematic/#comment-1114</guid>
		<description>&lt;p&gt;[...] Just a few days ago Baron Schwartz wrote an article on how to replace large IN clauses with a UNION SELECT as a subquery in a JOIN (Why large IN clauses are problematic). Unfortunately this trick will not work anymore in MySQL 5.0.23, at least when you haven&#8217;t selected a default database previously. As we used similar approaches in our code, bug #21002 introduced in 5.0.23 is breaking some of our applications that were running fine since early 4.1 releases of MySQL. [...]&lt;/p&gt;</description>
		<content:encoded><![CDATA[<p>[&#8230;] Just a few days ago Baron Schwartz wrote an article on how to replace large IN clauses with a UNION SELECT as a subquery in a JOIN (Why large IN clauses are problematic). Unfortunately this trick will not work anymore in MySQL 5.0.23, at least when you haven&#8217;t selected a default database previously. As we used similar approaches in our code, bug #21002 introduced in 5.0.23 is breaking some of our applications that were running fine since early 4.1 releases of MySQL. [&#8230;]</p>
]]></content:encoded>
	</item>
</channel>
</rss>
