<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Xaprb &#187; Von Neumann bottleneck</title>
	<atom:link href="http://www.xaprb.com/blog/tag/von-neumann-bottleneck/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.xaprb.com/blog</link>
	<description>Stay curious!</description>
	<lastBuildDate>Thu, 09 Feb 2012 10:55:47 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Kickfire: stream-processing SQL queries</title>
		<link>http://www.xaprb.com/blog/2008/04/04/kickfire-stream-processing-sql-queries/</link>
		<comments>http://www.xaprb.com/blog/2008/04/04/kickfire-stream-processing-sql-queries/#comments</comments>
		<pubDate>Fri, 04 Apr 2008 13:01:01 +0000</pubDate>
		<dc:creator>Xaprb</dc:creator>
				<category><![CDATA[SQL]]></category>
		<category><![CDATA[caching]]></category>
		<category><![CDATA[Cg]]></category>
		<category><![CDATA[column store]]></category>
		<category><![CDATA[CPUs]]></category>
		<category><![CDATA[data warehousing]]></category>
		<category><![CDATA[Graphics]]></category>
		<category><![CDATA[Keith Murphy]]></category>
		<category><![CDATA[Kickfire]]></category>
		<category><![CDATA[MPP]]></category>
		<category><![CDATA[MySQL]]></category>
		<category><![CDATA[mysqluc2008]]></category>
		<category><![CDATA[pluggable storage engine]]></category>
		<category><![CDATA[QPU]]></category>
		<category><![CDATA[Von Neumann bottleneck]]></category>

		<guid isPermaLink="false">http://www.xaprb.com/blog/2008/04/04/kickfire-stream-processing-sql-queries/</guid>
		<description><![CDATA[Some of you have noticed Kickfire, a new sponsor at this year&#8217;s MySQL Conference and Expo. Like Keith Murphy, I have been involved with them for a while now. This article explains the basics of how their technology is different from the current state of the art in complex queries on large amounts of data. [...]


<strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/04/14/kickfire-relational-algebra-in-a-chip/' rel='bookmark' title='Permanent Link: Kickfire: relational algebra in a chip'>Kickfire: relational algebra in a chip</a></li>
<li><a href='http://www.xaprb.com/blog/2008/04/09/kickfire-is-not-ssd-based/' rel='bookmark' title='Permanent Link: Kickfire is not SSD-based'>Kickfire is not SSD-based</a></li>
<li><a href='http://www.xaprb.com/blog/2009/08/18/how-to-find-un-indexed-queries-in-mysql-without-using-the-log/' rel='bookmark' title='Permanent Link: How to find un-indexed queries in MySQL, without using the log'>How to find un-indexed queries in MySQL, without using the log</a></li>
<li><a href='http://www.xaprb.com/blog/2009/12/31/a-simple-way-to-make-birthday-queries-easier-and-faster/' rel='bookmark' title='Permanent Link: A simple way to make birthday queries easier and faster'>A simple way to make birthday queries easier and faster</a></li>
<li><a href='http://www.xaprb.com/blog/2009/11/01/catching-erroneous-queries-without-mysql-proxy/' rel='bookmark' title='Permanent Link: Catching erroneous queries, without MySQL proxy'>Catching erroneous queries, without MySQL proxy</a></li>
</ul>]]></description>
			<content:encoded><![CDATA[<p>Some of you have noticed <a href="http://www.kickfire.com/">Kickfire</a>, a
new sponsor at this year&#8217;s <a href="http://www.mysqlconf.com/">MySQL Conference and
Expo</a>.  Like <a href="http://www.paragon-cs.com/wordpress/?p=132">Keith
Murphy</a>, I have been involved with them for a while now.  This article
explains the basics of how their technology is different from the current state
of the art in complex queries on large amounts of data.</p>

<p>Kickfire is developing a MySQL appliance that combines a pluggable
storage engine (for MySQL 5.1) with a new kind of chip.  On the surface, the
storage engine is not that revolutionary: it is a column-store engine with data
compression and some other techniques to reduce disk I/O, which is kind of par
for the course in data warehousing today.  The chip is the
really exciting part of the technology.</p>

<p>The simplest description of their chip is that it runs SQL natively.</p>

<p>OK, but now you need to do something: <em>get &#8220;SQL chip&#8221; out of your mind</em>.  It
doesn&#8217;t work the way you think it does, and your preconceived ideas may prevent
you from understanding how different this really is.  (Everyone says their
technology is a paradigm shift, so I expect you to be numb to this phrase.)</p>

<p>I can&#8217;t explain all of the technology in this post,
partially because of NDA, but I want to prepare you for when you do hear the
details.  If you&#8217;re like me, you&#8217;ll miss a lot of stuff because you have tunnel
vision, and then you&#8217;ll say &#8220;wait, I get it now!  Can you please repeat
everything you&#8217;ve been saying for the last hour so I can think about it all over
again?&#8221;</p>

<h3>An important note</h3>

<p><strong>Very important:</strong> I have not seen this technology, tasted it,
smelled it, or benchmarked it.  This information is based on discussions with
their engineering and other staff.  I will not pretend
to know anything I don&#8217;t. I will be spending two days in the lab with the engineers next
week, and then I will be able to write in greater detail with more 
confidence.</p>

<h3>How your computer currently works</h3>

<p>To understand how Kickfire&#8217;s chip works, you need to understand something you
probably take for granted: how most chips work.  Most computers today use the
same architecture they always have: there&#8217;s data that is held in the CPU, and
data that is not.  The CPU has registers, which hold a minuscule bit of data &#8211;
the data it is currently working with.  When the CPU processes an instruction
that asks for some more data it doesn&#8217;t have, the CPU has to go fetch it.  In
the meantime, the instruction can&#8217;t complete.</p>

<p>As you might imagine, this is not terribly efficient.  Fetching data that&#8217;s
not in the CPU can take hundreds of CPU cycles (or more).  To work around this,
computer architects have developed a hierarchy of caches: the on-chip cache, the
main memory, and the hard drive, to name a few.  The caches make it faster to
get data when it&#8217;s not already on hand.  And modern chips have a pipeline, too.
The pipeline looks at the instructions as they flow towards the CPU, tries to
predict which branches they will take and which data they&#8217;re going to need,
then pre-fetches that data.</p>

<p>In the best case, this works okay.  Not always &#8212; for example, the Pentium 4
has a very long pipeline, so the cost of a wrong branch prediction is very high.
Another case is when you simply need a lot of data, such as tens of gigabytes.
Suppose for your 10GB operation, you&#8217;re only going to look at each byte once (a
common occurrence in data warehousing queries).  This renders your caches
useless, because caches work on the principle that you&#8217;re likely to look at
recently accessed data again soon.</p>
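
<p>To make the caching point concrete, here is a minimal sketch in Python.  It is a toy model, not a description of any real CPU: it treats every address as its own cache line, uses a simple least-recently-used policy, and ignores hardware prefetching and spatial locality.  The function name <code>lru_hit_rate</code> and all of the sizes are assumptions made up for the illustration.</p>

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a sequence of addresses through an LRU cache and return the hit rate."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for addr in accesses:
        total += 1
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / total if total else 0.0


# Hypothetical sizes, chosen only for illustration.
CACHE_LINES = 1_000

# A warehouse-style scan: every address is touched exactly once.
scan = range(100_000)

# A "hot loop": the same 500 addresses are revisited 200 times.
hot_loop = [addr for _ in range(200) for addr in range(500)]

scan_rate = lru_hit_rate(scan, CACHE_LINES)
hot_rate = lru_hit_rate(hot_loop, CACHE_LINES)
print(f"single-pass scan hit rate: {scan_rate:.3f}")
print(f"hot loop hit rate:         {hot_rate:.3f}")
```

<p>In this model the single-pass scan never hits the cache, because nothing is ever looked at twice, while the hot loop hits almost every time.</p>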

<p>In these cases, the speed of the computation is constrained by the <a
href="http://en.wikipedia.org/wiki/Von_Neumann_architecture">Von Neumann
bottleneck</a>: the inefficient fetch-compute-wait cycle of constantly
going to the memory (or disk) for more data, a teeny bit at a time.  Remember,
even in-memory data is very slow compared to data that&#8217;s in the registers.
Having a lot of fast memory is not a <strong>solution</strong> to the Von
Neumann bottleneck.  It&#8217;s a <strong>workaround</strong> to reduce the cost.</p>

<h3>Kickfire&#8217;s architecture</h3>

<p>Kickfire is designed to work well where today&#8217;s general-purpose computing
architectures run queries slowly because they&#8217;re sitting on their thumbs much of
the time.  Think data warehousing: complex queries with lots of data.</p>

<p>What is the industry&#8217;s answer to this?  So-called massively
parallel processing, or MPP.  Current MPP data-warehousing solutions are special-purpose
database software that runs queries on dozens or hundreds of CPUs, which occupy
a lot of physical space and require lots of power, hardware, and
cooling.  &#8220;If you throw enough Von Neumann machines at the problem
simultaneously, they can answer your questions faster,&#8221; or so the thinking goes.
In other words, the current state of the art is to arrange conventional
computers in new ways.</p>

<p>Kickfire takes the opposite approach: <em>stream processing</em>.  This is a
fundamentally different computing architecture.  Stream processing is to Von
Neumann machines as LISP is to C.</p>

<p>For those of you who aren&#8217;t LISP programmers, here&#8217;s another analogy: In
stream processing, you take a bunch of data and you shove it through the chip
without stopping.  Rather than the chip asking for data from the storage
subsystem as needed, the data actually gets pushed at the chip.  That is, it&#8217;s
push-processing instead of the conventional pull-processing.</p>
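
<p>The difference between pulling and pushing can be sketched in software, with the usual caveat that this is only an analogy and says nothing about how the Kickfire chip is actually wired.  In the sketch below, <code>pull_sum</code> asks for each value when it needs it, while <code>push_sum</code> lets the producer drive and sends each value at a waiting consumer.  All of the names are hypothetical.</p>

```python
def pull_sum(fetch, n):
    """Pull style: the consumer asks for each value when it needs it."""
    total = 0
    for i in range(n):
        total += fetch(i)  # the consumer stalls here until the value arrives
    return total


def summing_stage():
    """Push style: a coroutine that waits for values to be sent to it."""
    total = 0
    while True:
        value = yield total  # hand back the running total, wait for the next push
        total += value


def push_sum(source):
    """The producer drives the loop and pushes every value at the consumer."""
    stage = summing_stage()
    next(stage)  # prime the coroutine so it is ready to receive
    total = 0
    for value in source:
        total = stage.send(value)
    stage.close()
    return total


data = [3, 1, 4, 1, 5, 9, 2, 6]
pulled = pull_sum(lambda i: data[i], len(data))
pushed = push_sum(data)
print(pulled, pushed)
```

<p>Both versions compute the same sum.  What changes is who is in control: in the pull version the consumer decides when to go and fetch, and in the push version the consumer never leaves its station.</p>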

<p>Conventional processing is like trying to fill your bathtub
from the sink with a paper cup.  Stream processing is like putting your tub
under the sink and opening the drain.</p>

<p>I&#8217;m taking some liberties here, to illustrate the differences.  As I said, I
haven&#8217;t seen the wiring diagrams of the Kickfire chip.  But hopefully you get
the concept.</p>

<p>This is not a new idea.  If you&#8217;ve worked with modern graphics cards, you&#8217;ve
seen this in action.  Programming languages like <a
href="http://en.wikipedia.org/wiki/Cg_%28programming_language%29">Cg</a> express
the stream-processing concepts elegantly.  If you&#8217;ve ever been in a classroom
full of C++ programmers trying to learn Cg, you&#8217;ve seen how hard it is to grasp
this different approach.  Essentially, graphics programming on one of these
chips is a series of transformations, not a series of instructions.  You input
some vertexes at one end of the processor, and you tell the chip to do some
matrix multiplies and so on.  Out pops the result at the other end.</p>

<p>If this doesn&#8217;t sound much different from instructions&#8230; well, meditate on
it.  It&#8217;s like an assembly line, but nobody leaves their station along the
conveyor belt.  In a traditional CPU, the &#8220;person&#8221; at the conveyor
<em>constantly</em> leaves to go get the materials he needs.</p>

<p>Kickfire runs on commodity hardware, and it is just one or two servers, not
racks full.  Like many other systems designed for large amounts of data, it uses
a column data store.  Unlike many other systems, it uses an industry-standard
interconnect and a custom pluggable MySQL storage engine.</p>

<h3>What took so long?</h3>

<p>Stream processing is the obvious way to run SQL queries.  Some readers may
never have thought about it this way, but my guess is that a lot of you already
think of SQL in a stream-processing way, even though you might know that
computers today really implement it in conventional ways.  I have always tried
to think of it this way, and I <a
href="http://www.xaprb.com/blog/2005/10/03/understanding-sql-joins/">always try
to explain SQL as a stream</a>, too.</p>
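
<p>As a rough illustration of thinking about SQL as a stream, here is a small Python sketch in which rows flow through a chain of stages: a scan, a filter, and a grouped sum.  The table, the query, and the stage names (<code>scan</code>, <code>where</code>, <code>group_sum</code>) are invented for this example, and this is a mental model rather than how MySQL or Kickfire executes a query.</p>

```python
from collections import defaultdict


def scan(rows):
    """FROM: emit rows one at a time."""
    for row in rows:
        yield row


def where(rows, predicate):
    """WHERE: let through only the rows that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def group_sum(rows, key, value):
    """GROUP BY ... SUM(...): fold the stream into one total per key."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


# Hypothetical table, invented for this example.
sales = [
    {"region": "east", "amount": 120},
    {"region": "west", "amount": 80},
    {"region": "east", "amount": 300},
    {"region": "west", "amount": 150},
    {"region": "north", "amount": 40},
]

# SELECT region, SUM(amount) FROM sales WHERE amount >= 100 GROUP BY region
result = group_sum(
    where(scan(sales), lambda row: row["amount"] >= 100),
    key="region",
    value="amount",
)
print(result)
```

<p>Each row passes through every stage exactly once and no stage goes back to ask for an earlier row, which is the property that makes this kind of query a natural fit for a streaming design.</p>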

<p>So when I was on a call with the Kickfire engineers and it finally sank in, I
felt really silly.  Why didn&#8217;t I think of that?  It&#8217;s so obvious.</p>

<p>But then again, most breakthroughs are really obvious in hindsight.</p>

<h3>Performance</h3>

<p>I have seen initial benchmark results, but I&#8217;m under NDA about them.  I can&#8217;t
say any more yet.  And I haven&#8217;t run any benchmarks myself yet, nor have I had
access to the hardware.  So this is all theoretical until I get my hands on the
system.  Caveat emptor, your mileage may vary, etc etc.</p>

<p>One thing I&#8217;m interested in is how well the system performs for general-purpose
queries.  When you take it away from complex queries on lots of data, does it still have
an advantage?  I&#8217;ll be trying to get an answer to that question next week.</p>

<h3>About Kickfire</h3>

<p>They are still in stealth mode and my NDA prevents me from being able to
tell you a lot or answer all your questions yet.  But someday they will no
longer be in stealth mode, and you&#8217;ll find out everything you want to then.</p>

<p>Hint: they are going to be giving a <a
href="http://en.oreilly.com/mysql2008/public/schedule/detail/3286">keynote
address</a> on their technology, but there&#8217;s not much detail in the description.
Come to the keynote and find out.</p>

<h3>Why am I writing this?</h3>

<p>Well, they promised me chocolate&#8230;</p>

<p>Seriously: I do have an agenda, but there are actually several motivations
here.  The first is that they initially contacted me because of my involvement
with the MySQL community.  Of course they&#8217;re hoping to gain publicity through
me, but they also wanted to let the community have some input.  I&#8217;ve been sort
of a secret liaison for you, representing your interests to Kickfire.  I&#8217;ve
advocated pretty strongly for certain things I&#8217;ll go into in a later post.</p>

<p>The other reason I&#8217;m working with them is that I&#8217;m excited about their
technology, even though I don&#8217;t have hard evidence about their claims and
benchmarks yet.  If what they&#8217;re saying is true, their product will be very good
for the environment.  It will let people save a lot of energy (power, cooling,
the need to build data centers) and it will help avoid the need to build a bunch
of servers.  Computers are extremely
toxic to manufacture.</p>

<p>I&#8217;m also interested in seeing them succeed because I anticipate that even if
this product isn&#8217;t what it claims to be, they&#8217;ll prove the concept and there
will be a competitive rush into this space.  That is guaranteed to produce a lot
of changes in how people build computers, probably in more areas than just data
warehousing.  So I&#8217;m happy that they&#8217;re starting this, because others will
finish it whether they do or not.  And that&#8217;s good news for the environment,
too.</p>

<p>Stay tuned.  More details are forthcoming.</p>

<p>PS: if you have questions you&#8217;d like me to look into while I&#8217;m onsite with the engineers, feel free to post them in the comments.  But I probably can&#8217;t answer them yet.</p>

<p><strong>Further Reading:</strong><ul><li><a href='http://www.xaprb.com/blog/2008/04/14/kickfire-relational-algebra-in-a-chip/' rel='bookmark' title='Permanent Link: Kickfire: relational algebra in a chip'>Kickfire: relational algebra in a chip</a></li>
<li><a href='http://www.xaprb.com/blog/2008/04/09/kickfire-is-not-ssd-based/' rel='bookmark' title='Permanent Link: Kickfire is not SSD-based'>Kickfire is not SSD-based</a></li>
<li><a href='http://www.xaprb.com/blog/2009/08/18/how-to-find-un-indexed-queries-in-mysql-without-using-the-log/' rel='bookmark' title='Permanent Link: How to find un-indexed queries in MySQL, without using the log'>How to find un-indexed queries in MySQL, without using the log</a></li>
<li><a href='http://www.xaprb.com/blog/2009/12/31/a-simple-way-to-make-birthday-queries-easier-and-faster/' rel='bookmark' title='Permanent Link: A simple way to make birthday queries easier and faster'>A simple way to make birthday queries easier and faster</a></li>
<li><a href='http://www.xaprb.com/blog/2009/11/01/catching-erroneous-queries-without-mysql-proxy/' rel='bookmark' title='Permanent Link: Catching erroneous queries, without MySQL proxy'>Catching erroneous queries, without MySQL proxy</a></li>
</ul>]]></content:encoded>
			<wfw:commentRss>http://www.xaprb.com/blog/2008/04/04/kickfire-stream-processing-sql-queries/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>

