Tag Archive for 'mysqluc2008'

Baron Schwartz on a podcast at MySQL Conference and Expo 2008

I did an interview with Barton George from Sun while I was at the conference last week. Barton has now posted the interview. If you’re quick, you can listen to it before I do.

Topics: everything and anything, including Maatkit and PostgreSQL.


MySQL Conference and Expo 2008, Day Three

Here’s a rundown of Thursday (day 3) of the MySQL Conference and Expo. This day’s sessions were much more interesting to me than Wednesday’s; more than once I wanted to attend several sessions that were scheduled in the same time slot.

Inside the PBXT Storage Engine

This session was, as it sounds, a look at the internals of PBXT, a transactional storage engine for MySQL that has some interesting design techniques. I had been looking forward to this session for a while, and Paul McCullagh’s nice explanations with clear diagrams were a welcome aid to understanding how PBXT works. Unlike some of the other storage engines, PBXT is being developed in full daylight, with an emphasis on community involvement and input. (Indeed, I may be contributing to it myself, in order to make its monitoring and tuning capabilities second to none).

PBXT has not only a unique design, but a clear vision for differentiating itself from other transactional storage engines. It’s not trying to clone any particular engine; Paul and friends are planning to add some capabilities that will really set it apart from other engines, including high-availability features and blob streaming.

I left this session with a much better understanding of how PBXT balances various demands to satisfy all sorts of different workload characteristics, how it writes data, how it achieves transactional durability, and so on. I think these capabilities, and its performance, can really be assessed only in the real world (of course), but in principle it sounds good. I love knowing how things work!

There were about 30 people in the talk. I wish there had been more, because I think PBXT is going to be an important part of the open ecosystem going forward. However, I feel pretty confident people will take more notice if it starts to get used in the real world. Someone had a video camera there, so you might check out the video when it’s available. Paul’s explanations are really good.

Helping InnoDB Scale on Servers with Many CPU Cores and Disks

This session was Mark Callaghan’s chance to unveil the work he and others have been doing on InnoDB’s scalability issues, which mostly revolve around mutex contention. Mark’s team has completely solved the problems on their workload and benchmarks. In fact, after the changes, InnoDB exhibited significantly better performance even than MyISAM, which began to be limited by the single mutex that synchronizes access to its key cache. (Yes, in fact MyISAM has scalability problems too).

Google’s workload for MySQL, in case you’re wondering, is pretty traditional (i.e. not web-like; more like an “enterprise” application). Heavily I/O-bound, 24/7 critical systems, and so on.

Mark also wore several community t-shirts at various points in the talk, including one of my Maatkit t-shirts. Mark said Maatkit would be perfect if only it were written in Python (Google’s preferred scripting language). Alas, Mark, it’ll stay in Perl. But thanks for the nice compliment anyway.

The room was packed full.

Scaling Heavy Concurrent Writes In Real Time

Dathan Pattishall, formerly the lead architect at Flickr, explained his techniques for scaling Flickr’s write capacity. He talked about how he’d worked to reduce primary key sizes, queued writes for batching, separated different types of data into different types of tables, and more. Dathan has never been afraid to do what he thinks is a good idea, even if it flies in the face of “best practices,” so I was happy to finally hear him talk.

By the way, Dathan pointed out that distributed locking with memcached and add() isn’t a silver bullet. It works OK until memcached evicts your lock under its LRU policy, at which point a second client can acquire a lock the first client still believes it holds. He uses MySQL’s built-in GET_LOCK() function for locking instead.
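To make the eviction problem concrete, here is a minimal sketch in Python. It is not Dathan's code and it does not talk to a real memcached server: TinyLRUCache is a hypothetical toy cache with three slots, and the key names are made up. It only assumes that add() succeeds when the key is absent and that the least recently used entry is dropped when the cache is full.

```python
from collections import OrderedDict

class TinyLRUCache:
    """Toy stand-in for a memcached-like cache with a fixed number of slots."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def _evict_if_full(self):
        # Drop the least recently used entry once we exceed capacity.
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)

    def add(self, key, value):
        """Store the key only if it is absent; return True on success."""
        if key in self.items:
            return False
        self.items[key] = value
        self._evict_if_full()
        return True

    def set(self, key, value):
        """Store the key unconditionally and mark it most recently used."""
        self.items[key] = value
        self.items.move_to_end(key)
        self._evict_if_full()

cache = TinyLRUCache(capacity=3)

# worker-1 takes the lock; worker-2 is correctly refused.
first_holder = cache.add("lock:photo:42", "worker-1")
blocked = cache.add("lock:photo:42", "worker-2")

# Unrelated cache traffic fills the slots and pushes the lock out.
for i in range(3):
    cache.set(f"page:{i}", "cached html")

# worker-1 never released the lock, yet worker-2 now gets it.
second_holder = cache.add("lock:photo:42", "worker-2")

print(first_holder, blocked, second_holder)
```

With GET_LOCK(name, timeout) and RELEASE_LOCK(name), the lock lives in the MySQL server and is tied to the connection that took it, so ordinary cache traffic cannot push it out.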

Dathan’s blog is a good source of information about his sometimes unorthodox approaches to database design.

The Power of Lucene

This was the only one of Frank (Farhan) Mashraqi’s talks I got to attend. This was pretty technical: how Lucene works, how to configure and install it, how to index documents, how to execute searches. If you were wondering how much work and complexity it would be to install and use Lucene, this talk would have been good for you to attend; I’ve never used it myself, but I’m pretty sure Frank covered everything you need to know.


MySQL Conference and Expo 2008, Day Two

Day two of the conference was a little disappointing, as far as sessions went. There were several time blocks where I simply wasn’t interested in any of the sessions. Instead, I went to the expo hall and tried to pry straight answers out of sly salespeople. Here’s what I attended.

Paying It Forward: Harnessing the MySQL Contributory Resources

This was a talk focused on how MySQL has made it possible for community members to contribute to MySQL. There was quite a bit of talk about IRC channels, mailing lists, and the like. However, the talk gave short shrift to how MySQL plans to become truly open source (in terms of its development model, not its license). I think there was basically nothing to talk about there. I had a good conversation about some of my concerns with the speaker and some others from MySQL right afterwards.

There was basically nobody there — I didn’t count, but I’d say maybe 10 or 12 people. I think this is a telling sign.

Architecture of Maria: A New Storage Engine with a Transactional Design

I was interested in this talk because I’m interested in the tension between Falcon and Maria (and between Falcon and everything, for that matter) but I left and went to the expo hall again after a bit. The talk was good but I’d already seen and/or read it, and the question-and-answer component wasn’t enough to keep me there.

The MySQL Query Cache

This was the second session I gave at the conference, and again it was standing-room-only, with nearly 300 attendees according to the person who was watching the door. The questions were frequent and added a lot to the discussion. Slides will be on the conference website when they post them.

Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL

I was keenly interested in this talk because a) I am a big fan of Patrick Galbraith’s work with many different projects, and b) I had heard a lot about Grazr but didn’t know much about it. However, I missed most of the talk. About ten minutes into it, I got a call I couldn’t refuse: my wife!

However, I did sneak back into the room for the last bit too. And I gave Grazr a try. Unfortunately, I got really confused by it; I tried a bunch of different ways to import my Google Reader’s OPML. I got that to work, but then I couldn’t figure out how to read the feeds in the OPML via Grazr. Then I think I figured that out (I’m not sure) but it didn’t strike me as a very handy way to read my feeds. I’ll try taking another look at it later if I get time. (I’m all ears if there’s a better way to read feeds).

Extending MySQL

This one was mostly for fun. I knew a lot about UDFs already (I’ve created some) and I knew about the pluggable storage engine API. But I didn’t know about pluggable event daemons. Holy cow, what a great way to shoot yourself (or your server) in the foot! All the power of an atomic bomb, with all the safety of SPF 5 sunblock in a nuclear attack. Or something like that. But darn, it sure is nifty. Brian is a great speaker too — very lively.

You know, there’s another way to extend MySQL that most people don’t seem to know about, which Brian didn’t mention. That is procedures (not stored procedures). They are sort of like a post-filter for a result set, and like UDFs they’ve been around forever. I have never heard of anyone writing their own, but there’s an example in the server itself: PROCEDURE ANALYSE.

Expo hall

I went to the expo hall to meet and greet many of the companies that Percona (my employer) is already working with (doing independent benchmarks, performance verification, analysis etc) or will be in the future. I also wanted to grill some of the vendors on their technology. Usually I find them very cagey; they claim X times faster this-or-that, but won’t tell you how, and won’t tell you what their systems don’t do well. I don’t understand why they take this approach; you can’t hide your system’s strong and weak spots. There is no security through obscurity, and shrewd independent observers are going to get to the bottom of it with or without your permission.

So, for instance, I was talking with Tokutek, who claimed to be a drop-in replacement for InnoDB with 200x better performance and apparently no downsides. However, on closer questioning, I did get them to admit that the system has table-level locking. Thus it won’t give any concurrency, so saying it’s a drop-in InnoDB replacement is questionable. And the comparison against InnoDB seemed contrived: a badly tuned InnoDB running a workload chosen so that it would perform terribly. An honest comparison tunes both systems to their highest performance and then measures them; you can’t tune one system as badly as possible and compare it to the other’s best-case performance. I pressed further and asked about range scans in some specific cases (they claim they’re great at range queries, and equal to InnoDB on everything else). At last they admitted they can’t perform well on some very common queries, including real-life queries that InnoDB handles very well for me. They called these “point queries,” but that’s not true; you can design indexes to support many different ways to range-query a table in InnoDB and get great performance. So it sounds to me like Tokutek’s storage format is extremely narrowly focused, and there is indeed a trade-off. I will be interested to see how their technology develops, though. It’s not done yet.

In general

There are a lot of Maatkit t-shirts walking around, which makes me happy. If I’d printed 200 of them, I probably could have given them all away. I was wearing a PostgreSQL t-shirt myself. Proudly, I might add. I’m not the only person here who’s interested in PostgreSQL. This morning I met a person from EnterpriseDB.

Yesterday was a bit slow in terms of interesting sessions, but there was a lot going on in the hallways, the expo hall, the meetings over lunch, and so on.


Get a free sample chapter of High Performance MySQL Second Edition

If you’re at the MySQL Conference and Expo, you can get a free sample chapter of the upcoming High Performance MySQL Second Edition. Just go to the exhibition area. As you go through the doors, take an immediate left and look for the sample chapter on O’Reilly’s table. It’s a rough draft and contains typos and my incredibly crude drawings instead of those that will go into the final book, but it should serve to give you an idea of the book’s depth and scope. Kudos to Andy Oram, our editor, who was able to get these done for us on very short notice.


MySQL Conference and Expo 2008, Day One

Today is the first day at the conference (aside from the tutorials, which were yesterday). Here’s what I went to:

New Subquery Optimizations in 6.0

By Sergey Petrunia. This was a similar session to one I went to last year. MySQL has a few cases where subqueries are badly optimized, and this session went into the details of how this is being addressed in MySQL 6.0. There are several new optimization techniques for all types of subqueries, such as inside-out subqueries, materialization, and converting to joins. The optimizations apply to scalar subqueries and subqueries in the FROM clause. Performance results are very good, depending on which data you choose to illustrate. The overall point is that the worst-case subquery nastiness should be resolved. I’m speaking of WHERE NOT IN(SELECT…) and friends. It remains to be seen how this shakes out as 6.0 matures, and what edge cases will pop up.
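For readers who haven't seen the "converting to joins" idea, here is a small sketch of the rewrite an optimizer can apply to an IN subquery. It runs on SQLite through Python's sqlite3 module rather than on MySQL 6.0, the customers and orders tables are made up for illustration, and it shows only that the two forms return the same rows. It says nothing about how MySQL implements the transformation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL);
    INSERT INTO customers VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO orders VALUES (10, 1), (11, 1), (12, 3);
""")

# The form people naturally write: customers that have at least one order.
subquery_form = conn.execute("""
    SELECT id, name FROM customers
    WHERE id IN (SELECT customer_id FROM orders)
    ORDER BY id
""").fetchall()

# The same question as a join. DISTINCT removes the duplicate rows that
# a customer with several orders would otherwise produce.
join_form = conn.execute("""
    SELECT DISTINCT c.id, c.name
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.id
""").fetchall()

print(subquery_form)
print(join_form)
```

The point of doing this inside the optimizer is that you get to keep writing the subquery form while the server is free to pick whichever join order is cheaper.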

The Lost Art Of the Self Join

This was just great. Among many other things, Beat Vontobel showed how a Su Doku can be solved entirely with declarative queries: a very large self-join query against a table of digits and a table of the board’s initial state. I had been promoting this session because last year’s was so very good. I can’t wait to see what he comes up with for next year. Can he find another creative idea? Time will tell.

He wasn’t able to solve a 9×9 puzzle with MySQL because of the limitation on the number of joins, but PostgreSQL had no trouble doing it.
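To give a flavor of the technique, here is a scaled-down sketch of my own, not Beat's actual query: a 4×4 puzzle solved by joining a digits table to itself once per cell, with the givens read from a board table. It uses SQLite through Python's sqlite3 module; the table names, column names, and the puzzle itself are assumptions for illustration. Python only generates the SQL text. All of the solving happens inside the one SELECT.

```python
import sqlite3
from itertools import combinations

N, BOX = 4, 2  # a 4x4 board made of 2x2 boxes (scaled down from 9x9)

# Hypothetical puzzle: (row, col) -> given digit; every other cell is blank.
GIVENS = {(0, 0): 1, (0, 3): 4, (1, 1): 4, (1, 2): 1,
          (2, 0): 2, (2, 3): 3, (3, 1): 3, (3, 2): 2}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE digits (d INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO digits VALUES (?)",
                 [(d,) for d in range(1, N + 1)])
conn.execute(
    "CREATE TABLE board (row_idx INTEGER, col_idx INTEGER, digit INTEGER)")
conn.executemany("INSERT INTO board VALUES (?, ?, ?)",
                 [(r, c, d) for (r, c), d in GIVENS.items()])

cells = [(r, c) for r in range(N) for c in range(N)]

def alias(cell):
    """Table alias for one cell, e.g. (2, 3) -> 'c23'."""
    return f"c{cell[0]}{cell[1]}"

def share_unit(a, b):
    """True if two cells sit in the same row, column, or box."""
    same_box = (a[0] // BOX, a[1] // BOX) == (b[0] // BOX, b[1] // BOX)
    return a[0] == b[0] or a[1] == b[1] or same_box

# A cell must equal its given digit if the board table has one.
conditions = [
    f"{alias(cell)}.d = COALESCE((SELECT digit FROM board "
    f"WHERE row_idx = {cell[0]} AND col_idx = {cell[1]}), {alias(cell)}.d)"
    for cell in cells
]
# Cells that share a row, column, or box must hold different digits.
conditions += [f"{alias(a)}.d <> {alias(b)}.d"
               for a, b in combinations(cells, 2) if share_unit(a, b)]

sql = ("SELECT " + ", ".join(f"{alias(cell)}.d" for cell in cells)
       + " FROM "
       + " CROSS JOIN ".join(f"digits AS {alias(cell)}" for cell in cells)
       + " WHERE " + " AND ".join(conditions))

# Each result row is one complete solution, 16 digits in row-major order.
solutions = [[list(row[r * N:(r + 1) * N]) for r in range(N)]
             for row in conn.execute(sql)]
for grid in solutions:
    for line in grid:
        print(*line)
```

Written this way, a 9×9 board needs 81 aliases of the digits table. MySQL caps a single join at 61 tables, which is where the limitation above bites.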

EXPLAIN Demystified

This was my session, of course. (Slides will be on the O’Reilly conference site, if they aren’t already.) It went great, I thought. The room was full and people were standing in the back of the room and in the door. The questions came fast and furious; all really good questions. I think we ended up exploring a lot of the MySQL query execution method, strengths, and weaknesses by the time we were through. And I gave away all the remaining Maatkit t-shirts. Hopefully the people who took them will wear them tomorrow and the conference will be a sea of deep, rich red shirts.

Someone did an audio recording of the session, but I don’t recall who it was.

Investigating InnoDB Scalability Limits

This session was given by Peter Zaitsev (disclosure: I now work for Percona, the company he co-founded). Peter and Vadim Tkachenko spent a lot of time over the last weeks and months running a dizzying array of benchmarks on MySQL 5.0.22, 5.0.51, and 5.1.24 (if I recall the versions correctly). They were able to show InnoDB’s scaling patterns for a number of different micro-benchmarks on a variety of configurations. If you didn’t attend, please look up the slides if you care about InnoDB performance. A lot of work went into the benchmarks — a lot of work. The slides should be on the conference website or on our blog, http://www.mysqlperformanceblog.com/.

Replication Tricks and Tips

Lars Thalmann and Mats Kindahl gave this session. At a high level, I’d say it was a run-down of all the different ways you can use MySQL replication. Replication is really a flexible tool, and they covered a large array of the most important ways you can use it to achieve different purposes. Many of the techniques they mentioned are implemented by various tools in Maatkit. A couple of the others are implemented in MySQL Master Master Manager and MySQL Semi Multi-Master tools. Don’t re-code these! You can save weeks of work and get quality code by using the pre-built tools. (I built Maatkit, so I know exactly how tricky it is to get some of these things right.)

BoF Sessions

I dropped in on a few BoF sessions, including the Sphinx one and the PBXT/Blob Streaming one. (Keep an eye on the PrimeBase folks — they are up to great things.) Ronald Bradford protected me from those who wanted to get me drunk. Hint: it’s really easy… I have to say, though, Monty’s black vodka was amazing.

Speaking of Blob Streaming, Paul McCullagh and I were talking earlier in the day about the project’s name, MyBS. This has been smirked about a few times. I think it’s a great name, because after all my initials are BS (I usually insert one of my four middle names in to alleviate this problem, but I digress). The conversation went like this:

Me: I like it. My initials are BS.

Paul: BS actually means British Standard, so it can’t be bad.

Me: Better than American Standard. That’s a toilet.

We also debated the merits of watching the original movie The Blob. It’s a classic. It must be good.


A different angle on the MySQL Conference

There are quite a few business angles you might see only if you’re here at the conference, and you won’t get from blogs. For example, let’s take a look at the contents of the shoulder bags they hand out with your registration. (This is only a partial list.)

  • SnapLogic’s flyer gets it right: their system is compatible with “GNU Linux.” Hooray, a commercial company acknowledging the GNU operating system for what it is!
  • MySQL Enterprise’s flyer has three big bullet points: MySQL Load Balancer, MySQL Connection Manager, and MySQL Enterprise Monitor Query Analyzer. The first two look like they’re probably built on MySQL Proxy. The last has a visual explain plan feature, which according to an elevator conversation is not yet built. I’ll stop by their booth and see. As you may know, Maatkit has provided a tool (which is designed for integration into other tools) that shows a visual explain plan for a long time.
  • There’s an issue of Linux Journal, which does not get the GNU part right. And it has no articles about MySQL. Off-topic! Discarded!
  • Infobright’s flyer says they can load data nearly real-time. I don’t know how you read it, but to me that says “can’t quite keep up with how fast you generate data.” So… what good can it possibly be, right?
  • The conference bag itself has Zmanda’s logo on the side.
  • Webyog’s flyer has one side for SQLyog, and one for MONyog. Each side takes the sparse but visually appealing approach of shiny icons to present a feature list. My favorite is the “Find slow SQL” turtle.
  • JasperSoft’s flyer has soothing, professional blues and rich reds. It makes them look very trustworthy. (I’m not being snarky.) And they have lots of nice whitespace. It’s a little bit of a different look.
  • Kickfire’s marketing department is really on the ball. I’ve seen a large number of flyers and other materials from them (online and offline) and they just changed their name and created a new logo and look-and-feel a short time ago. How do they do it so fast?
  • O’Reilly has a bunch of half-sized flyers for their conferences. We should have asked them to throw in one about our upcoming book, the second edition of High Performance MySQL. Alas, opportunity lost. By the way, stop by the bookstore and grab a copy of the sample chapter.
  • Zmanda, not content with stamping the outside of the bag, has a half-flyer inside it too, plus a chance to win a Digital Rebel to lure you to their booth. If you’re doing backups the way a lot of people seem to, you might want to stop by their booth anyway…
  • There’s a CD for a free trial of WinSQL. But the CD case doesn’t say what the

Sorry. I have a short attention span.


Sessions I want to see at the MySQL Conference

This year’s conference has a great lineup. As usual, with 8 sessions concurrently, it’s impossible to pick which ones I want to see. However, I did learn a few things from last year’s conference, which I think will help me get more out of it this time.

Number one rule: not all sessions are created equal. I can’t say for sure, but I’m pretty sure that when you see “How Product X Will Scale Your Databases” presented by a person from Company X, you can reasonably suspect that Company X is paying for this privilege, and it’s not really a session as much as a product demo. These sessions were not reviewed and voted on by the community (I know, because I was one of the community members who were asked to review and vote on proposals. Maybe I’m being a whistle-blower and won’t get this honor next year as a result…)

Number two rule: if the description is vague, or if it sounds like regurgitation, I’m skeptical. For example, if the summary starts off by saying “Today’s databases are dealing with more data than ever before. Data is mission-critical to today’s business enterprises” they lost me already. Writing that in a session description betrays thoughtlessness.

There are actually a couple of time slots where none of the sessions really excite me, and I wish I could instead see one of the sessions scheduled while I’m presenting. But for the most part, there’s more goodness than I can actually take in.

This year the conference website has become Web 2.0ish, in a good way. It lets you browse the schedule, and if you’re logged in, you can “star” the ones you want to see. Then you get a personal calendar of all the ones you’ve starred. Not only that, but when you look at a session, it shows you other sessions that other attendees have also starred. Pretty nice, if you’re trying to figure out which sessions to see.

Here are the sessions I’ve starred, in chronological order. It’s a little too much work for me to link to them all.

  • All Bases Covered: A Hands-on Introduction to High-availability MySQL and DRBD
  • Memcached and MySQL: Everything You Need To Know
  • New Subquery Optimizations in MySQL 6.0
  • The Lost Art of the Self Join
  • EXPLAIN Demystified
  • High Availability Landscape of MySQL
  • Disaster is Inevitable — Are You Prepared?
  • Services Oriented Architecture with PHP and MySQL
  • Database Integrity Protection with MySQL and DRBD
  • Falcon from the Beginning
  • Architecture of Maria: A New Storage Engine with a Transactional Design
  • The MySQL Query Cache
  • Grazr: Lessons Learned Building a Web 2.0 Application Using MySQL
  • Extending MySQL
  • Inside the PBXT Storage Engine
  • Helping InnoDB Scale on Servers with Many CPU Cores and Disks
  • Scaling Heavy Concurrent Writes In Real Time
  • High Availability MySQL with DRBD and Heartbeat: MTV Japan Mobile Services

I might change my mind, but these look like a pretty good start.

Rule three: ask around. You can get the scoop, and it might make you change your mind. For example, would you go see the one about the “Lost Art of the Self Join” if I wasn’t here telling you how much you don’t want to miss that one?

Rule four: go to my sessions *wink*


Kickfire: stream-processing SQL queries

Some of you have noticed Kickfire, a new sponsor at this year’s MySQL Conference and Expo. Like Keith Murphy, I have been involved with them for a while now. This article explains the basics of how their technology is different from the current state of the art in complex queries on large amounts of data.

Kickfire is developing a MySQL appliance that combines a pluggable storage engine (for MySQL 5.1) with a new kind of chip. On the surface, the storage engine is not that revolutionary: it is a column-store engine with data compression and some other techniques to reduce disk I/O, which is kind of par for the course in data warehousing today. The chip is the really exciting part of the technology.

The simplest description of their chip is that it runs SQL natively.

OK, but now you need to do something: get “SQL chip” out of your mind. It doesn’t work the way you think it does, and your pre-conceived ideas may prevent you from understanding how different this really is. (Everyone says their technology is a paradigm shift, so I expect you to be numb to this phrase.)

I can’t explain all of the technology in this post, partially because of NDA, but I want to prepare you for when you do hear the details. If you’re like me, you’ll miss a lot of stuff because you have tunnel vision, and then you’ll say “wait, I get it now! Can you please repeat everything you’ve been saying for the last hour so I can think about it all over again?”

An important note

Very important: I have not seen this technology, tasted it, smelled it, or benchmarked it. This information is based on discussions with their engineering and other staff. I will not pretend to know anything I don’t. I will be spending two days in the lab with the engineers next week, and then I will be able to write in greater detail with more confidence.

How your computer currently works

To understand how Kickfire’s chip works, you need to understand something you probably take for granted: how most chips work. Most computers today use the same architecture they always have: there’s data that is held in the CPU, and data that is not. The CPU has registers, which hold a minuscule bit of data – the data it is currently working with. When the CPU processes an instruction that asks for some more data it doesn’t have, the CPU has to go fetch it. In the meantime, the instruction can’t complete.

As you might imagine, this is not terribly efficient. Fetching data that’s not in the CPU can take hundreds of CPU cycles (or more). To work around this, computer architects have developed a hierarchy of caches: the on-chip cache, the main memory, and the hard drive, to name a few. The caches make it faster to get data when it’s not already on hand. And modern chips have a pipeline, too. The pipeline looks at the instructions as they flow towards the CPU, tries to predict which data they’re going to need, then pre-fetches it.

In the best case, this works okay. Not always — for example, the Pentium 4 has a very long pipeline, so the cost of a wrong branch prediction is very high. Another case is when you simply need a lot of data, such as tens of gigabytes. Suppose for your 10GB operation, you’re only going to look at each byte once (a common occurrence in data warehousing queries). This renders your caches useless, because caches work on the principle that you’re likely to look at recently accessed data again soon.

In these cases, the speed of the computation is constrained by the Von Neumann bottleneck: the inefficient fetch-compute-wait cycle of constantly going to the memory (or disk) for more data, a teeny bit at a time. Remember, even in-memory data is very slow compared to data that’s in the registers. Having a lot of fast memory is not a solution to the Von Neumann bottleneck. It’s a workaround to reduce the cost.
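The claim that a single pass over a big data set defeats caching is easy to check with a toy simulation. This sketch assumes a plain LRU cache of 100 blocks and two made-up access patterns; real CPU caches are more complicated, so treat the numbers as an illustration of the principle, not a measurement of any hardware.

```python
from collections import OrderedDict

def hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / total

# A loop that keeps coming back to the same 50 blocks.
hot_loop = [i % 50 for i in range(10_000)]
# A warehouse-style scan that touches every block exactly once.
single_scan = list(range(10_000))

hot = hit_rate(hot_loop, capacity=100)
scan = hit_rate(single_scan, capacity=100)

print(f"hot loop:    {hot:.3f}")
print(f"single scan: {scan:.3f}")
```

Making the cache bigger doesn't change the scan's result. Nothing is ever requested twice, so there is nothing for the cache to hit on.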

Kickfire’s architecture

Kickfire is designed to work well where today’s general-purpose computing architectures run queries slowly because they’re sitting on their thumbs much of the time. Think data warehousing: complex queries with lots of data.

What is the industry’s answer to this? So-called massively parallel processing, or MPP. Current MPP data-warehousing solutions are special-purpose database software that runs queries on dozens or hundreds of CPUs, which occupy a lot of storage space and require lots of power, hardware, and cooling. “If you throw enough Von Neumann machines at the problem simultaneously, they can answer your questions faster,” or so the thinking goes. In other words, the current state of the art is to arrange conventional computers in new ways.

Kickfire takes the opposite approach: stream processing. This is a fundamentally different computing architecture. Stream processing is to Von Neumann machines as LISP is to C.

For those of you who aren’t LISP programmers, here’s another analogy: In stream processing, you take a bunch of data and you shove it through the chip without stopping. Rather than the chip asking for data from the storage subsystem as needed, the data actually gets pushed at the chip. That is, it’s push-processing instead of the conventional pull-processing.

Conventional processing is like trying to fill your bathtub from the sink with a paper cup. Stream processing is like putting your tub under the sink and opening the drain.

I’m taking some liberties here, to illustrate the differences. As I said, I haven’t seen the wiring diagrams of the Kickfire chip. But hopefully you get the concept.

This is not a new idea. If you’ve worked with modern graphics cards, you’ve seen this in action. Programming languages like Cg express the stream-processing concepts elegantly. If you’ve ever been in a classroom full of C++ programmers trying to learn Cg, you’ve seen how hard it is to grasp this different approach. Essentially, graphics programming on one of these chips is a series of transformations, not a series of instructions. You input some vertexes at one end of the processor, and you tell the chip to do some matrix multiplies and so on. Out pops the result at the other end.

If this doesn’t sound much different from instructions… well, meditate on it. It’s like an assembly line, but nobody leaves their station along the conveyor belt. In a traditional CPU, the “person” at the conveyor constantly leaves to go get the materials he needs.
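If it helps, here is a software analogy in Python. It is purely a sketch of the push idea and says nothing about how Kickfire's hardware works: the stage functions, the row layout, and the query are all made up. The "query" is set up once as a chain of fixed stations, and then the data source drives rows through it. No stage ever goes off to ask for its input.

```python
def make_pipeline(*stages, sink):
    """Chain stages so each one hands its output straight to the next."""
    def build(i):
        if i == len(stages):
            return sink
        downstream = build(i + 1)
        stage = stages[i]
        return lambda row: stage(row, downstream)
    return build(0)

def where_amount_over(threshold):
    """Filter station: pass a row along only if it qualifies."""
    def stage(row, emit):
        if row["amount"] > threshold:
            emit(row)
    return stage

def project(*columns):
    """Projection station: keep only the named columns."""
    def stage(row, emit):
        emit({name: row[name] for name in columns})
    return stage

class SumSink:
    """End of the line: accumulate SUM(column)."""
    def __init__(self, column):
        self.column = column
        self.total = 0

    def __call__(self, row):
        self.total += row[self.column]

rows = [
    {"id": 1, "amount": 50},
    {"id": 2, "amount": 150},
    {"id": 3, "amount": 300},
    {"id": 4, "amount": 100},
]

# Roughly: SELECT SUM(amount) FROM rows WHERE amount > 100
total = SumSink("amount")
push = make_pipeline(where_amount_over(100), project("amount"), sink=total)

for row in rows:  # the storage side pushes; the stations never fetch
    push(row)

# The conventional pull-style equivalent, for comparison.
pulled = sum(r["amount"] for r in rows if r["amount"] > 100)

print(total.total, pulled)
```

Both versions compute the same answer, of course. The difference is who is in charge of the data movement, and in hardware that difference is the whole point.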

Kickfire runs on commodity hardware, and it is just one or two servers, not racks full. Like many other systems designed for large amounts of data, it uses a column data store. Unlike many other systems, it uses an industry standard interconnect and a custom pluggable MySQL storage engine.

What took so long?

Stream processing is the obvious way to run SQL queries. Some readers may never have thought about it this way, but my guess is that a lot of you already think of SQL in a stream-processing way, even though you might know that computers today really implement it in conventional ways. I have always tried to think of it this way, and I always try to explain SQL as a stream, too.

So when I was on a call with the Kickfire engineers and it finally sank in, I felt really silly. Why didn’t I think of that? It’s so obvious.

But then again, most breakthroughs are really obvious in hindsight.

Performance

I have seen initial benchmark results, but I’m under NDA about them. I can’t say any more yet. And I haven’t run any benchmarks myself yet, nor have I had access to the hardware. So this is all theoretical until I get my hands on the system. Caveat emptor, your mileage may vary, etc etc.

One thing I’m interested in is how well the system performs for general-purpose queries. When you take it away from complex queries on lots of data, does it still have an advantage? I’ll be trying to get an answer to that question next week.

About Kickfire

They are still in stealth mode and my NDA prevents me from being able to tell you a lot or answer all your questions yet. But someday they will no longer be in stealth mode, and you’ll find out everything you want to then.

Hint: they are going to be giving a keynote address on their technology, but there’s not much detail in the description. Come to the keynote and find out.

Why am I writing this?

Well, they promised me chocolate…

Seriously: I do have an agenda, but there are actually several motivations here. The first is that they initially contacted me because of my involvement with the MySQL community. Of course they’re hoping to gain publicity through me, but they also wanted to let the community have some input. I’ve been sort of a secret liaison for you, representing your interests to Kickfire. I’ve advocated pretty strongly for certain things that I’ll go into in a later post.

The other reason I’m working with them is that I’m excited about their technology, even though I don’t have hard evidence about their claims and benchmarks yet. If what they’re saying is true, their product will be very good for the environment. It will let people save a lot of energy (power, cooling, the need to build data centers) and it will help avoid the need to build a bunch of servers. Computers are extremely toxic to manufacture.

I’m also interested in seeing them succeed because I anticipate that even if this product isn’t what it claims to be, they’ll prove the concept and there will be a competitive rush into this space. That is guaranteed to produce a lot of changes in how people build computers, probably in more areas than just data warehousing. So I’m happy that they’re starting this, because others will finish it whether they do or not. And that’s good news for the environment, too.

Stay tuned. More details are forthcoming.

PS: if you have questions you’d like me to look into while I’m onsite with the engineers, feel free to post them in the comments. But I probably can’t answer them yet.


You might also like:

  1. Kickfire: relational algebra in a chip
  2. Kickfire is not SSD-based

Remember to sign up for MySQL Conference and Expo!

You have only a few more days to sign up for the MySQL Conference and Expo before the early-bird discount goes away. Check out the schedule of speakers and tutorials, and sign up soon! And just in case you didn’t get one from any of the other people blogging about it, you can email me for a code that’s good for a 20% discount.

I’m presenting two sessions: one on the query cache, and one on EXPLAIN. Both are manageable for an hour-or-so talk. I’m not trying to boil the ocean, but rather to help you understand these important topics in ways you’ll remember after leaving the conference.

I was also on the voting committee for the proposals, so I’ve read them all. I really believe this event is worth every penny. (Of course, as a speaker, it doesn’t cost me… but I digress).

While you’re there you should also plan to get certified, also at a significant savings. This is a great career move, and there are sessions that will help you prepare. There’s a critical shortage of people who really know how to use MySQL. I must admit, I’m not even certified! But I’ll be taking the certification exams too.


You might also like:

  1. Like it or not, it is the MySQL Conference and Expo
  2. My presentations at the 2008 MySQL Conference and Expo
  3. Send your employees to the MySQL Conference
  4. How to get your session accepted to MySQL Conference 2008
  5. MySQL Conference and Expo 2008, Day Two

My presentations at the 2008 MySQL Conference and Expo

I’ll be attending the 2008 MySQL Conference and Expo again this year, and I’m looking forward to hearing some great sessions, meeting new and old friends, and giving sessions myself. As a proposal reviewer, I looked at and voted on 250+ proposals for sessions and tutorials for this conference. There are going to be some great sessions and tutorials.[1]

If you haven’t come to the conference previously, it’s well worth your time and money, in my opinion.

I (Baron Schwartz) am giving two sessions myself, on extremely practical topics. One is the query cache, and the other is EXPLAIN. Both are the subject of many myths and misunderstandings! My goal is to remove all the programmer-speak and show you how they really work. Once you understand that, you can understand the technical terminology. (But it’s very hard to go the other direction).

I haven’t decided yet which sessions I want to attend, but I know this: I’m not going to miss seeing how Beat Vontobel solves a Su Doku puzzle with only self-joins. His session on views last year was just amazing.

Hopefully there’ll be plenty of time to sit down for meals and chats with all the people I correspond with throughout the year, but rarely get to see or talk to!

[1] And no, I don’t get any kickback for saying nice things about the conference. Even reviewing all those proposals was a volunteer job. And Jay Pipes tricked me into it, the rat! He told me it would be only a few hours. Haha, you can’t review 250 proposals in a few hours… I have to say though, some of them were really rewarding to read. One of them was about holding a cosmic prayer circle or something like that. Without expressing any opinion on my religion/spirituality, I did have to vote NO on that one. Sorry, wrong conference.


You might also like:

  1. Remember to sign up for MySQL Conference and Expo!
  2. Send your employees to the MySQL Conference
  3. How to get your session accepted to MySQL Conference 2008
  4. Sessions I want to see at the MySQL Conference
  5. MySQL Conference and Expo 2008, Day One