Tag Archive for 'mysql'

High Performance MySQL Second Edition Schedule

I just got the rest of the production schedule from the publisher, plus the PDF files for quality control, for our upcoming book. (Now I have to proofread the whole book!) This is the first time I’ve seen the entire production schedule. The book is supposed to go to the printer in the first week of June. I don’t know what the on-the-shelf date will be, but I think very shortly after that. The publisher has promised that it’ll physically be on sale at Velocity.

I also took a peek at the PDFs. Without the appendixes, the last page of Chapter 14 (Tools for High Performance) is page 604. The appendixes bring it to 660 pages. That’s real material, not including tables of contents and indexes. So my estimate (620) was not too far off.

660 pages is not bad, considering that the contract was for 384 pages.

Another note: the marketing materials for the book emphasize that it covers MySQL 5.1. While this is true, I want to point out that we took a real-life approach: we write about what we’ve seen in the real world, and 5.1 is not yet as widely deployed as earlier versions. The book’s real value, as far as version-specific content goes, is its tremendous depth and breadth in MySQL 4.1 and 5.0. These have been “out there” for a long time, and among the four of us we’ve seen about every conceivable scenario with them. So you’ll get a lot of insight about current, production-ready, widely-used versions. Let the other guys speculate; we just report the facts. It’s not like there’s any shortage of things to say about 5.0, right?

Summary of beCamp 2008

Yesterday I went to beCamp 2008 along with four roomfuls of other people interested in technology (perhaps close to 100 people total). The conference was a lot of fun. Not everything went as planned, but that was as planned. This was an Open Spaces conference and I thought it worked very well. From an email Eric Pugh sent:

Basically it all boils down to:

Open Space is the Law of Two Feet: if anyone finds themselves in a place where they are neither learning nor contributing they should move to somewhere more productive. And from the law flow four principles:

  • Whoever comes are the right people
  • Whatever happens is the only thing that could have
  • Whenever it starts is the right time
  • When it’s over, it’s over

From Hadoop to Bang-Splat

I used the law of two feet a time or two. In fact, the first session I wanted to go to, which was about Hadoop and MapReduce, had no knowledgeable attendees. Someone overslept. OK, that’s the way it goes: move along.

From there I went to a session about Unix command-line productivity. Most of the sessions I saw were traditional in that they had one person standing up talking and many people sitting and listening, but not all. This one had several clever command-line gurus mentioning their favorite power tips.

I learned about bang-splat and bang-dollar. The bangs have always gotten me in Bash: I avoid them because I’ve never felt like reading the Bash man page section on them. (Am I too lazy, or not lazy enough?) So it was great to hear some people say “bang-splat and bang-dollar are great” and then explain them. That was easy for me, and now I know how they can be useful to me.

This problem-first type of tip is great for me: tell me the problem, then how to solve it, rather than telling me what the solution is and leaving me guessing what kinds of problems I can solve with it. (The Bash man page is solution-first.)

In case you’re wondering, bang-splat (!*) substitutes all of the arguments to the last command, and bang-dollar (!$) substitutes just the last argument of the last command. So, instead of this:

$ touch file1 file2 file3
$ rm file1 file2 file3

I can do this:

$ touch file1 file2 file3
$ rm !*
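
The example above only shows bang-splat. As a rough illustration of how the two differ, here is a toy Python model of the substitution. It is only a sketch: the function name `expand_history` is made up for this example, and it imitates just these two designators, not Bash's real history parser.

```python
import shlex


def expand_history(command: str, previous: str) -> str:
    """Toy model of the !* and !$ designators (hypothetical helper, not Bash itself)."""
    prev_words = shlex.split(previous)
    # !* stands for every word of the previous command except the command name.
    all_args = " ".join(shlex.quote(w) for w in prev_words[1:])
    # !$ stands for the last word of the previous command.
    last_word = shlex.quote(prev_words[-1])
    return command.replace("!*", all_args).replace("!$", last_word)


previous = "touch file1 file2 file3"
print(expand_history("rm !*", previous))     # rm file1 file2 file3
print(expand_history("ls -l !$", previous))  # ls -l file3
```

In a real interactive shell you would simply type `ls -l !$` after the `touch` command and Bash would do the substitution for you.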

There were lots of other nice tips too.

MySQL Performance

I ended up doing a talk on MySQL performance basics. I had no idea what the audience was looking for, so I winged it. I did make some slides, but most of the talk isn’t on the slides. You can get the slides from Percona’s slide page. It seemed to be useful to the folks attending, who had a wide variety of experience and knowledge about MySQL.

Cloud Computing

This session began with a demo of how to create an entire application stack in a few minutes with Cohesive Flexible Technologies. Someone else then demoed a similar thing using RightScale. rPath’s Jeff Uphoff was also in the room, but we didn’t get to see a demo of that. During this session the talk turned to various topics, including a few of the ones I had wanted to hear about in the Hadoop session.

Lunch

Lunch was catered Indian food provided by the Rimm-Kaufman Group. Yum.

Large Scale Storage

This session was sort of a round-table. The two people who talked the most were Josh Malone from the National Radio Astronomy Observatory and someone from the Library of Congress, both of whom have a lot of storage needs they are unsure how to meet. Some people from UVA’s library were there as well, but I didn’t ask what they were working on.

This reminded me a lot of a recent keynote Jacek Becla gave at another conference. He’s with the Stanford Linear Accelerator Center, who are going to be generating a lotta data pretty soon.

High Availability Linux

This one started off with more from Josh Malone, who demoed Nagios briefly and then talked about his storage and backup systems. He uses BackupPC, which sounds pretty neat and very smart. We then talked about some of the things he’s looking into doing, with audience suggestions to look into shared storage or DRBD. We also looked briefly at UltraMonkey (it looks like it’s stagnating, though) and at the Linux HA project.

Google App Engine

Finally, someone showed us a calculator application they’d built on Google App Engine, walking through the code and talking a little about the data model. It looks like a neat idea, but the lock-in worries me, a sentiment that was voiced by many others in the room.

News flash: MySQL 5.1 has zero bugs

Zack Urlocker says MySQL 5.1 has zero bugs. He may have been misquoted, or quoted out of context, but there it is. I’ll quote enough of it that you can’t take it out of context twice:

Mickos also said MySQL 5.1 has upgraded its reliability and ease of use over 2005’s v5.0.

“Now we can admit it, but this version is much improved over 5.0, which we weren’t totally happy with,” Mickos confided.

He reported that more than 1,300 bugs (997 in 2007, 386 so far in 2008) have been fixed in v5.1, and that, according to standard DBT2 benchmarks, the performance of v5.1 is 10 to 15 percent better than the previous version.

“This version now has zero bugs,” Urlocker told eWEEK.

You can check for yourself at the MySQL bug statistics page.

Of course it’s not true. But what did Zack really say, I wonder?

Come to beCamp 2008

I’m going to be at beCamp 2008, the followup to the first beCamp, which I sadly missed.

beCamp is a BarCamp un-conference. Tonight was about meeting, greeting, and throwing ideas at the wall to see which ones stick. Literally. We stuck pieces of paper on the wall with our ideas — things we can either talk about or want to hear about — and then scratched our votes on them to see which are popular.

I live and breathe MySQL for a decent part of the day, so I hesitated, but then stuck “MySQL Performance” on the wall. It got quite a few votes, so I assume I will be giving a talk on MySQL performance basics at some point during the conference. (The exact schedule is probably being determined right now, in my absence, but I’m so tired right now that I’ll just take my chances on it not being at 8:00 AM tomorrow.) [edit: I just checked the website and there won’t be anything before 9:00, and the schedule will be determined tomorrow. I did say I’m tired, right?]

See you there!

PS: if you want to meet some of my colleagues from my former employer, the Rimm-Kaufman Group, they’ll be there too, wearing the “We’re Hiring” t-shirts. They’re hiring, by the way.

Improved Cacti monitoring templates for MySQL

Download MySQL Cacti templates

As promised, I’ve created some improved software for monitoring MySQL via Cacti. I began using the de facto MySQL Cacti templates a while ago, but found some things I needed to improve about them. As time passed, I rewrote everything from scratch. The resulting templates are much improved.

You can grab the templates by browsing the source repository on the project’s homepage.

In no particular order, here are some things I improved:

  • Standard polling interval and graph size by default.
  • Full captions on every graph; you don’t have to guess at how big the values are. Each graph has current, max, and average values printed at the bottom for every value on it.
  • Much more data is captured. I’ve graphed almost everything I could think of.
  • The graphs are grouped better. Most graphs have only related values. There are some exceptions, but not many.
  • The templates don’t hijack your existing installation. They don’t depend on or alter anything in your default Cacti installation.
  • The script that gathers the data is totally rewritten from scratch, and much improved. For example, the math works on 32-bit systems. It has caching built-in so each poll cycle results in just one request to the server, instead of one request per graph. (This is a weakness of Cacti I’m trying to work around). It also has debugging aids and other good coding stuff.
  • By default, it assumes you have the same username and password across every server you’re monitoring, so you don’t have to fill in a username and password for every single graph you create.
  • One data template == one graph template. This helps work around another Cacti limitation.
  • Lots more. Honestly I can’t really remember everything I’ve done. I’m sure you’ll help me remember by asking me how to get X feature working the way you want, and I’ll go “oh, yeah, that’s another thing I improved…”
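
To illustrate the caching idea from the list above (one request to the server per poll cycle rather than one per graph), here is a minimal Python sketch. It is not the code from the templates: the function name `get_stats`, the JSON cache file, the 150-second `max_age`, and the `fake_fetch` stand-in are all assumptions made for the example.

```python
import json
import os
import tempfile
import time


def get_stats(host, fetch, cache_dir, max_age=150.0):
    """Return all stats for `host`, calling `fetch` at most once per `max_age` seconds."""
    path = os.path.join(cache_dir, f"{host}-mysql-stats.json")
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as f:
                return json.load(f)  # cache hit: no request to the server
    except (OSError, ValueError):
        pass  # missing or unreadable cache file: fall through and refetch
    stats = fetch(host)  # the single real request for this poll cycle
    # Write to a temporary file and rename it, so a concurrent poller
    # never reads a half-written cache file.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(stats, f)
    os.replace(tmp, path)
    return stats


calls = []


def fake_fetch(host):
    """Stand-in for querying the MySQL server (values are invented)."""
    calls.append(host)
    return {"Questions": 1200, "Threads_connected": 7}


with tempfile.TemporaryDirectory() as cache_dir:
    # Three graphs ask for data in the same poll cycle.
    for graph in ("queries", "connections", "threads"):
        stats = get_stats("db1", fake_fetch, cache_dir)

print(len(calls))  # 1
```

The trade-off in a design like this is freshness: `max_age` has to be shorter than the polling interval, or a poll cycle could be served stale numbers from the previous one.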

Cacti templates are very laborious to create if they’re complex at all; it takes a long time and is very error-prone. Instead of doing it through Cacti’s web interface and exporting a huge XML file, I eliminated the redundancies and created a small, easy-to-maintain file from which I generate the XML template with a Perl script. This gives the added benefit of letting me (or you) generate templates with different parameters such as polling interval or graph size. The README file has the full details. However, I’ve pre-generated a set of templates that matches Cacti’s defaults, so you can probably just use that.
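
The general approach of generating verbose XML from a compact definition can be sketched in a few lines. The real generator is a Perl script with its own input format, described in the README; the Python below is only an illustration, and the element names, attribute names, graph titles, and default values are all invented rather than taken from Cacti's actual template schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical compact definition: graph title -> data items drawn on that graph.
GRAPHS = {
    "InnoDB Buffer Pool": ["pool_size", "free_pages"],
    "MySQL Connections": ["max_connections", "threads_connected"],
}


def build_templates(graphs, poll_interval=300, width=500, height=120):
    """Expand the compact definition into an XML tree (made-up schema)."""
    root = ET.Element("templates", poll_interval=str(poll_interval))
    for title, items in graphs.items():
        graph = ET.SubElement(
            root, "graph", title=title, width=str(width), height=str(height)
        )
        for name in items:
            ET.SubElement(graph, "item", name=name)
    return root


# Regenerating with a different polling interval is a one-argument change.
xml_text = ET.tostring(build_templates(GRAPHS, poll_interval=60), encoding="unicode")
print(xml_text)
```

The point of the design is that parameters such as polling interval and graph size live in one place, so a whole set of templates can be regenerated consistently instead of being edited by hand.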

This has taken a lot of time. In particular, I spent a lot of time working on it at my former employer, The Rimm-Kaufman Group (kudos to them for letting me open-source the work) and I just spent most of my weekend writing the scripts to convert from the compact format to XML templates, so it’s possible to maintain these beasts. Plus I had to develop the compact format, too. This took a lot of time because I had to understand the Cacti data model, which is pretty complex.

Please enter issue reports for bugs, feature requests, etc., at the Google project homepage, not in the comments of this blog post. I do not look through comments on my blog when I’m trying to remember what I should be working on for a software project.

If these templates help you and you feel like visiting my Amazon.com wishlist and sending something my way, I’d appreciate it!

PS: You may also be interested in Alexey Kovyrin’s list of templates for monitoring servers.

Like it or not, it is the MySQL Conference and Expo

The conference that many of us just went to is called the MySQL Conference and Expo, but a lot of people don’t call it that. They call it by the name it had in 2006 and earlier: MySQL User’s Conference. In fact, some people say (or blog) that they dislike the new name and they’re going to call it the old name, because [… insert reason here…].

I call it by the new name that some people dislike so much. Why? Because it is a conference and expo, not a user’s conference. There’s no reason to pretend otherwise. The conference is organized and owned by MySQL, not the users. It isn’t a community event. It isn’t about you and me first and foremost. It’s about a company trying to successfully build a business, and other companies paying to be sponsors and show their products in the expo hall. Times have changed.

I’m not saying any of this is bad. Being successful in business is a good thing, and having sponsors and partners is fine too. I’m just pointing out that trying to make it be a user’s conference, just by calling it one, isn’t going to work.

If community members want a community conference, we’ll have to make one. MySQL/Sun cannot do this for us, because then it wouldn’t be a community conference.

There’s a simple test of whether people want this: if it happens, then the community wanted it badly enough to do something about it.

The PostgreSQL East 2008 conference I went to a few weeks ago was a great example of how this works. And the attendance fee was $75, not thousands. A conference doesn’t have to be expensive.

Who wants a conference by, for, and of the community?

Spring 2008 issue of MySQL Magazine

Keith Murphy and his hard-working crew have released the spring 2008 issue of MySQL Magazine. Go take a look — it includes quite a few articles on various topics, even a mention of our upcoming book (High Performance MySQL, Second Edition).

A different angle on the MySQL Conference

There are quite a few business angles you might see only if you’re here at the conference, and you won’t get from blogs. For example, let’s take a look at the contents of the shoulder bags they hand out with your registration. (This is only a partial list.)

  • SnapLogic’s flyer gets it right: their system is compatible with “GNU Linux.” Hooray, a commercial company acknowledging the GNU operating system for what it is!
  • MySQL Enterprise’s flyer has three big bullet points: MySQL Load Balancer, MySQL Connection Manager, and MySQL Enterprise Monitor Query Analyzer. The first two look like they’re probably built on MySQL Proxy. The last has a visual explain plan feature, which according to an elevator conversation is not yet built. I’ll stop by their booth and see. As you may know, Maatkit has long provided a tool that shows a visual explain plan, and it is designed for integration into other tools.
  • There’s an issue of Linux Journal, which does not get the GNU part right. And it has no articles about MySQL. Off-topic! Discarded!
  • Infobright’s flyer says they can load data nearly real-time. I don’t know how you read it, but to me that says “can’t quite keep up with how fast you generate data.” So… what good can it possibly be, right?
  • The conference bag itself has Zmanda’s logo on the side.
  • Webyog’s flyer has one side for SQLyog, and one for MONyog. Each side takes the sparse but visually appealing approach of shiny icons to present a feature list. My favorite is the “Find slow SQL” turtle.
  • JasperSoft’s flyer has soothing, professional blues and rich reds. It makes them look very trustworthy. (I’m not being snarky.) And they have lots of nice whitespace. It’s a little bit of a different look.
  • Kickfire’s marketing department is really on the ball. I’ve seen a large number of flyers and other materials from them (online and offline) and they just changed their name and created a new logo and look-and-feel a short time ago. How do they do it so fast?
  • O’Reilly has a bunch of half-sized flyers for their conferences. We should have asked them to throw in one about our upcoming book, the second edition of High Performance MySQL. Alas, opportunity lost. By the way, stop by the bookstore and grab a copy of the sample chapter.
  • Zmanda, not content with stamping the outside of the bag, has a half-flyer inside it too, plus a chance to win a Digital Rebel to lure you to their booth. If you’re doing backups the way a lot of people seem to, you might want to stop by their booth anyway…
  • There’s a CD for a free trial of WinSQL. But the CD case doesn’t say what the

Sorry. I have a short attention span.

Maatkit t-shirts are here

I’m at the MySQL Conference and the t-shirts I created for Maatkit have arrived. Come get yours! They are high-quality, attractive shirts you’ll be proud to wear, and they are a nice rich wine-red color.

Harrison Fisk (co-author of MySQL Clustering) got the first one, because he told me that he recommends Maatkit to MySQL Support customers about twice a week. I made sure to save one for Jay Pipes too, because his luggage got lost so he has nothing to wear. Unfortunately, I didn’t make any Maatkit underwear, sorry Jay. Now I know what to do for next year…

I’ve already given out a whole bunch of them, and at this rate all 50 of them might not last the day!

Kickfire: relational algebra in a chip

I spent the day Thursday with some of Kickfire’s engineers at their headquarters. In this article, I’d like to go over a little of the system’s architecture and some other details.

Everything in quotation marks in this article is a quote. (I don’t use quotes when I’m glossing over a technical point — at least, not in this article.)

Even though I saw one of Kickfire’s engineers running queries on the system, they didn’t let me actually take the keyboard and type into it myself. So everything I’m writing here is still second-hand knowledge. It’s an unreleased product that’s in very rapid development, so this is understandable.

Kickfire’s TPC-H benchmarks are now published, so you can see the results of what I’ve been seeing them work on. They are now #1 in the world, in two categories. Visit them at their booth in the exhibition area at the conference, and you will be able to see more for yourself.

The big picture

At a high level, Kickfire is an appliance consisting of two or more commodity rack-mountable 1U pizza-box units.

One unit contains the Kickfire chip and a lot of standard, high-speed, server-grade ECC memory. This unit is what executes the queries at high speed.

The other unit is connected to the Kickfire chip unit via a standard PCIe interconnect. It runs stock CentOS 5, with MySQL 5.1. Kickfire has their own storage engine, which uses fairly well-known techniques such as column storage and compression.

To the outside world, the unit behaves just like an ordinary MySQL server. You connect to it in the same manner, you issue the same kinds of queries, you manage users and privileges the same way, and so on. However, when you run a query, it doesn’t get executed in the traditional MySQL manner (nested-loop joins with calls to the storage engine via the storage engine API). Instead, the query goes to the Kickfire chip and executes there. The chip is designed to execute queries very fast, through a variety of techniques that a) I’m not allowed to tell you about yet or b) are sometimes unclear to me because Kickfire was being a little protective in answering some of my technical questions.

I met with quite a few people at Kickfire, but I’ll just mention one: Ravi Krishnamurthy. Before Kickfire approached me, I had not heard of him. Anyway, I’ll just link to Ravi Krishnamurthy on Google Scholar, and let you read up on his papers if you want. It’s enough to say that I really enjoyed speaking with him and the other people at Kickfire.

One of the overall impressions I got was that the Kickfire engineers aren’t the type to do something halfway. When complete, this is not intended to be a system that has only some of the features you’d expect.

I/O bottlenecks

The Kickfire chip has no registers. Instead, the Kickfire chip addresses a very large amount of memory directly. Remember, registers are a bottleneck. As I said in my first article on Kickfire, using registers to process large amounts of data is like using a paper cup to fill your bathtub. Allowing the chip to address this memory directly removes a huge bottleneck.

There is still on-disk storage, though. (And no, it’s not SSD.) The interconnect between the on-disk storage and the memory is a standard PCIe connection. Nothing exotic or proprietary. But the system is apparently capable of moving a very large amount of data at very high speed from the disks to the Kickfire chip’s memory, where it can be addressed in O(1) time, like an array lookup.

Another interesting technique is that the system does not decompress the data to operate on it. According to the engineers, the queries run on the data in its compressed form. As Ravi told me, implementing this is “not for the faint of heart.”

Kickfire seems to have really worked hard at removing bottlenecks wherever possible. For example, they’ve rewritten the out-of-the-box drivers for key pieces of the commodity hardware they’re using.

Souped-up MySQL

If you know how MySQL executes queries, the statement “Kickfire executes joins directly in the Kickfire chip” implies that the Kickfire system isn’t just a storage engine, because MySQL currently processes many of the most costly parts of queries at the server level, not in the storage engine. Obviously Kickfire is not going to perform well unless it changes that. Kickfire has in fact built their own optimizer, which replaces the MySQL optimizer. It compiles the incoming query into a series of macro-operations, which apparently are very similar to the basic relational operators (project, join, etc). This is then sent to the chip for execution, and as the chip produces results it injects them back into the stream of bytes that the server normally uses to send results back to the client.

The Kickfire chip doesn’t implement everything in hardware. For example, there is no MD5() function in the chip. When it encounters an operation it can’t do in hardware, it makes a call back to the MySQL server to fill in the gaps in its functionality.

The rewritten optimizer sounds like an interesting piece of engineering. Ravi told me with pride that the optimizer is “world-class” and “can stand toe-to-toe with the best optimizers in the database industry.” It is a cost-based optimizer with rewrites (i.e. it transforms the operator tree into the most efficient equivalent structure) and it is exhaustive (i.e. it tries all possible combinations to find the best execution plan, unlike MySQL’s optimizer which by default switches to a greedy search when the number of tables to be joined becomes large [correction: as Timour pointed out to me today, I made it sound like MySQL’s optimizer isn’t exhaustive; I neglected to mention that you can configure it]).
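
The difference between an exhaustive and a greedy search for a join order can be shown with a toy model. The sketch below is not Kickfire's optimizer or MySQL's: the table names, row counts, selectivities, and the cost function (total rows in all intermediate results of a left-deep plan) are invented for illustration.

```python
from itertools import permutations

# Made-up table sizes.
ROWS = {"orders": 1_000_000, "customers": 50_000, "nations": 25, "items": 4_000_000}
# Fraction of the cross product that survives each join predicate (made up).
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "nations"}): 1 / 25,
    frozenset({"orders", "items"}): 1 / 1_000_000,
}


def plan_cost(order):
    """Cost of a left-deep plan: the sum of all intermediate result sizes."""
    joined, size, cost = [order[0]], float(ROWS[order[0]]), 0.0
    for table in order[1:]:
        size *= ROWS[table]
        for other in joined:
            size *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        joined.append(table)
        cost += size
    return cost


def exhaustive(tables):
    """Try every join order; n tables means n! candidate plans."""
    return min(permutations(tables), key=plan_cost)


def greedy(tables):
    """Start with the smallest table, then always add the cheapest next join."""
    order = [min(tables, key=ROWS.get)]
    remaining = set(tables) - set(order)
    while remaining:
        nxt = min(sorted(remaining), key=lambda t: plan_cost(order + [t]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


tables = list(ROWS)
print("exhaustive:", exhaustive(tables), plan_cost(exhaustive(tables)))
print("greedy:    ", greedy(tables), plan_cost(greedy(tables)))
```

The exhaustive search can never return a worse plan than the greedy one under the same cost model, but it examines n! orders (24 for four tables, over 3.6 million for ten), which is why the time spent optimizing matters for short queries.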

I asked whether they had benchmarked the optimizer’s performance. (I mean how fast it can find an optimal query plan, not the performance of its results.) Of course, there is no standard benchmark for this, but I think it’s interesting just to compare it against the MySQL optimizer. They had not done this, but I think they will now that I have mentioned it. I think it’s relevant because if you use Kickfire for short queries, a slow-performing optimizer could actually become noticeable.

Is it really stream processing?

I wanted to know whether the chip really does stream processing, or whether it is only conceptually stream processing that’s really implemented some other way. It sounds to me like it’s the genuine article. I asked some pointed questions to this effect, such as “is there a way to interrupt a partially completed query?” As it turns out there is, but only because the stream processor apparently does time-slicing like a standard chip, and when it comes up for air it can check to see if a query should be aborted. In general, I was told, there is no interruption once the data stream starts flowing. That lets the query literally “run at the speed of electrons.”

But what about subqueries, you ask? That’s what I asked too. Stream processing is all very well for joins, but what about a correlated subquery, for example?

It turns out that if you’re clever, you can figure out ways to decorrelate them and then execute them in streaming fashion. The same holds for aggregation over data that’s not in the order needed for streamed aggregation. Pretty interesting ideas; I can’t go into them, because those are proprietary, but Ravi and I talked about them for quite a while.

And very large IN() lists can be turned into a relation and treated like any other.
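
Kickfire's own decorrelation techniques are proprietary, so the following is only the textbook idea, sketched in Python over made-up rows. The query being modeled is "orders whose amount is above that customer's average": the correlated form recomputes the inner aggregate for every outer row, while the decorrelated form computes all the averages in one pass and then joins against them, which is the shape that suits streaming.

```python
from collections import defaultdict

# (order_id, customer, amount); invented sample data.
orders = [
    (1, "ann", 10.0), (2, "ann", 30.0),
    (3, "bob", 5.0), (4, "bob", 7.0), (5, "bob", 9.0),
]


def correlated(rows):
    """Re-run the inner aggregate once per outer row."""
    result = []
    for order_id, customer, amount in rows:
        inner = [a for _, c, a in rows if c == customer]
        if amount > sum(inner) / len(inner):
            result.append(order_id)
    return result


def decorrelated(rows):
    """One pass builds per-customer averages; a second pass joins against them."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, customer, amount in rows:
        totals[customer][0] += amount
        totals[customer][1] += 1
    average = {c: s / n for c, (s, n) in totals.items()}
    return [oid for oid, customer, amount in rows if amount > average[customer]]


print(correlated(orders))    # [2, 5]
print(decorrelated(orders))  # [2, 5]
```

Both functions return the same rows; the difference is that the second touches each input row a fixed number of times instead of once per outer row.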

Storage

Storage is obviously crucial to processing extremely large amounts of data very fast. A few of the things I noted about the storage:

  • Each column is stored in a fixed width. This is how Kickfire can look up a row as though it’s doing an array access.
  • The internal representation is chosen automatically and may not match what you think. Kickfire can profile data as it’s loaded, and choose the type as it goes.
  • If you tell Kickfire you’ll only store values that are X large in a column, and it builds its column storage space to hold that large a value, what happens when you then start adding larger values later? Ravi explained how it works, and it’s proprietary right now, but suffice it to say that Kickfire does not need to rewrite all the data you’ve already stored if you suddenly start storing values you didn’t anticipate. Yet, it can still maintain O(1) array-lookup performance on the compressed data.
  • You can pass the storage engine special comments in the CREATE TABLE statement to tell it what kinds of data each column will get. These comments are part of MySQL’s standard syntax — Kickfire has not changed the MySQL parser, so it should be 100% syntax-compatible with a standard MySQL server.
  • Kickfire has a very Oracle-like set of features around tablespaces, extents, and so on. You can have multiple tablespaces, and you can add devices to tablespaces, etc.
  • Storage is transactional and ACID-compliant, with logging and ARIES recovery much like Oracle, InnoDB, etc. If it surprises you that a system built for large data warehouses would be transactional and ACID-compliant, welcome to the club. I was expecting the usual special-case behavior, you know, you can load data but you can’t update it, or something like that. But as I said, Kickfire isn’t doing this halfway. Plus, TPC-H requires ACID properties.
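
The first point in the list above, that fixed-width columns allow a row to be found like an array element, comes down to simple offset arithmetic. Here is a minimal Python sketch of that idea only. The class name `FixedWidthColumn` is invented, and nothing here reflects Kickfire's actual on-disk format, its compression, or how it handles values that outgrow the chosen width.

```python
import struct


class FixedWidthColumn:
    """One column stored as back-to-back fixed-size values in a byte buffer."""

    def __init__(self, fmt):
        self._struct = struct.Struct(fmt)  # e.g. "<i" is a 4-byte signed integer
        self._buf = bytearray()

    def append(self, value):
        self._buf += self._struct.pack(value)

    def __len__(self):
        return len(self._buf) // self._struct.size

    def __getitem__(self, row):
        if not 0 <= row < len(self):
            raise IndexError(row)
        # The byte offset is row * width: no scanning and no per-row pointers.
        return self._struct.unpack_from(self._buf, row * self._struct.size)[0]


prices = FixedWidthColumn("<i")
for cents in (1999, 250, 104500):
    prices.append(cents)

print(prices[2])  # 104500
```

With variable-width values the position of row N depends on the sizes of all the rows before it, which is exactly what a fixed width avoids.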

Loading, ETL, and star schemas

Loading data is also important to accelerate: executing queries on large amounts of data isn’t good if it takes forever to get the data into the server. Kickfire has their own suite of tools, including one for loading data that accelerates the load process with the SQL chip itself.

Kickfire’s attitude towards star schemas is that you shouldn’t need to build a special schema for your data warehouse. They think their system will be so fast that you can keep your data in the same schema you use for OLTP. If that turns out to be true, that will save a lot of work. (How much effort have you put into building a separate schema for your data warehouse?)

Other notes of interest

Here are some other tidbits I thought I’d share with you:

  • The system has support for foreign keys. It automatically creates indexes on foreign keys and primary keys.
  • The standard types of indexes don’t really apply. Instead, the indexes are “hardware-friendly” (the other term they used was that the indexes are “impedance-matched to the hardware”). There are special features for indexing ranges of dates and indexing words inside a string (but this is not a full-text index; I’m unclear on how it really works, but it helps accelerate LIKE queries, which is important for the TPC-H benchmarks).
  • The deadlock detection is via cycle detection in the waits-for graph, not timeout-based. As a result, it should be fast.
  • The system I saw was running in debug mode, and wrote its optimized query plan to a file for every query. I talked with them about making this available via SQL. The plan is much more detailed and informative than MySQL’s EXPLAIN. They asked me whether it would be a good idea to wedge this information into EXPLAIN, and I told them I wouldn’t do that; EXPLAIN is a tabular output that doesn’t make much sense unless you really know how to read it. When you’re trying to understand a query plan, which is generally a tree of relational operators, you need a hierarchical view of it.
  • They told me that they use the INFORMATION_SCHEMA extensively, but I did not get a chance to look at it myself.
  • They also told me that they use UDFs extensively for system management, but again I can’t confirm.
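
Deadlock detection by cycle detection in a waits-for graph, mentioned in the list above, is a standard technique and easy to sketch. The Python below is a generic depth-first search, not Kickfire's implementation; the function name `find_deadlock` and the transaction names are made up. Each key is a transaction and each value lists the transactions it is waiting on.

```python
def find_deadlock(waits_for):
    """Return a list of transactions that form a cycle, or None if there is none."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {txn: WHITE for txn in waits_for}
    path = []

    def visit(txn):
        color[txn] = GRAY
        path.append(txn)
        for blocker in waits_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GRAY:
                # The blocker is already on the current path: that is a cycle.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(waits_for):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


deadlocked = {"T1": ["T2"], "T2": ["T3"], "T3": ["T1"], "T4": ["T1"]}
print(find_deadlock(deadlocked))                   # ['T1', 'T2', 'T3']
print(find_deadlock({"T1": ["T2"], "T2": []}))     # None
```

Compared with a timeout, a check like this can report a deadlock as soon as the cycle forms and never flags a transaction that is merely slow.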

Licensing

As you probably know, I’m a strong believer in Free Software. I am not aware of any plans for Kickfire to release the source code for their modified version of MySQL or their storage engine or optimizer. These are the satellite diamonds that surround the crown jewels: open-sourcing them would make it easier to reverse engineer the chip, which they don’t want. However, they’ve promised me that they’re going to open-source some of the migration tools, etc etc. Not initially, but as time permits; and later they’ll look at open-sourcing other parts.

I have made sure that they know where I stand on this: I think the ethical thing to do is GPL all the code that they ship, and I think everyone I talked to heard me say that at least once. If you’re going to buy their magical hardware, you deserve to have the source code for everything that runs on it, too. And they need to release the interface specs for their hardware so people can use it in new and surprising ways. Who knows — someone could use it to find a cure for cancer.

Summary

My two days with Kickfire left me with a lot more questions, not surprisingly, and I don’t think that will change until I actually get access to a machine and start testing it myself. I saw a lot of slideshows; I saw some demos; I walked into the server rooms and saw the pretty blinking lights; but I’m not going to tell you that Kickfire will do X or Y because I don’t know a heck of a lot. I was hoping for more hands-on experience and in-depth technical details, but that wasn’t the way it really worked out. However, based on what I’ve seen, I have no reason to doubt that Kickfire’s system will do what they claim: it will run large, complex queries on very large datasets extremely quickly.
