Xaprb

Stay curious!

Archive for March, 2011

Big Data is how big exactly?

I see that “Big Data” has become the new buzzword with a spike of hype around it. Everyone’s jumping on it. Companies are eager to promote their products as “Big Data,” just as they were eager to be associated with Web 2.0, Service-Oriented Architectures, and all the rest. Predictably, there’s basically zero agreement on what it means.

I’ve seen “Big Data” mentioned in the context of 1TB, which I think is rather moderately sized. But worse yet, I’ve seen 100GB labeled Big Data. I’ve even seen 5GB labeled Big Data. No links; I don’t want to draw attention to them.

I don’t know what Big Data is, but the stick-of-gum-sized flash drive in my pocket holds 16GB. It’s pretty Small. I mean, I forget it’s even there; it’s definitely not Big. I don’t know where I’d draw the line, but if it fits in a commodity server’s memory, which 100GB can do easily these days, it’s not Big Data. I don’t even think that 1TB is Big. Again, it’s only twice as much as commonly available servers can fit in RAM.

In fact, most things in the MySQL world aren’t Big Data if they run on a single server, and I’m not sure I’d call a large sharded data store Big Data either: it’s just a bunch of Small Data sitting next to each other. I might make an exception to my no-MySQL-allowed rule of thumb for technologies like InfoBright, which starts to hit its stride in the low-to-mid tens of terabytes of data. That’s entry-level Big in my opinion. This is completely arbitrary, but I’d say 100TB is Big Data in my mind, because it is a couple of orders of magnitude bigger than commodity RAM capacities. Ask me a few years from now, and I’ll probably say a petabyte.

The lack of a definition of Big Data is characteristic of hyped buzzwords. It’s why nobody can refute anyone’s claims. I think a good guiding principle for marketing might be “don’t associate yourself with a claim that anyone can make because nobody can verify it.” This might go along with “don’t brag about things your competitors can also claim.”

Edit: oh my, I just realized that one of Percona’s webinars had “Big Data” in the title. Busted. It was Continuent who proposed the webinar and picked the title, but still… the pot calls the kettle black!

Written by Xaprb

March 31st, 2011 at 6:54 pm

Posted in Commentary,SQL

Breaking news: MySQL saves baby seals

This is a test to see if people will vote this down on Planet MySQL. If you’ll vote down some of the posts that have gotten negative marks recently, like Allan Packer saying that he’s still working on Sparc supercluster, or Drizzle going GA, or Percona Server and XtraBackup being available on Solaris, or mk-query-digest filter how-tos, or TokuDB announcing online add of columns, or XtraBackup Manager, or using WordPress on Drizzle, well…

Then you’re probably the kind of person who’ll vote negatively about MySQL saving the lives of baby seals.

Seriously: is it at all possible that the above posts, which got thumbs-down votes, are actually bad news for anyone? I usually don’t look at the Planet, and only read through my RSS feeds, but for some reason today I actually browsed to it, and I was just amazed at how many posts that are nothing but great steps forward for the MySQL community have negative votes! Who are these people? Who would do such a thing? Get a life! But before you do, please vote this post down — go on, do it! Prove my point!

There are two lessons to learn from this: the ability to vote something down brings out the second-worst in people, and the ability to vote anonymously brings out the absolute worst. Both of those “features” should be ripped out of Planet MySQL and thrown away.

Written by Xaprb

March 28th, 2011 at 9:50 pm

Posted in SQL

The partitioning improvement that almost was

Today I was looking for the ALTER TABLE EXCHANGE PARTITION feature for a customer, and it looks like it was not included in MySQL 5.5, although there is a hint of it in the documentation index, and you can find quite a few blog posts and presentations about it. The command simply throws a syntax error:

ALTER TABLE t EXCHANGE PARTITION p1 WITH TABLE t2;

The worklog is still open, although a related bug report it mentions is closed and pushed into trunk. (It confused me for a moment until I realized that what was pushed into trunk, and released in 5.5, was TRUNCATE PARTITION support.)

Here’s hoping this gets included in a future release — this is a great feature that can make partitions much more amenable to operational tasks such as moving data from one partitioned table to another, or exporting a partition to a table, then exporting the table with xtrabackup and importing it onto another server.
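
To make the "export a partition to a table" idea concrete, here is a minimal sketch of how the workflow might look on a server that accepts the statement. It will not run on 5.5, where the statement is a syntax error as noted above. The names t, t2, and p1 come from the command shown earlier; the columns and partition ranges are hypothetical and chosen only for illustration. The sketch assumes the target table must have the same structure as the partitioned one but not be partitioned itself.

```sql
-- Hypothetical partitioned table; columns and ranges are illustrative only.
CREATE TABLE t (
  id      INT  NOT NULL,
  created DATE NOT NULL
)
PARTITION BY RANGE (id) (
  PARTITION p0 VALUES LESS THAN (1000),
  PARTITION p1 VALUES LESS THAN (2000)
);

-- Build an empty table with the same structure, then drop its partitioning
-- so that it is a plain, non-partitioned table.
CREATE TABLE t2 LIKE t;
ALTER TABLE t2 REMOVE PARTITIONING;

-- Swap the contents of partition p1 with the (empty) table t2.
-- Afterwards t2 holds the rows that were in p1, and p1 is empty.
ALTER TABLE t EXCHANGE PARTITION p1 WITH TABLE t2;
```

After the exchange, t2 is an ordinary table, so it can be exported on its own and imported on another server as described above.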

Written by Xaprb

March 22nd, 2011 at 12:46 pm

Posted in SQL