Xaprb

Stay curious!

Archive for the ‘Infobright’ tag

Big Data is how big exactly?

with 8 comments

I see that “Big Data” has become the new buzzword with a spike of hype around it. Everyone’s jumping on it. Companies are eager to promote their products as “Big Data,” just as they were eager to be associated with Web 2.0, Service-Oriented Architectures, and all the rest. Predictably, there’s basically zero agreement on what it means.

I’ve seen “Big Data” mentioned in the context of 1TB, which I think is rather moderately sized. But worse yet, I’ve seen 100GB labeled Big Data. I’ve even seen 5GB labeled Big Data. No links; I don’t want to draw attention to them.

I don’t know what Big Data is, but the stick-of-gum-sized flash drive in my pocket holds 16GB. It’s pretty Small. I mean, I forget it’s even there; it’s definitely not Big. I don’t know where I’d draw the line, but if it fits in a commodity server’s memory, which 100GB can do easily these days, it’s not Big Data. I don’t even think that 1TB is Big; again, it’s only twice as big as what commonly available servers can fit in RAM. In fact, most things in the MySQL world aren’t Big Data if they run on a single server, and I’m not sure I’d call a large sharded data store Big Data either. That’s just a bunch of Small Data sitting next to each other. I might make an exception to my no-MySQL-allowed rule of thumb for technologies like Infobright, which starts to hit its stride in the low-to-mid tens of terabytes of data. That’s entry-level Big in my opinion. This is completely arbitrary, but I’d say 100TB is Big Data in my mind, because it is a couple of orders of magnitude bigger than commodity RAM capacities. Ask me a few years from now, and I’ll probably say a petabyte.
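To put rough numbers on that rule of thumb, here is a minimal sketch of the arithmetic. It assumes a commodity server holds about 512GB of RAM, a figure inferred from the remark that 1TB is only twice what such servers can fit. The function names and that constant are illustrative, not a standard definition of anything.

```python
import math

# Assumption: ~512 GB of RAM in a commodity server, inferred from the
# statement that 1 TB is "only twice as big" as what such servers hold.
COMMODITY_RAM_GB = 512


def fits_in_ram(size_gb, ram_gb=COMMODITY_RAM_GB):
    """True if the data set would fit in one server's memory."""
    return size_gb <= ram_gb


def orders_of_magnitude_over_ram(size_gb, ram_gb=COMMODITY_RAM_GB):
    """log10 of the ratio between the data size and RAM capacity."""
    return math.log10(size_gb / ram_gb)


# The sizes mentioned in the post, expressed in GB (1 TB = 1024 GB).
SIZES_GB = {
    "5GB": 5,
    "100GB": 100,
    "1TB": 1024,
    "100TB": 100 * 1024,
    "1PB": 1024 * 1024,
}

if __name__ == "__main__":
    for label, size in SIZES_GB.items():
        print(
            f"{label:>6}: fits in RAM = {fits_in_ram(size)}, "
            f"orders of magnitude over RAM = "
            f"{orders_of_magnitude_over_ram(size):+.1f}"
        )
```

Under that assumption, 100TB comes out a little over two orders of magnitude above RAM capacity, and a petabyte a little over three. Change the constant and the line moves with it, which is rather the point.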

The lack of a definition for Big Data is characteristic of hyped buzzwords. It’s why nobody can refute anyone’s claims. I think a good guiding principle for marketing might be “don’t associate yourself with a claim that you can only make because nobody can verify it.” This might go along with “don’t brag about things your competitors can also claim.”

Edit: oh my, I just realized that one of Percona’s webinars had “Big Data” in the title. Busted. It was Continuent who proposed the webinar and picked the title, but still… the pot calls the kettle black!

Written by Baron Schwartz

March 31st, 2011 at 6:54 pm

Posted in Commentary, SQL


What data types does your innovative storage engine NOT support?

with 2 comments

I’ve been investigating a few different storage engines for MySQL lately, and something I’ve noticed is that they all list what they support, but they generally don’t say what they don’t support. For example, Infobright’s documentation shows a list of every data type supported. What’s missing? Hmm, I don’t see BLOB, BIT, ENUM, SET… it’s kind of hard to tell. The same thing is true of the list of functions that are optimized inside Infobright’s own code instead of at the server layer. I can see what’s optimized, but I can’t see whether FUNC_WHATEVER() is optimized without scanning the page — and there’s no list of un-optimized functions.
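Until the docs publish such a list, the gap can be worked out as a set difference. The sketch below assumes you have copied an engine's documented "supported" list by hand. The `DOCUMENTED_SUPPORTED` set here is a hypothetical stand-in built to mirror the example above, not Infobright's actual list, and `MYSQL_TYPES` is a non-exhaustive reference set.

```python
# Reference set: common MySQL column types (deliberately not exhaustive).
MYSQL_TYPES = {
    "TINYINT", "SMALLINT", "MEDIUMINT", "INT", "BIGINT",
    "FLOAT", "DOUBLE", "DECIMAL",
    "DATE", "TIME", "DATETIME", "TIMESTAMP", "YEAR",
    "CHAR", "VARCHAR", "BINARY", "VARBINARY", "TEXT",
    "BLOB", "BIT", "ENUM", "SET",
}

# HYPOTHETICAL: a stand-in for the "supported types" list you would copy
# from an engine's documentation. This is NOT any real engine's list.
DOCUMENTED_SUPPORTED = {
    "TINYINT", "SMALLINT", "MEDIUMINT", "INT", "BIGINT",
    "FLOAT", "DOUBLE", "DECIMAL",
    "DATE", "TIME", "DATETIME", "TIMESTAMP", "YEAR",
    "CHAR", "VARCHAR", "BINARY", "VARBINARY", "TEXT",
}


def unsupported_types(all_types, supported):
    """Return the types in all_types that the supported list never mentions."""
    return sorted(set(all_types) - set(supported))


if __name__ == "__main__":
    for type_name in unsupported_types(MYSQL_TYPES, DOCUMENTED_SUPPORTED):
        print(type_name)
```

A type missing from the documented list is only a candidate for "unsupported"; the docs may simply have left it out, which is exactly why an explicit list from the vendor would be better.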

I don’t mean to pick on Infobright. I’ve recently looked at another third-party storage engine and they did exactly the same thing. It’s just that the docs I saw weren’t public as far as I know, so I can’t mention them by name. XtraDB’s documentation falls short too, of course, although it’s pretty well understood that it is very similar to InnoDB.

For a product like this, I think the most helpful thing would be a page explaining two things: 1) the enhancements or extra functionality over the standard MySQL server, and 2) the unavailable or degraded functionality. It would also be good to have both high-level and detailed versions of this.

Written by Baron Schwartz

September 29th, 2009 at 12:33 am

Posted in SQL
