I found myself in a, ahem, lively discussion with someone recently. It started when I said “there was always something wrong about the daily deals businesses (e.g. Groupon), but I’m sure they’ll teach us what’s really needed.” Turns out this person ran a local daily-deals site. Oops.
My feeling is that anytime something doesn’t take root and grow into a lasting business, there’s a lesson to learn. Early social-networking sites didn’t quite match what users needed. More recent ones have gotten something right that the early ones missed, and usually more than one thing has to be gotten right. In particular, to achieve good user adoption, the value you deliver for the user, what you ask in return, and how the user perceives both the value and the returned favor are all very delicate balances.
I always felt that these were out of balance in daily-deals businesses because they framed things wrong for the customer. If I get a daily-deals coupon to go try something out, I might be motivated to try it, but my core belief is that if I wasn’t already a customer, I’m not going to become one after my cheapo trial. I think one of the main reasons for that is that the daily-deal coupon sends the signal that the product isn’t worth the usual price. I firmly believe that most people live by “you get what you pay for,” and that this is a two-way street. If you don’t pay a lot, then you don’t think it’s worth a lot.
When I joined Percona, their consulting rate was $200 per hour. Complaints about this high rate were widespread, both externally and internally (consultants thought it was too high also). But something funny was going on: customers wouldn’t keep their appointments with consultants, and they didn’t seem to care. I made it a policy that missed appointments would be billed anyway, and that didn’t change anything. So I experimented with the rates. After trying out various rates for about a month, $300 per hour seemed to be a sweet spot. Serious customers were still willing to pay, we weeded out many bad customers who only caused trouble, and most importantly, a lot fewer appointments were missed.
I experienced something similar in my individual consultancy. I worked on a pro bono basis for a local immigrant health clinic. I had a bad feeling that I was taken for granted, and one day when I was at the clinic early to finish up the project, I stood in the early morning cold for half an hour before I realized that nobody was coming to meet me and let me into the building. After that I stopped doing anything pro bono, and I always had a good feeling that my services were appreciated.
So while arguing about the daily deals sites, I recalled the feeling I always had: that there was something wrong with the business model — that it was undermining what it was supposed to promote. And that vague notion came back to me: the decline of daily deals should teach us what ought to be done instead.
Later, it came to me: A daily-deals offer is like a one-night stand, with no expectation of a long-term relationship. Without this agreement at the foundation of the relationship, the match between the ask and the offer is lopsided, and the “customer” becomes an exploiter, instead of a customer.
And with that, I realized that I had it wrong. Far from being the leading indicator, the daily deals sites were actually behind the times. There have been businesses for many years doing what they should have done. I’ve supported some of them myself. I’m referring to BMG’s music subscription, Omaha Steaks, book-of-the-month clubs, gourmet coffee or wine subscriptions, and so on.
You begin these relationships with several subtly intertwined things, including a good introductory offer, an upfront agreement on the real value of what you’re getting in the trial, and an agreement to be a long-term customer. Oh, sure, you can change your mind and cancel after the introductory offer is up. But agreeing and then canceling is different from never agreeing. There’s probably a lot of psychological research related to what I’m claiming, and I’m sure that some of it contradicts me, but I believe that “buy 10 CDs at $1 each, then we’ll send one per month at full price thereafter, and if you buy X more within a year you get another batch at $1 each” is very different from “you can rip this merchant off by walking out of his store with $1 CDs and no actual or implied obligation.” With BMG’s music subscription service, you opt in and then you have to opt out again (either once-and-for-all, or once every month). And maybe your conscious brain says “I’m going to just cancel after I get my cheap CDs,” but I’m an astute enough observer of myself to know that my subconscious feels differently about the matter. In the back of my mind, I feel like I’m cheating BMG if I cancel before I buy enough CDs at full price to be a profitable customer for them. The conscience gets involved. It never gets involved with a daily deal, because there’s no opt-in to opt out of.
Perhaps what the daily deals sites need to do is offer lots of companies a platform for this kind of long-term relationship, logistics included, so they don’t have to reinvent it on their own. Perhaps there is a “next time we’ll get this right” for daily deals after all.
On that note, it’s a good thing Gearhart’s Chocolates doesn’t have a BMG-like deal, because you’d have me at “chocolate.” (If you don’t know who they are, it’s a local Charlottesville chocolatier that is easily one of the best in the entire world, and priced to match. I send their variety pack as thank-yous fairly often. Just another perk of living in Charlottesville.)
Last week Tokutek announced that they’re open-sourcing their TokuDB storage engine for MySQL. If you’re not familiar with TokuDB, it’s an ACID-compliant storage engine with a high-performance index technology known as fractal tree indexing. Fractal trees have a number of nice characteristics, but perhaps the most interesting is that they deliver consistently high performance under varying conditions, such as when data grows much larger than memory or is updated frequently. B-tree indexes tend to get fragmented over time, and exhibit a performance cliff when data doesn’t fit in memory anymore.
The MySQL community is excited about having access to TokuDB’s source code, and rightly so. TokuDB is, broadly speaking, aimed at the same category of use cases as Oracle’s InnoDB, which has been MySQL’s leading storage engine for a long time.
MySQL’s market size is large for an opensource product (roughly $500M to $1B USD, depending on who you talk to), and in a big pond, a stone causes wide ripples. I think the most significant implications, though, are for MongoDB. Tokutek has published a series of benchmarks of MongoDB performance with TokuDB as the storage engine instead of MongoDB’s default storage engine. The results are compelling.
I think TokuDB will rapidly become the storage engine of choice for MongoDB, and could catapult MongoDB into the lead in the NoSQL database arena. This would have profound implications for opensource databases of all flavors, not just NoSQL databases.
It’s worth revisiting a bit of ancient history for some context.
Way back in the olden days, MySQL’s main storage engine was MyISAM. MyISAM is non-transactional and has table-level locking, meaning that a write (update, insert, delete, or similar) blocks all concurrent access to the table. This is okay for some uses, and can even be very good in special cases, but in the general case it is a disaster. MyISAM introduced some special workarounds for common cases (such as permitting nonblocking inserts to occur at the end of the table), but in the end, you can’t fix table-level locking. A mixed workload needs storage that’s designed for high read and write concurrency without blocking.
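To make the locking problem concrete, here is a minimal Python sketch. It is a toy model, not how MyISAM or any real engine is implemented: the class names `TableLockedTable` and `RowLockedTable` and the helper `second_writer_can_proceed` are made up for illustration. One writer holds row 1, and we ask whether a second writer can touch row 2 without waiting.

```python
import threading


class TableLockedTable:
    """One lock guards the whole table (toy model of table-level locking)."""

    def __init__(self):
        self._table_lock = threading.Lock()

    def try_write_lock(self, row_id):
        # The row id is irrelevant: every writer competes for the same lock.
        return self._table_lock.acquire(blocking=False)

    def unlock(self, row_id):
        self._table_lock.release()


class RowLockedTable:
    """One lock per row, so writers to different rows do not collide."""

    def __init__(self):
        self._row_locks = {}
        self._guard = threading.Lock()  # protects the dict of row locks

    def try_write_lock(self, row_id):
        with self._guard:
            lock = self._row_locks.setdefault(row_id, threading.Lock())
        return lock.acquire(blocking=False)

    def unlock(self, row_id):
        self._row_locks[row_id].release()


def second_writer_can_proceed(table):
    """Writer A holds row 1; report whether writer B can lock row 2."""
    assert table.try_write_lock(1)
    granted = table.try_write_lock(2)
    if granted:
        table.unlock(2)
    table.unlock(1)
    return granted


print(second_writer_can_proceed(TableLockedTable()))  # False
print(second_writer_can_proceed(RowLockedTable()))    # True
```

With a single table lock the second writer is refused even though it wants a completely different row. That is the part no workaround can fix.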
MyISAM had other problems, such as lacking transactions, being prone to data corruption, and needing long repair times after a crash.
As a result, MySQL as a whole was only interesting to a minority of users. For demanding applications it was little more than a curiosity.
Then came InnoDB. InnoDB introduced ACID transactions, automatic crash recovery, and most importantly, row-level locking and MVCC, which allow highly concurrent access to rows so that readers and writers don’t block each other. InnoDB was the magic that made MySQL a credible choice for a wide range of use cases.
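If MVCC is unfamiliar, the core idea fits in a few lines. The sketch below is a deliberately simplified illustration, assuming a single global logical clock and an append-only version list per key. `MVCCStore` and its methods are hypothetical names, and this is not InnoDB's actual mechanism (InnoDB rebuilds older row versions from its undo logs). The point is only that a reader pinned to a snapshot never has to wait for a writer.

```python
class MVCCStore:
    """Toy multi-version store: writers append versions, readers pick one."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter, not wall-clock time

    def write(self, key, value):
        # A write never overwrites in place; it adds a newer version.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        # A reader remembers the clock value at the moment it starts.
        return self._clock

    def read(self, key, snapshot_ts):
        # Return the newest version that was committed at or before the snapshot.
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("balance", 100)
reader_snapshot = store.snapshot()   # a long-running reader starts here
store.write("balance", 50)           # a writer commits afterwards

print(store.read("balance", reader_snapshot))   # 100
print(store.read("balance", store.snapshot()))  # 50
```

The old reader keeps seeing 100 while new readers see 50, and nobody took a lock to make that happen.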
Most of the interesting chapters in MySQL’s history have involved InnoDB in one way or another. To list some highlights: Oracle bought InnoDB’s creator Innobase Oy, MySQL scrambled to find a replacement (Maria, Falcon, PBXT), Sun’s decision to acquire MySQL was said to be influenced by Falcon, Percona created XtraDB, and Oracle acquired Sun. Things are settling down now, but it’s easy to forget how much of a soap opera the MySQL world has lived through because of InnoDB not being owned by MySQL.
And in the middle of all this came NoSQL databases. In the past half-dozen years, more databases have been invented, popularized, and forgotten than I care to think about. In many cases, though, these databases were criticized as ignoring or reinventing (badly) decades of learning in relational database technology, and even computer science in general. I know I’ve looked at my share of face-palm code.
Databases, by and large, depend on reliable, high-performance storage and retrieval subsystems more than anything else. Many of the NoSQL databases have interesting ideas built on top of bad, bad, bad storage code.
MongoDB is a case in point. MongoDB reinvented some of the worst parts of MySQL all over again. Storage was initially little more than mmap over a file. I think Mark Callaghan put it best in 2009, when he said “Reinventing MyISAM is not a feature.” MongoDB’s storage at that time really was MyISAM-like. It’s improved somewhat since then, but it hasn’t had the wholesale rip-and-replace improvement that I think is needed. Not only that, but MongoDB as a whole is still (predictably) built around the limitations of the underlying storage, with coarse-grained locking.
But MongoDB, like MySQL, has been relevant in spite of these shortcomings. Form your own opinion about why this is, but from my point of view there are two main reasons:
- MongoDB was born in an era when the popular databases were frustratingly slow and clunky to work with, and innovation was stalled due to the political drama surrounding them.
- MongoDB simply feels nice to developers. If you’re not a developer, this is a little hard to explain, but it just feels good, like your favorite pair of jeans. Like a hug from a good friend. Like a hammock and a summer day. The difference between an SQL database and MongoDB for many developers is like the difference between an iPod and a cheap knockoff MP3 player. I could go on and on.
It’s difficult to overstate the importance of this, because it means that MongoDB may well become an enterprise database, despite what bad opinions you may have about it now. Why is this? It’s because developers are king in the modern IT enterprise. Developers determine what technologies get adopted in IT. CTOs like to think the decisions come from the top down, but I’ve seen it work the other way time and time again. Developers start to use something that frustrates them less than the alternatives, and a groundswell begins that’s impossible to stop. Someday the CTO discovers that the question of whether to use technology X was decided by a junior developer long ago and deployed to production, and now it’s too late.
I’ve done it myself. At Crutchfield I hijacked the company-wide policy that migration from legacy VB6 to .NET would proceed along the lines of a transition to VB.NET. I was fighting through awful code day in and day out, and I knew that a more restrictive language would prevent a lot of bad practices. So I wrote several major systems in C# without asking permission. It’s a lot easier to get forgiveness than permission. Then I showed off what I’d done. When I left Crutchfield, the IT department had chosen C#, not VB.NET, as its language of the future (even though there were, and probably still are, major VB.NET applications).
Similarly, at Crutchfield I was provided a 15-inch CRT monitor to work on. This was 2003, you understand. Even at that time, it was awful. How can you expect a developer to work on a flickering, small monitor? I bought my own large-screen LCD and put it in my cubicle. Management ordered me to remove it because it was causing a flood of “hey, how did Baron get a nice monitor?” questions, but the camel already had a nose under the tent. I took my monitor home, but not too long after that we all started to get nicer monitors. I brought my own nice chair to work, too. All told I probably forced Crutchfield to spend thousands of dollars upgrading equipment. You have to be careful about headstrong kids like me — don’t turn your backs on us for a moment.
This story illustrates why MongoDB is likely to become a major database: because developers enjoy working with it. It feels pleasant and elegant. Remember, most technology decisions are based on how people feel, not on facts. We’re not rational beings, so don’t expect the best solution to win. Expect people to choose what makes them happy.
And with the availability of TokuDB, MongoDB is lovable by a lot more people. With reliable storage and transactions, uncool kids can like it too.
It goes further than just the storage engine. The kernel of MongoDB has code that needs to be fixed, such as the coarse-grained locking code. Tokutek basically forked MongoDB in order to insert TokuDB into it. They had to, in order to get all that locking out of the way and allow MongoDB to shine with TokuDB on the backend.
I’m not sure exactly how this will play out — will Tokutek start offering a competitive product? Will there be opensource community-based forks of MongoDB that integrate TokuDB? Will 10gen do the engineering to offer TokuDB as a backend? Will 10gen and Tokutek partner to do the engineering and provide support? Will 10gen acquire Tokutek? Will a large company acquire both? You decide.
But I believe that a few things are inevitable, and don’t require a crystal ball to guess.
Anyone who cares about MongoDB is going to be using TokuDB as their storage backend within a matter of months. It’s happened before — look at what happened to MySQL and InnoDB. Look at Riak; people dropped Bitcask like a hot potato when LevelDB storage arrived (although it hasn’t been a perfect solution).
Just to be clear, I do not think that MongoDB’s parallels with MySQL’s history must inevitably repeat in all aspects of the story. The world of databases today (big data, cloud, mobile) is not in the same situation it was when MySQL was creeping into general awareness (web, gaming, social, general lack of good alternatives to commercial databases), and the reasons people use MongoDB now are different from the reasons people chose MySQL back in the day. Still, there’s a good chance that MySQL’s past can teach us about MongoDB’s future, and for some use cases, MongoDB deployments will soon accelerate rapidly. I expect a larger commercial ecosystem to emerge, too; right now the MongoDB market is worth tens of millions, and I’d guess in a few years we’ll look back and see a sharp inflection point in 2013 and 2014. TokuDB could help propel MongoDB’s market size into hundreds of millions of dollars, which is a position occupied uniquely by MySQL today in the opensource database world.
It’s getting real in the MongoDB world — this is going to be interesting to watch.
The distinction between concurrency and parallelism confuses lots of people, including most recently Todd Hoff of HighScalability fame, who wrote in last week’s summary post,
Have to say, this distinction has never made sense to me: Concurrency is not parallelism: concurrency is the composition of independently executing processes, while parallelism is the simultaneous execution of (possibly related) computations. Concurrency is about dealing with lots of things at once. Parallelism is about doing lots of things at once.
I think the problem is that words are hard to understand. The Go blog post is confusing because of that. Pictures are easier. Look, a single-threaded, non-parallel, concurrent process:
Lots of tasks can run on the system, but only one of them makes progress at a time. And here’s one that’s both concurrent and parallel:
Hopefully that clears things up.
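If you'd rather read code than look at timelines, here is a toy round-robin scheduler in Python that draws the same two situations. It is only a sketch under simple assumptions: `schedule` is a hypothetical helper, time advances in equal ticks, and a "worker" stands in for a thread or core. Nothing here runs real threads.

```python
def schedule(tasks, workers):
    """Round-robin one-tick slices of tasks over a number of worker lanes.

    tasks maps a task name to the ticks of work it needs. The result is one
    list per worker, showing which task ran on it at each tick ("." = idle).
    """
    remaining = dict(tasks)
    queue = list(tasks)                   # ready queue, in insertion order
    lanes = [[] for _ in range(workers)]
    while queue:
        running, queue = queue[:workers], queue[workers:]
        for lane_index, lane in enumerate(lanes):
            if lane_index < len(running):
                name = running[lane_index]
                lane.append(name)
                remaining[name] -= 1
            else:
                lane.append(".")
        # Unfinished tasks go to the back of the queue.
        queue.extend(name for name in running if remaining[name] > 0)
    return lanes


tasks = {"A": 2, "B": 2, "C": 2}

# Concurrent but not parallel: three tasks in flight, one runs per tick.
for lane in schedule(tasks, workers=1):
    print("".join(lane))   # ABCABC

# Concurrent and parallel: two tasks make progress in the same tick.
for lane in schedule(tasks, workers=2):
    print("".join(lane))   # ACB, then BAC
```

With one worker, all three tasks are in flight at once but each column of the timeline holds a single task. With two workers, each column holds two, and the same work finishes in half the ticks.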