Continuing with my wishlist, I’ll add windowing functions. They’re enormously powerful. They allow you to extend relational logic beyond the strict boundaries of tuples. In MySQL at present, one must use ugly hacks to preserve state from one row to the next, such as user variables — which are not guaranteed to work if the optimizer changes the query plan.
And yeah, PostgreSQL and SQL Server have windowing functions too, and once you’ve used them it’s a little hard to go back. This is in fact one of the main things I hear from people who love PostgreSQL for what I consider to be legitimate reasons.
Windowing functions extend the uses of SQL (sometimes awkwardly, sometimes elegantly), into areas you can’t really go without them. Time-series data, for example, or more powerful graph processing. These things must be done externally to SQL otherwise, in ugly procedural logic.
Windowing functions together with CTEs (my previous post) are particularly powerful.
Anyone want to guess what my next wish will be?
The pace of MySQL engineering has been pretty brisk for the last few years. I think that most of the credit is due to Oracle, but one should not ignore Percona, Monty Program, Facebook, Google, Twitter, and others. Not only are these organizations (and the individuals I haven’t mentioned) innovating a lot, they’re providing pressure on Oracle to keep up the improvements, too.
But if you look back over the last few years, MySQL is still functionally a lot like it used to be. OK, we’ve got row-based binary logging — but we had binary logging and replication before, this is just a variation on a theme. Partitioning — that’s a variation on a theme (partitioned tables are a variation on non-partitioned tables). Performance — same thing, only faster. And so on.
I’m painting things with too broad a brush. There’s actually a lot of stuff that’s NOT just a variation.
But if you look around at what’s out there in other open-source DBs, there’s a lot of innovation, particularly in PostgreSQL, which has had CTEs (common table expressions) for a while. CTEs are not a variation on a theme. They are major new feature, analogous to going from no-subquery-support to supports-subqueries. They enable a lot of things like recursive queries, making a SQL database useful in many more types of situations — think graph-processing, for example, which is downright annoying without them.
Will we see CTEs in MySQL soon?
I’ve been considering using TokuDB for a large dataset, primarily because of its high compression. The data is append-only, never updated, rarely read, and purged after a configurable time.
I use partitions to drop old data a day at a time. It’s much more efficient than deleting rows, and it lets me avoid indexing the data on the time dimension. Partitioning serves as a crude form of indexing, as well as helping purge old data.
I wondered if TokuDB supports partitioning. Then I remembered some older posts from the Tokutek blog about partitioning. The claim is that “there are almost always better (higher performing, more robust, lower maintenance) alternatives to partitioning.”
I’m not sure this is true for my use case, for a couple of reasons.
First, I clearly fall into the only category that the flowchart acknowledges may be a good use case for partitioning: I do need instant block deletes. Paying for data ingestion as well as purging doesn’t make sense in my case. It’s like eating a hot hot curry — I don’t want to feel the pain on the way out too :-)
Secondly, data size matters a lot. If I need to create a redundant index on the timestamp dimension, no matter how good TokuDB’s compression is, it’ll inflate my storage and I/O costs. And make my backups bigger, and so on, and so on. I don’t want an index that I don’t need. My queries operate very efficiently without the timestamp index, and creating one only to help delete older data fast wouldn’t make sense.
In the end I got sidetracked and decided to write this blog post. And I didn’t find out whether TokuDB supports partitioning or not! Silly me.