For various reasons, I’ve become interested in analytical databases. These are traditionally called OLAP (online analytical processing) systems. They are designed to extract insights from very large datasets, often with the expectation of long response times (hours). More recently, though, several databases have emerged that can run relatively interactive queries over large datasets. This post is more or less a list of analytic databases, with a rough taxonomy added.
As with any list of this type, the categories are inexact, and I’m sure the list is only partial. I’m making some value judgments about what to mention and what to omit, mostly guided by my intuition. If you think I should list something I’ve left out, please let me know. I may simply have failed to think of it, so an omission shouldn’t be taken as a negative opinion! I welcome your feedback and suggestions.
Traditional Analytics Databases
These are the canonical names from the previous generation of big data analytics. They are still widely deployed and, in many respects, still regarded as the gold standard.
In-Memory Analytics Databases
This is a work in progress; please tweet your suggestions to me.
Open-Source Analytics Databases
For one reason or another, these databases don’t fit neatly into the other categories, but all of them are open source. (Many of the databases in other categories are also open source.)
GPU-Accelerated Databases
GPU-accelerated databases are at the vanguard of hardware acceleration, using GPUs to speed up analytical workloads.
Hadoop / Big Data Ecosystem
The “big data” ecosystem includes a number of databases designed for analytics and BI workloads. At their simplest, these can be seen as access layers over massive datasets stored in distributed filesystems, often in columnar formats such as Parquet (on disk) and Arrow (in memory). Others sit further from the raw bytes: Presto, for example, is more a query engine than a database.
NoSQL and Multi-Model Analytics Databases
Most NoSQL databases don’t really belong in the analytics category, but some are nonetheless used for analytics.
Time Series Databases
Time series analysis is often a simpler special case of full-fledged analytics, with some limits on the complexity of queries and use cases.
Cloud Analytics Databases
Custom-Built Analytics and Event Databases
Many monitoring, analytics, and security companies found nothing existing that suited their purposes, so they built at least part of their analytics platforms in-house. Here are some I’m aware of, in varying levels of detail.
NewSQL Databases
Many so-called NewSQL databases are more transactional (OLTP) than analytical, or otherwise blur the boundaries of this taxonomy, but I list them here nonetheless.