Can Anomaly Detection Solve Alert Spam?

Anomaly detection is all the buzz these days in the “#monitoringlove” community. The conversation usually goes something like the following: Alerts are spammy and often generate false positives. What you really want to know is when something anomalous is happening. Anomaly detection can replace static thresholds and heuristics. The result will be better accuracy and lower noise. I’m going to give a webinar about the science of statistical anomaly detection on June 17th.
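Here is a minimal Go sketch of the two approaches in that conversation: a static threshold, and one very simple anomaly detector, a z-score test against recent history. The function names, the sample data, and the cutoff of three standard deviations are illustrative assumptions. Real anomaly-detection systems use far more elaborate models, and nothing here shows that either approach produces less alert spam.

```go
package main

import (
	"fmt"
	"math"
)

// staticAlert is the classic heuristic: fire whenever the value crosses
// a fixed threshold, regardless of what is normal for this metric.
func staticAlert(value, threshold float64) bool {
	return value > threshold
}

// zScoreAlert is a very simple anomaly detector: fire when value lies more
// than k sample standard deviations from the mean of the recent history.
// It implicitly assumes the history is roughly stationary and bell-shaped.
func zScoreAlert(history []float64, value, k float64) bool {
	if len(history) < 2 {
		return false // too little data to estimate the spread
	}
	var sum float64
	for _, v := range history {
		sum += v
	}
	mean := sum / float64(len(history))

	var sq float64
	for _, v := range history {
		sq += (v - mean) * (v - mean)
	}
	std := math.Sqrt(sq / float64(len(history)-1))
	if std == 0 {
		return value != mean // flat history: any change counts as anomalous
	}
	return math.Abs(value-mean)/std > k
}

func main() {
	// Hypothetical CPU utilization samples, in percent.
	history := []float64{80, 82, 79, 81, 80, 83, 78}

	fmt.Println(staticAlert(85, 75))         // fires: 85 > 75
	fmt.Println(zScoreAlert(history, 85, 3)) // about 2.7 std devs out: does not fire
	fmt.Println(zScoreAlert(history, 90, 3)) // about 5.6 std devs out: fires
}
```

The z-score check is only as good as its assumptions: if the metric is seasonal or heavy-tailed, "three standard deviations" no longer means "rare," and the false positives come back.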

The Goal

Once upon a time I managed several teams of consultants. At a certain stage of the organization’s growth, we wanted to make higher billable-time utilization easier to achieve, and we wanted more structure and process.

Cary Millsap, about whom I have written quite a bit elsewhere on this blog, suggested that I might profit from reading The Goal by Eliyahu Goldratt. I will let history judge the outcome, but from my perspective, reading it was revolutionary. It is a clear watershed moment in my memory: I lived life one way and saw things through one lens before, and afterwards everything was different.

Can MySQL be a 12-factor service?

A while ago I wrote about some of the things that can make MySQL unreliable or hard to operate. Some time after that, on a completely unrelated topic, someone made me aware of a set of principles called 12-factor, which I believe originated from experience building Heroku.

That was over a year ago, and I’ve come to agree more and more with the 12-factor principles. I guess I’m extremely late to the party, but making applications behave in 12-factor-compliant ways has solved a lot of problems for me.

This experience has repeatedly reminded me of one application that still causes many of the kinds of pain the 12-factor principles have solved for me: MySQL.

Monitorama 2014: This One Weird Time-Series Math Trick

Monitorama 2014 in Portland has been a great show. I’ve enjoyed the technical nature of the talks, the diversity of the speakers, topics ranging from hilarious to thought-provoking, and the venue: a theater stage set up for a Shakespearean tragedy. I have also taken a lot of notes. For example, Toufic from Metafor Software suggested that the audience look into the Kolmogorov-Smirnov test. I am proud of the slide that made its way into my talk as a result:
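For readers who haven't met the test: the two-sample Kolmogorov-Smirnov statistic D is the largest vertical gap between the empirical cumulative distribution functions of two samples. That makes it a distribution-free way to ask whether, say, this hour's latencies look like last hour's. A minimal Go sketch of computing D follows; the function name is hypothetical, and turning D into a p-value is left out.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// ksStatistic returns the two-sample Kolmogorov-Smirnov statistic D: the
// largest vertical gap between the empirical CDFs of samples a and b.
// It does not compute a p-value.
func ksStatistic(a, b []float64) float64 {
	if len(a) == 0 || len(b) == 0 {
		return 0
	}
	// Sort copies so the caller's slices are left untouched.
	x := append([]float64(nil), a...)
	y := append([]float64(nil), b...)
	sort.Float64s(x)
	sort.Float64s(y)

	n, m := float64(len(x)), float64(len(y))
	var i, j int
	var d float64
	for i < len(x) && j < len(y) {
		// Step both CDFs past the next smallest value; ties move together.
		v := x[i]
		if y[j] < v {
			v = y[j]
		}
		for i < len(x) && x[i] <= v {
			i++
		}
		for j < len(y) && y[j] <= v {
			j++
		}
		if gap := math.Abs(float64(i)/n - float64(j)/m); gap > d {
			d = gap
		}
	}
	// Once one sample is exhausted its CDF is 1 and the gap only shrinks.
	return d
}

func main() {
	lastHour := []float64{1, 2, 3, 4}
	thisHour := []float64{3, 4, 5, 6}
	fmt.Println(ksStatistic(lastHour, thisHour))
}
```

D runs from 0 (identical empirical distributions) to 1 (no overlap at all).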

GopherCon 2014

I spoke at GopherCon last week in Denver, and it was one of the best conferences I’ve attended. I can’t remember the last time I learned so much and met so many great people. I have page after page of notes in my notebook, many of which I’ve yet to follow up on. The conference prompted a burst of learning and a flurry of creativity for me, as well as a long list of things to study further.

In no particular order, here are some of the many highlights for me:

Go MySQL Drivers

If you’re interested in Google’s Go programming language, you may not be sure which drivers to use for MySQL. The good news is that there are excellent drivers for MySQL. There are several open-source ones on GitHub and elsewhere, but the driver I recommend is … Why? It is pure Go, not a wrapper around a C library, and it is liberally licensed. It is also fast: a lot of work has gone into making it avoid allocations and consume minimal CPU.

JOIN Versus Key-Value Stores

I was listening to a conversation recently and heard an experienced engineer express an interesting point of view on joins and key-value databases. I don’t entirely agree with it. Here’s why.

First, the opinion. If I may paraphrase, the discussion was something like this:

  • With experience in building distributed systems, one learns to avoid JOIN.
  • Therefore, much of the work of JOIN is done in the application instead of the database.
  • Access to the database is usually reduced to simple primary-key lookups.
  • Therefore, a key-value store is as good a choice as a relational database.

I’m simplifying; the speaker actually suggested that MySQL also makes a really good database for primary-key lookups.
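Doing the work of JOIN in the application looks roughly like this in practice. This is a minimal Go sketch that uses plain maps as a stand-in for a key-value store and assumes a hypothetical orders/users schema. It is the application-side equivalent of an inner join, expressed as nothing but primary-key lookups.

```go
package main

import "fmt"

// User and Order are hypothetical "tables" that a relational database could JOIN.
type User struct {
	ID   int
	Name string
}

type Order struct {
	ID     int
	UserID int
	Cents  int64
}

// OrderRow is one row of the joined result.
type OrderRow struct {
	OrderID  int
	UserName string
	Cents    int64
}

// joinInApp does in application code what
//   SELECT o.id, u.name, o.cents FROM orders o JOIN users u ON u.id = o.user_id
//   WHERE o.id IN (...)
// would do in a relational database: one primary-key lookup per order, then
// one per referenced user. The maps stand in for a key-value store.
func joinInApp(orderIDs []int, orders map[int]Order, users map[int]User) []OrderRow {
	var rows []OrderRow
	for _, id := range orderIDs {
		o, ok := orders[id]
		if !ok {
			continue // no such order: an inner join drops it
		}
		u, ok := users[o.UserID]
		if !ok {
			continue // dangling reference: an inner join drops this too
		}
		rows = append(rows, OrderRow{OrderID: o.ID, UserName: u.Name, Cents: o.Cents})
	}
	return rows
}

func main() {
	users := map[int]User{1: {1, "alice"}, 2: {2, "bob"}}
	orders := map[int]Order{10: {10, 1, 500}, 11: {11, 2, 250}}
	for _, r := range joinInApp([]int{10, 11}, orders, users) {
		fmt.Println(r.OrderID, r.UserName, r.Cents)
	}
}
```

The application now owns what the database's query planner used to handle: lookup order, what to do with missing rows, and, against a real networked store, one round trip per lookup unless it batches them.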

Ultima Online and the History of Sharding

Have you heard of sharding a database? Of course you have. Do you know where the term comes from? Someone asked me this at a cocktail party recently. I gave it my best shot.

“The earliest I remember was Google engineers using it to describe the architecture of some things,” I said. “That would have been about 2006.”

“Nope. Much earlier than that,” said my new friend.

I pondered. “Well, I guess there was the famous LiveJournal architecture article about MySQL. That was, I dunno, 2003?”

The person then told me the following history. I can neither confirm nor deny it; what do you know about it?

Switching from Sublime Text back to Vim

I’ve used Vim for as long as I can remember, but when I started working with Go at VividCortex, for some reason I switched to Sublime Text. It is a very nice GUI-based editor, but I never felt that it was as powerful as Vim.

Ever notice how the Vim logo looks a little like Superman’s logo? No? Squint a little harder, then.

Slides From Percona Live

Embedded below are slides for the two talks I gave at Percona Live. The first one is titled “Knowing the Unknowable.” It explains the special regression technique we developed at VividCortex for computing the amount of CPU, IO, or other resources a query uses within MySQL. The second one is on building MySQL database applications with Go.
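The slides describe the actual technique, and this sketch does not reproduce it. As a much simpler illustration of the general idea of attributing a resource to queries by regression, assume each interval's CPU time is a fixed baseline plus a constant cost per query execution. Under that assumption, the slope of an ordinary least-squares line through (queries, CPU) pairs estimates the per-query cost. This is a toy model with hypothetical names, not VividCortex's method, and real workloads mix many query types.

```go
package main

import "fmt"

// fitLine returns the ordinary-least-squares intercept a and slope b for
// y ≈ a + b*x. Here x is queries executed per interval and y is CPU seconds
// used per interval, so b estimates CPU seconds per query and a the baseline.
// ok is false when the inputs cannot determine a line.
func fitLine(x, y []float64) (a, b float64, ok bool) {
	if len(x) != len(y) || len(x) < 2 {
		return 0, 0, false
	}
	n := float64(len(x))
	var sx, sy, sxx, sxy float64
	for i := range x {
		sx += x[i]
		sy += y[i]
		sxx += x[i] * x[i]
		sxy += x[i] * y[i]
	}
	den := n*sxx - sx*sx
	if den == 0 {
		return 0, 0, false // every x is equal: the slope is undefined
	}
	b = (n*sxy - sx*sy) / den
	a = (sy - b*sx) / n
	return a, b, true
}

func main() {
	// Hypothetical per-minute samples: queries executed and CPU seconds used.
	queries := []float64{10, 20, 30, 40}
	cpu := []float64{10, 15, 20, 25}
	if a, b, ok := fitLine(queries, cpu); ok {
		fmt.Printf("baseline %.2f CPU-s, about %.2f CPU-s per query\n", a, b)
	}
}
```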

Replication Sync Checking Algorithms

I was interested to see Oracle’s recent announcement of a MySQL replication synchronization checker utility. Readers may know that I spent years working on this problem. The tool is now known as pt-table-checksum in Percona Toolkit, but the original work started in 2006. I have personally spent at least six months on it, and if you add up the work of all the other Percona Toolkit developers, there may be several man-years invested.
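One common approach to sync checking is to checksum the same chunks of rows on the source and on each replica and compare the results, so that only mismatched chunks need closer inspection. pt-table-checksum computes its checksums with SQL inside MySQL. The in-memory Go sketch below is not its implementation and shows only the chunk-and-compare idea. The function names and the use of CRC32 are assumptions. Chunks here are by row position, whereas a real checker would chunk by primary-key range so that one missing row doesn't shift every later chunk.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// chunkChecksums splits rows (assumed to be in primary-key order) into
// chunks of chunkSize rows and returns one CRC32 per chunk.
func chunkChecksums(rows []string, chunkSize int) []uint32 {
	if chunkSize < 1 {
		chunkSize = 1
	}
	var sums []uint32
	for start := 0; start < len(rows); start += chunkSize {
		end := start + chunkSize
		if end > len(rows) {
			end = len(rows)
		}
		h := crc32.NewIEEE()
		for _, r := range rows[start:end] {
			h.Write([]byte(r))
			h.Write([]byte{0}) // row separator, so "ab","c" differs from "a","bc"
		}
		sums = append(sums, h.Sum32())
	}
	return sums
}

// diffChunks returns the indexes of chunks whose checksums differ.
// A chunk present on only one side counts as different.
func diffChunks(source, replica []uint32) []int {
	n := len(source)
	if len(replica) > n {
		n = len(replica)
	}
	var diffs []int
	for i := 0; i < n; i++ {
		if i >= len(source) || i >= len(replica) || source[i] != replica[i] {
			diffs = append(diffs, i)
		}
	}
	return diffs
}

func main() {
	source := []string{"1,alice", "2,bob", "3,carol", "4,dave", "5,erin"}
	replica := []string{"1,alice", "2,bob", "3,carol", "4,DAVE", "5,erin"}
	fmt.Println(diffChunks(chunkChecksums(source, 2), chunkChecksums(replica, 2)))
}
```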

Percona Live Recap

I had a great time at Percona Live. I think this was the best MySQL conference I’ve ever been to. (The food was excellent too. The fastest way to a man’s heart is through his stomach.) The talks I attended were very good. Jay Janssen’s tutorial on Percona XtraDB Cluster was impressive. I can’t imagine how much time he must have spent preparing for that. I was very happy that Oracle, MariaDB, and WebScaleSQL had a strong presence, too.

Time-Series Databases and InfluxDB

Time-series databases are of particular interest to me these days. Not only is VividCortex working with large-scale time-series data, but it’s a growing trend in the technology world in general. What’s perhaps most surprising is the dearth of native time-series databases, either commercial or open-source.

The World is Time-Series

The data we gather is increasingly timestamped and handled in time-series ways. For the last 10 years, I’ve worked with “roll-up” or “summary” tables almost constantly. I built, and saw others build, the same types of solutions over and over. For example, I probably consulted with over a dozen companies that do search-engine marketing and advertising. Cost tables are a given, and there’s usually a cost-per-ad-per-day table and half a dozen other summary tables. In my case I saw these things in the MySQL context, but pick any technology and someone is trying to do time-series tasks on top of it.
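A cost-per-ad-per-day table is a classic roll-up. Below is a minimal Go sketch of building one from raw timestamped events. The type and field names are hypothetical, and costs are kept in integer cents to avoid floating-point drift. Days are bucketed in UTC. Bucketing in another time zone would produce different daily totals, which is one of the subtleties these tables tend to hide.

```go
package main

import (
	"fmt"
	"time"
)

// CostEvent is one hypothetical raw, timestamped cost record.
type CostEvent struct {
	AdID     int
	UnixTime int64 // seconds since the Unix epoch
	Cents    int64
}

// dayKey identifies one row of a cost-per-ad-per-day summary table.
type dayKey struct {
	AdID int
	Day  string // YYYY-MM-DD, in UTC
}

// rollUpDaily sums raw events into per-ad, per-day totals.
func rollUpDaily(events []CostEvent) map[dayKey]int64 {
	totals := make(map[dayKey]int64)
	for _, e := range events {
		day := time.Unix(e.UnixTime, 0).UTC().Format("2006-01-02")
		totals[dayKey{e.AdID, day}] += e.Cents
	}
	return totals
}

func main() {
	const jan1 = 1388534400 // 2014-01-01 00:00:00 UTC
	events := []CostEvent{
		{7, jan1, 25},
		{7, jan1 + 3600, 30},
		{7, jan1 + 86400, 40},
	}
	for k, cents := range rollUpDaily(events) {
		fmt.Println(k.AdID, k.Day, cents)
	}
}
```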

The Barnes and Noble Nook HD+

I consider myself a very slow adopter of tablets. I’m too picky. I think the iPad is inconveniently large, and a lot of devices share its screen size, which eliminated many of the popular ones from my consideration. Many others have 7-inch screens, and that’s too small: a while ago I tried a 7-inch device, but eventually I stopped using it. Now I’ve found a tablet I really like.

MySQL falls with the decline of PHP

Sometimes people’s perspectives can be so interesting. I mean this with absolutely no irony. Josh Berkus recently wrote, in a post about upcoming JSON improvements in PostgreSQL 9.4:

MySQL largely rose on the success of PHP, and it fell as PHP became marginalized.

This is an off-topic aside in his blog post. But it’s interesting to discuss because it reveals how differently people can see the same thing. It’s like the proverbial story of the blind men describing an elephant. We have such a variety of perceptions.

This post, by the way, is not yet another flame war about MySQL versus PostgreSQL. To the contrary, it is very important for MySQL users and community members to understand that there are other communities who do not share the same assumptions, values, and beliefs at all. In my experience, many arguments about things like MySQL versus PostgreSQL result from people (or groups of people) holding such differences but being unaware of them, and therefore misinterpreting words and actions from a group who doesn’t share the same worldview, believing them to be dishonest, irrational, or hostile.

» Continue Reading (about 700 words)

Respectful Introductions and Recommendations

In the last few years of my career, I’ve increasingly been involved in meeting people. This often involves requests or offers for recommendations, introductions, and so forth.

I’ve learned to be very careful about making or accepting such offers or requests, and I’d like to share my current thoughts about that with you, because a lot of trouble can come of a seemingly innocent request or offer.

Amber Alert: Worse Than Nothing?

In the last few years, there’s been a lot of discussion about alerts in the circles I move in. There’s general agreement that a lot of tools don’t provide good alerting mechanisms, with problems such as unclear alerts, alerts that can’t be acted upon, and alerts that lack context. Yesterday and today at the Strata conference, my phone and lots of phones around me started blaring klaxon sounds. When I looked at my phone, I saw something like this (the screenshot is from a later update, but otherwise similar):

Bloom Filters Made Easy

I mentioned Bloom filters in my talk today at Strata. Afterwards, someone told me it was the first time he’d heard of Bloom filters, so I thought I’d write a little explanation of what they are, what they do, and how they work. But then I found that Jason Davies had already written a great article about them. Play with his live demo. With a little luck I got a false positive in a few keystrokes: add alice, bob, and carol to the filter, then test the filter for candiceaklda.
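For anyone meeting Bloom filters for the first time, the mechanism fits in a few lines. The filter is a bit array of size m plus k hash functions. Adding an item sets its k bits, and a lookup answers "possibly present" only if all k bits are set. Here is a minimal Go sketch. Its hash scheme, one 64-bit FNV-1a hash split into two halves and combined by double hashing, is an assumption of this sketch, not what Jason Davies' demo uses, so candiceaklda will not necessarily collide here.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// BloomFilter is a fixed-size bit array plus k hash functions. It can answer
// "definitely not present" or "possibly present", never "definitely present".
type BloomFilter struct {
	bits []bool
	k    int
}

// NewBloomFilter creates a filter with m bits and k hash functions.
func NewBloomFilter(m, k int) *BloomFilter {
	if m < 1 || k < 1 {
		panic("bloom: m and k must be positive")
	}
	return &BloomFilter{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from one 64-bit FNV-1a hash, split into
// two 32-bit halves h1 and h2 and combined as h1 + i*h2 (double hashing).
func (b *BloomFilter) positions(s string) []int {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32
	m := uint64(len(b.bits))
	pos := make([]int, b.k)
	for i := range pos {
		pos[i] = int((h1 + uint64(i)*h2) % m)
	}
	return pos
}

// Add sets the k bits for s.
func (b *BloomFilter) Add(s string) {
	for _, p := range b.positions(s) {
		b.bits[p] = true
	}
}

// MayContain returns false only if s was definitely never added.
func (b *BloomFilter) MayContain(s string) bool {
	for _, p := range b.positions(s) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

func main() {
	f := NewBloomFilter(64, 3)
	for _, name := range []string{"alice", "bob", "carol"} {
		f.Add(name)
	}
	fmt.Println(f.MayContain("alice"))        // always true: no false negatives
	fmt.Println(f.MayContain("candiceaklda")) // true here would be a false positive
}
```

The trade-off is tunable: more bits per item and a well-chosen k lower the false-positive rate, but never to zero.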

MySQL, SQL, NoSQL, Open Source And Beyond: a Google Tech Talk

I’ve been invited to give a Tech Talk at Google next Thursday, February 13th, from 11:00 to 12:30 Pacific time. Unfortunately the talk won’t be streamed live, nor is it open to the general public, but it will be recorded and hosted on YouTube afterwards. I’ve also been told that a small number of individuals might be allowed to attend from outside Google. If you would like me to try to get a guest pass for you, please tweet that to @xaprb.

A simple rule for sane timestamps in MySQL

Do you store date or time values in MySQL? Would you like to know how to avoid many possible types of pain, most of which you cannot even begin to imagine until you experience them in really fun ways? Then this blog post is for you. Here is a complete set of rules for avoiding the aforementioned pain: All date and time columns shall be INT UNSIGNED NOT NULL, and shall store a Unix timestamp in UTC.