Automatically detecting abnormal behavior in MySQL

Over the course of years, I have observed that the three most sensitive indicators of MySQL having a server lockup are the queries per second, number of connections, and number of queries running. Here is a chart of those three on a production system. Find the bad spot:

I am currently working on developing an automated system that detects abnormal behavior in these three metrics, but doesn’t require any a priori inputs or thresholds, e.g. you don’t have to tell it “more than X is bad.” (It could be that during a low period of the day, X is different than during the peak load.)

It turns out that this is hard to do reliably, without a lot of false positives and without false negatives (not triggering during an incident). If there is existing literature on the mathematical techniques to do this, I’d be interested in not reinventing the wheel. Does anyone have references to share?

See Also

I'm Baron Schwartz, the founder and CEO of VividCortex. I am the author of High Performance MySQL and lots of open-source software for performance analysis, monitoring, and system administration. I contribute to various database communities such as Oracle, PostgreSQL, Redis and MongoDB. More about me.