How to Extract Data Points From a Chart

I often see benchmark reports that show charts but don’t provide tables of numeric results. Some people will make the actual measurements available if asked, but I’ve been interested in analyzing many systems for which I can’t get numbers. Fortunately, it’s usually possible to get approximate results without too much trouble. In this blog post I’ll show several ways to extract estimates of values from a chart image.
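
As a rough illustration of the idea (not necessarily one of the methods the full post covers), one common approach is to read two known tick marks off each axis and interpolate linearly between them. The sketch below assumes linear axes; the helper name `make_axis_mapper`, the calibration pixels, and the sample points are all hypothetical values made up for the example.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function that maps a pixel coordinate to a data value.

    Assumes a linear axis calibrated by two known tick marks:
    pixel_a corresponds to value_a and pixel_b to value_b.
    """
    if pixel_a == pixel_b:
        raise ValueError("calibration pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration read off a chart image by hand.
# X axis: pixel 100 is the tick labeled 0, pixel 500 is the tick labeled 32.
x_of = make_axis_mapper(100, 0.0, 500, 32.0)
# Y axis: image rows grow downward, so the larger value has the smaller pixel.
y_of = make_axis_mapper(400, 0.0, 50, 7000.0)

# Hypothetical pixel positions of three plotted points.
points_px = [(150, 300), (300, 225), (500, 120)]
points = [(x_of(px), y_of(py)) for px, py in points_px]

for x, y in points:
    print(f"x = {x:.1f}, y = {y:.1f}")
```

The results are only as good as the pixel readings, so they should be treated as estimates. A logarithmic axis would need the same interpolation applied to the logarithm of the tick values.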


» Continue Reading (about 1000 words)

Setting Thresholds With Quantiles

I was talking with someone the other day about a visualization I remembered seeing some years ago, one that could help set a reasonable value for a threshold on a metric. As I’ve written, thresholds are basically a broken way to monitor systems, but if you’re going to use them, I think there are simple things you can do to avoid making threshold values completely arbitrary.

I couldn’t find the place I’d seen the visualization (if you know prior art for the below, please comment!) so I decided to just blog about it. Suppose you start off with a time series:

[Figure: time series]
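
To make the general idea concrete, here is a minimal sketch of picking a threshold from a quantile of the observed values. It is not the visualization the post goes on to describe. The function name `percentile_threshold`, the nearest-rank definition, and the sample series are assumptions chosen for the example.

```python
import math


def percentile_threshold(values, percent):
    """Nearest-rank percentile: the smallest observed value such that
    at least `percent` percent of the observations are at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(percent * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical metric samples with two spikes.
series = [12, 15, 11, 14, 13, 40, 12, 16, 15, 13,
          14, 95, 13, 12, 15, 14, 13, 16, 12, 14]

threshold = percentile_threshold(series, 90)
exceeding = [v for v in series if v > threshold]

print("threshold:", threshold)
print("values above threshold:", exceeding)
```

Choosing the percentile fixes, in advance, roughly what fraction of historical points would have crossed the threshold, which is less arbitrary than picking a round number.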

» Continue Reading (about 800 words)

New O'Reilly Book, Anomaly Detection For Monitoring

UPDATE: the book is now available from

Together with Preetam Jinka, I’m writing a book for O’Reilly called Anomaly Detection for Monitoring (working title).

I’d like your help with this. Would you please comment, tweet, or email me examples of anomaly detection used for monitoring, as well as monitoring problems that frustrate you and that you think anomaly detection might help solve?

Thanks in advance.


» Continue Reading (about 100 words)

Can Anomaly Detection Solve Alert Spam?

Anomaly detection is all the buzz these days in the “#monitoringlove” community. The conversation usually goes something like the following: Alerts are spammy and often generate false positives. What you really want to know is when something anomalous is happening. Anomaly detection can replace static thresholds and heuristics. The result will be better accuracy and lower noise.

I’m going to give a webinar about the science of statistical anomaly detection on June 17th.

» Continue Reading (about 100 words)

Thinking clearly about fitting a model to data

I have often seen people fitting curves to sets of data without first understanding whether that is appropriate. I once even used this blog to criticize someone for doing that. I was trying to explain that it’s wrong to fit a model to a set of measurements, unless the model actually describes the process that produced the measurements. All of my explanations (and rants) have fallen far short of the clarity and simplicity of this curve-fitting guide.

» Continue Reading (about 400 words)

Determining the Universal Scalability Law's coefficient of performance

If you’re familiar with Neil Gunther’s Universal Scalability Law, you may have heard it said that there are two coefficients, variously called alpha and beta or sigma and kappa. There are actually three coefficients, though. See?

\[ C(N) = \frac{N}{1 + \sigma(N-1) + \kappa N (N-1)} \]

No, you don’t see it – but it’s actually there, as a hidden 1 multiplied by N in the numerator on the right-hand side.
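
A small sketch may help show where that hidden factor sits. The function below writes the multiplier on N explicitly; the parameter name `lam` is an assumption made for this example (the excerpt does not name the third coefficient), and the numeric values are illustrative only.

```python
def usl(n, sigma, kappa, lam=1.0):
    """Universal Scalability Law with the numerator's multiplier made explicit.

    With lam = 1.0 this is the familiar two-coefficient form:
        C(N) = N / (1 + sigma * (N - 1) + kappa * N * (N - 1))
    `lam` is a hypothetical name for the factor multiplying N.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# At N = 1 the denominator is exactly 1, so the result equals lam.
print(usl(1, sigma=0.05, kappa=0.001))              # relative capacity
print(usl(1, sigma=0.05, kappa=0.001, lam=1000.0))  # scaled by lam
print(usl(10, sigma=0.1, kappa=0.0))
```

Because the denominator is 1 when N = 1, the explicit factor is simply the value of the function at N = 1, and setting it to 1 recovers the formula as usually written.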

» Continue Reading (about 300 words)

Trending data with a moving average

In my recent talk at Surge and Percona Live about adaptive fault detection (slides), I claimed that hardcoded thresholds for alerting about error conditions are usually best avoided in favor of dynamic or adaptive thresholds. (I actually went much further than that and said that it’s possible to detect faults with great confidence in many systems like MySQL, without setting any thresholds at all.) In this post I want to explain a little more about the moving averages I used for determining “normal” behavior in the examples I gave.
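
For readers who want something concrete, here is a minimal sketch of two common kinds of moving average: a simple (windowed) one and an exponentially weighted one. The excerpt does not say which variants the talk used, so the function names, the window size, and the smoothing factor `alpha` are assumptions for illustration.

```python
from collections import deque


def simple_moving_average(values, window):
    """Mean of the most recent `window` values at each step.

    Until `window` values have been seen, averages whatever is available.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    total = 0.0
    averages = []
    for value in values:
        if len(recent) == window:
            total -= recent[0]  # this value is evicted by the append below
        recent.append(value)
        total += value
        averages.append(total / len(recent))
    return averages


def exponential_moving_average(values, alpha):
    """Each new average is alpha * value + (1 - alpha) * previous average.

    The first value seeds the average.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    averages = []
    average = None
    for value in values:
        if average is None:
            average = float(value)
        else:
            average = alpha * value + (1 - alpha) * average
        averages.append(average)
    return averages


print(simple_moving_average([1, 2, 3, 4, 5], window=3))
print(exponential_moving_average([10, 20, 30], alpha=0.5))
```

The exponential form needs only the previous average as state, which makes it cheap to maintain per metric. The windowed form forgets old values completely once they leave the window.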

» Continue Reading (about 600 words)