# How to Extract Data Points From a Chart

I often see benchmark reports that show charts but don’t provide tables of numeric results. Some people will make the actual measurements available if asked, but I’ve been interested in analyzing many systems for which I can’t get numbers. Fortunately, it’s usually possible to get approximate results without too much trouble. In this blog post I’ll show several ways to extract estimates of values from a chart image.
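One general approach, offered as a minimal sketch and not necessarily the method I prefer, is to calibrate each axis from two labeled tick marks and then convert pixel coordinates to data values by linear interpolation. The function name `make_axis_mapper` and all the pixel positions below are hypothetical, and the sketch assumes linear (not logarithmic) axes.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value.

    (pixel_a, value_a) and (pixel_b, value_b) are two reference points
    read off a single axis, for example two labeled tick marks.
    Assumes the axis is linear.
    """
    if pixel_a == pixel_b:
        raise ValueError("reference pixels must differ")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration: pixel positions are made up for illustration.
# x axis: pixel 100 is labeled 0, pixel 500 is labeled 40.
x_of = make_axis_mapper(100, 0.0, 500, 40.0)
# y axis: image y grows downward, so the larger value has the smaller pixel.
y_of = make_axis_mapper(400, 0.0, 50, 700.0)

# A point on the plotted line at pixel (300, 225) in the image.
estimate = (x_of(300), y_of(225))
print(estimate)
```

The estimate is only as good as the pixel positions you read off the image, so treat the results as approximate.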

# Setting Thresholds With Quantiles

I was talking with someone the other day about a visualization I remembered seeing some years ago that could help set a reasonable value for a threshold on a metric. As I’ve written, thresholds are basically a broken way to monitor systems, but if you’re going to use them, I think there are simple things you can do to avoid making threshold values completely arbitrary.

I couldn’t find the place I’d seen the visualization (if you know prior art for the below, please comment!) so I decided to just blog about it. Suppose you start off with a time series:
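To make that concrete, here is a minimal sketch that generates a synthetic series (an assumption for illustration, not real measurements) and reads candidate thresholds off its quantiles. The `quantile` helper uses the nearest-rank definition, which is one of several common definitions, and the series parameters are arbitrary.

```python
import math
import random


def quantile(values, q):
    """Nearest-rank quantile: the smallest value such that at least
    a fraction q of the data is less than or equal to it."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))
    return ordered[rank - 1]


# Synthetic metric: 1000 samples around 100 with some noise (hypothetical).
rng = random.Random(42)
series = [100 + rng.gauss(0, 10) for _ in range(1000)]

# For each candidate quantile, record the threshold it implies and the
# fraction of samples that would have exceeded it.
candidates = {}
for q in (0.95, 0.99, 0.999):
    threshold = quantile(series, q)
    exceeding = sum(1 for v in series if v > threshold) / len(series)
    candidates[q] = (threshold, exceeding)
    print(f"q={q}: threshold={threshold:.1f}, fraction above={exceeding:.3f}")
```

Choosing the quantile is then a decision about how often you are willing to be alerted, which is at least less arbitrary than picking a round number.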

# New O'Reilly Book, Anomaly Detection For Monitoring

UPDATE: the book is now available from https://ruxit.com/anomaly-detection/.

Together with Preetam Jinka, I’m writing a book for O’Reilly called Anomaly Detection for Monitoring (working title).

I’d like your help with this. Would you please comment, tweet, or email me examples of anomaly detection used for monitoring, as well as monitoring problems that frustrate you and that you think anomaly detection might help solve?

# Can Anomaly Detection Solve Alert Spam?

Anomaly detection is all the buzz these days in the “#monitoringlove” community. The conversation usually goes something like the following: Alerts are spammy and often generate false positives. What you really want to know is when something anomalous is happening. Anomaly detection can replace static thresholds and heuristics. The result will be better accuracy and lower noise.

I’m going to give a webinar about the science of statistical anomaly detection on June 17th.

# Thinking clearly about fitting a model to data

I have often seen people fitting curves to sets of data without first understanding whether that is appropriate. I once even used this blog to criticize someone for doing that. I was trying to explain that it’s wrong to fit a model to a set of measurements, unless the model actually describes the process that produced the measurements. All of my explanations (and rants) have fallen far short of the clarity and simplicity of this curve-fitting guide.

If you’re familiar with Neil Gunther’s Universal Scalability Law, you may have heard it said that there are two coefficients, variously called alpha and beta or sigma and kappa. There are actually three coefficients, though. See?

$$C(N) = \frac{N}{1 + \sigma(N-1) + \kappa N (N-1)}$$

No, you don’t see it – but it’s actually there, as a hidden 1 multiplied by N in the numerator on the right-hand side.
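To make the hidden coefficient concrete, here is a minimal sketch of the formula with that multiplier written out as an explicit parameter. The function name `usl_capacity`, the parameter name `gamma`, and the coefficient values in the example are assumptions for illustration only.

```python
def usl_capacity(n, sigma, kappa, gamma=1.0):
    """Universal Scalability Law with the numerator's multiplier explicit.

    n     -- load or concurrency (N in the formula)
    sigma -- coefficient on the (N - 1) term
    kappa -- coefficient on the N * (N - 1) term
    gamma -- the multiplier on N in the numerator; the usual form fixes
             it at 1, which is why it is easy to overlook
    """
    return gamma * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


# Hypothetical coefficients, chosen only to show the shape of the curve.
for n in (1, 8, 32):
    normalized = usl_capacity(n, sigma=0.05, kappa=0.001)
    scaled = usl_capacity(n, sigma=0.05, kappa=0.001, gamma=250.0)
    print(f"N={n}: normalized={normalized:.2f}, scaled={scaled:.1f}")
```

At N = 1 the denominator is exactly 1, so the function returns `gamma` itself: with the default of 1 the result is a relative capacity, and with any other value it is expressed in the units of that multiplier.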