# Analyzing Changing Workloads with the USL

Production servers often have much more dynamic, complex workload and behaviors than you might be accustomed to seeing, because most monitoring products aggregate system behavior into high-level global metrics, losing all the detail. Multivariate analysis by another dimension often reveals something unexpected about server behavior. This can prompt you to explore the system, and sometimes leads to a deeper understanding of it. Here’s one such example.

# The Erlang Response Time Stretch Factor For 3 And 4 Servers

In a previous post I explored a few variations of equations that express the M/M/m queueing theory response time “stretch factor,” and tried to indicate some areas where I wanted to dig into the relationships between these formulas a bit more. In this post I discuss the divergence between the official Erlang C formula and Neil Gunther’s heuristic approximation to it. I introduced this before thusly:

At $$m=3$$ and above, the heuristic is only approximate. What does the Erlang form reduce to for the first of those cases? Does it result in the missing term that will extend to 4 and beyond too?

# The Queueing Knee, Part 2

Last week I wrote about the so-called “knee” in the M/M/m queueing theory response time curve. In that post I examined one definition of the knee; here is my analysis of the others, including the idea that there is no such thing as the knee.

There are potentially several ways to think about the “knee” in the queueing curve. In the previous post I dug into Cary Millsap’s definition: the knee is the point where a line tangent to the queueing curve passes through the origin:

Here are a few others to consider:

# The Queueing Knee, Part 1

The “knee” in the M/M/m queueing theory response time curve is a topic of some debate in the performance community. Some say “the knee is at 75% utilization; everyone knows that.” Others say “it depends.” Others say “there is no knee.”

Depending on the definition, there is a knee, but there are several definitions and you may choose the one you want. In this post I’ll use a definition proposed by Cary Millsap: the knee is where a line from the origin is tangent to the queueing response time curve. The result is a function of the number of service channels, and although we may argue about the topics in the preceding paragraph and whether this is the right definition, it still serves to illustrate important concepts.

# The Response Time Stretch Factor

Computer systems, and for that matter all types of systems that receive requests and process them, have a response time that includes some time waiting in queue if the server is busy when a request arrives. The wait time increases sharply as the server gets busier. For simple M/M/m systems there is a simple equation that describes this exactly, but for more complicated systems this equation is only approximate. This has rattled around in my brain for a long time, and rather than keeping my notes private I’m sharing them here (although since I’m still trying to learn this stuff I may just be putting my ignorance on full display).

# The Square Root Staffing Law

The square root staffing law is a rule of thumb derived from M/M/m queueing theory, useful for getting an estimate of the capacity you might need to serve an increased amount of traffic.

The square root staffing law is designed to help with capacity planning in what’s called the QED regime, which tries to balance efficiency with quality of service. Capacity planning is a set of tradeoffs: for best quality of service, you must provision lots of spare capacity (headroom), but that’s wasteful. For best efficiency, you minimize idle capacity, but then quality of service becomes terrible.

# How to Extract Data Points From a Chart

I often see benchmark reports that show charts but don’t provide tables of numeric results. Some people will make the actual measurements available if asked, but I’ve been interested in analyzing many systems for which I can’t get numbers. Fortunately, it’s usually possible to get approximate results without too much trouble. In this blog post I’ll show several ways to extract estimates of values from a chart image.

# Thinking clearly about fitting a model to data

I have often seen people fitting curves to sets of data without first understanding whether that is appropriate. I once even used this blog to criticize someone for doing that. I was trying to explain that it’s wrong to fit a model to a set of measurements, unless the model actually describes the process that produced the measurements. All of my explanations (and rants) have fallen far short of the clarity and simplicity of this curve-fitting guide.

# A close look at New Relic's scalability chart

I’ve written a lot about modeling MySQL with the USL, and I like it best of all the scalability models I’ve seen, but it’s not the only way to think about scalability. I was aware that New Relic supports a scalability chart, so I decided to take a peek at that. Here’s a screenshot of the chart, from their blog: Here’s how it works. It plots response time (or database time, or CPU) as the dependent variable, versus throughput as the independent variable.

# Modeling scalability with the USL at concurrencies less than 1

Last time I said that you can set a starting value for the USL’s coefficient of performance and let your modeling software (R, gnuplot, etc) manipulate this as part of the regression to find the best fit. However, there is a subtlety in the USL model that you need to be aware of. Here is a picture of the low-end of the curve: The graph shows the USL model as the blue curve and linear scalability as the black line.

# Determining the USL's coefficient of performance, part 2

Last time I said that the USL has a forgotten third coefficient, the coefficient of performance. This is the same thing as the system’s throughput at concurrency=1, or C(1). How do you determine this coefficient? There are at least three ways. Neil Gunther’s writings, or at least those that I’ve read and remember, say that you should set it equal to your measurement of C(1). Most of his writing discusses a handful of measurements of the system: one at concurrency 1, and at least 4 to 6 at higher concurrencies.

# Determining the Universal Scalability Law's coefficient of performance

If you’re familiar with Neil Gunther’s Universal Scalability Law, you may have heard it said that there are two coefficients, variously called alpha and beta or sigma and kappa. There are actually three coefficients, though. See? $C(N) = \frac{N}{1 + \sigma(N-1) + \kappa N (N-1)}$ No, you don’t see it – but it’s actually there, as a hidden 1 multiplied by N in the numerator on the right-hand side.

# Scalability, performance, capacity planning and USL at Hotsos Symposium

I presented at this year’s [Hotsos Symposium](). I am searching for a claim to specialness, and I think it may be that I am the first Hotsos presenter who’s specifically focused on MySQL. True? I don’t know, but I’ll run with it for now. VividCortex is the startup I founded in 2012. It’s the easiest way to monitor what your servers are doing in production. It does TCP network traffic analysis.