Archive for the ‘Queueing Theory’ tag
Of all the books I’ve reviewed, this one has taken me the longest to study first. That’s because there is a lot of math involved, and Neil Gunther knows a lot more about it than I do. Here’s the short version: I’m learning how to use this in the real world, but that’s going to take many months, probably years. I’ve already spent about 10 months studying this book, and have read it all the way through twice — parts of it five times or more. Needless to say, if I didn’t think this was a book with value, I wouldn’t be doing that. But you’ll only get out of this book what you put in. If you want to learn a wholly new way to understand software and hardware scalability, and how to do capacity planning as a result, then buy the book and set aside some study time. But don’t think you’re going to breeze through this book and end up with a simple N-step method to take capacity forecasts to your boss. If you want that, buy John Allspaw’s book instead. (If you’re reading this blog post, you need that book.)
I don’t want to spend a lot of time talking about Neil’s method, because honestly the book isn’t about the method first and foremost, and I think many readers will have a hard time digging the capacity planning method out of the math-ness. This book is, in a sense, a textbook or workbook for his training courses. It begins with a lot of general topics, such as how managers think about capacity, risk, what’s needed in the world of businesses that are driven by Wall Street, ITIL, and so on. Then there’s the mathematical background for the rest of the book, things like significant digits and expressing error.
The part of the book that I’m still studying begins in Chapter 4, which introduces ways to quantify scalability. The math begins with Amdahl’s Law, which you may have heard of. It turns out that not only can this be used to understand how much overall speedup is possible by speeding up part of a process, which is how I’m used to using it, but it can be used to model what is possible with parallelization. (I think I actually learned this in my university classes, but I’d forgotten its uses in parallel computing since then.) Anyway, it’s a straightforward model that makes intuitive sense and is easy to accept. I believe in it because it’s so logical and simple, and because I’ve worked with it for a long time. That’s the last bit of math in this book that I can understand so solidly, because after that, we get into a lot of things that have to do with interaction between concurrently performed work, and nothing is ever intuitive about that domain.
Now, when you’re talking about scalability, you generally are working with scalability of concurrent systems, and queueing theory is Topic Number One. Proper queueing theory is correctly modeled, under certain very restricted conditions, by the Erlang C formula. This is a complex bit of math, and although I believe in it, I don’t understand it enough to know how it’s derived or proven to be correct. Well, there’s no Erlang C math in this book. Neil Gunther goes a completely different direction. Instead of modeling the impact of queueing through the math that describes the model, he creates a new model. Let’s leave the model for later, and just look at what’s nice about not using Erlang C math to model computer system scalability:
- The Erlang C formula requires complex calculations.
- It is valid only in restricted conditions, and it’s a lot of work to prove that your workload conforms.
- It models queueing delay, but it doesn’t model coherency delay.
- It requires inputs such as service time, which are difficult or impossible to measure accurately.
Someone once said that all models are wrong, but some models are useful. Neil Gunther heads in the direction of a more useful model. First, he proves that two parameters are necessary and sufficient to create a realistic model. Next, he introduces another parameter into Amdahl’s Law to account for coherency delay. The resulting (still simple) equation models serial delay (the reverse of parallel speedup) and coherency delay. Now we have a model for how a system scales under a given workload as you increasingly parallelize the hardware. This is the universal scalability model. From the mathematical point of view, it’s the crowning achievement of this book. I’m very much summarizing, by the way. There’s a lot to think about in developing such a model, so the reader gets quite a tour de force here. Along the way Neil shows how you can arrive at the same surprising result through an entirely different route, without even using Amdahl’s Law as a starting point.
There are other models. Neil discusses these. They all have problems. Some don’t model what we know can happen in the real world — retrograde scaling — where performance can decrease when you add more power to a system. Others are physically impossible, predicting negative speedup. Negative speedup means the system’s performance goes below zero. As in, you ask it to do work, and it, uh, takes back work it’s already done? Impossible. So it certainly looks like Neil’s model is the strongest contender. By the way, Craig Shallahamer’s book on forecasting Oracle performance uses the universal scalability model, although without the mathematical rigor.
Now, the problem is how to apply this in the real world. To model a system’s performance, you have to know the value of those two magical parameters. How on earth can you find these values? This seems to be just as hard as Erlang C math. But Neil shows the second most remarkable thing: if you transform the universal scalability model around a bit, then you get a polynomial of degree two. This is exciting because if you take some measurements of your system’s observed performance at different points on the scalability curve (holding the work per processor constant, and adding more processors), and then transform those measurements in an equivalent manner, you can fit a regression curve through those points. Now you can reverse the transformations to the equation, plug in the coefficients of the quadratic equation that resulted from the curve-fitting, and out come the parameters you need for the universal scalability equation! Final result: you can extrapolate out beyond your observations and predict the system’s scalability.
We’re not done. All of this was about hardware scalability: “how much faster will this system run if I add more CPUs?” Software scalability is next. Neil goes back to the basics, starting with how Amdahl’s Law applies to software speedup, and essentially covers all the same ground we’ve already covered, but this time modeling what happens when you hold the hardware constant, and increase the concurrency of the workload the software is serving. It turns out that exactly the same scalability model holds for software as it did for hardware. This is why he calls it the universal scalability model. But not only that, it works for multi-tier architectures of arbitrary complexity.
And this is why I say I am not competent to really prove or disprove the validity of the whole thing. It makes sense to me that even a multi-tier architecture can conform to a model with two parameters. As we know in the real world, there is usually a single worst bottleneck, a weakest link. And therefore no matter how complex the architecture, the dominant factors limiting scalability are still coherency and/or queueing at the bottleneck, and how much you can parallelize (Amdahl’s Law). Thus, the universal scalability model intuitively might be valid for such architectures. But proving it — wow, that’s way beyond me. I know my limits. I’m taking it all on faith, experience, and intuition at this point.
In my mind, the results Neil Gunther derives up to this point in the book would have been plenty. However, there’s lots more left in the book. The rest of the book is about how to use the model for capacity planning, but surprisingly, it’s not about just how to use the universal scalability model. It’s about Guerrilla Capacity Planning in the real world. Right after exploring software scalability, he dives into virtualization for a whole chapter — and then shows you how to measure, model, and predict the scalability of various virtualization technologies. Next chapter: web site capacity planning. After that? “Gargantuan Computing: GRIDs and P2P.” Yep, he analyzes the scalability limits of Gnutella and friends. And then, apparently just because he can, he dissects arguments about network traffic in general (read: “how scalable is the Internet?”). I can’t pretend to understand all this myself. I’m just following along.
I have a feeling that Neil Gunther is kind of like Einstein: his real gift is his ability to create thought experiments that make the model accessible to mortals. Maybe someday he’ll be a legend you learn about in CS101 classes, or maybe someday he’ll be proven wrong like Newton, who knows. In the meantime, I’m going to keep working on applying it all in the real world, especially to MySQL, and see what comes of it. The fact that I’m still doing that bears out what I said earlier: you aren’t going to just waltz through this book and come away with a clear picture of how to work through a capacity planning method. You’ll have some work to do. If you want an elegant and simple capacity planning method, then you should buy John Allspaw’s The Art Of Capacity Planning instead.
The best overall method of performance optimization is optimization for response time. Users care about response time, not load average or cache hit ratios. The job of a system is to accept some request and do the required work, and deliver a result. The time elapsed between the request and the result is the response time.
Methods of Response Time Optimization
Not all optimization methods are created equal. Here are a few I see commonly.
- No method. Most people simply have no method of performance optimization at all. They just look for things that look “bad” and try to make them look “better.” In the MySQL database world, the classic example is trying to improve a cache hit ratio. This is utter folly, and doesn’t become any less stupid no matter how many times it is taught and repeated.
- Server Load Reduction. One step up from that is to try to understand what work the database is performing and discover what part of that work consumes the most response time, then improve that to lower the load on the database server. This is better, but still not a logical way to optimize response time for the end user. Imagine that you’ve built your scaling strategy around archiving and purging unnecessary data — a very sensible strategy. Most well-designed archiving and purging jobs are terribly inefficient, for a reason: they are designed to consume resources that are not needed by anything else, so they don’t interfere with anything else. Archiving a billion rows from a table is best done in nibbles, not in billion-row chunks. The nibbles are going to be slow. If you measure the entire system and find out where the response time goes, you’re almost guaranteed to find these jobs are a top offender. And yet they don’t matter at all, because they have no impact on the user’s response time. Server load reduction is a shotgun approach that sometimes yields results, because it’s easy to aim a shotgun.
- Method R, or Goal-Driven Performance Optimization. Two methods I know of that are based in sound thinking are Cary Millsap’s Method R and Peter Zaitsev’s Goal-Driven Performance Optimization. Cary wrote an excellent book about his method, and I recommend buying that book and reading it at least twice. These methods are guaranteed to truly optimize the system in question: they will produce the best possible performance improvements with the least possible cost, and they have a termination condition that is satisfied when further improvements are either not possible or cost more than they are worth. A system that has been subjected to one of these methods can be confidently called “fully optimized.”
Response time in queued systems
Method R looks at where a system consumes time and sorts the biggest consumers to the top in a profile, then works on those first. Amdahl’s Law guarantees that this is the best way to improve the system’s performance.
Although the approach is correct, it doesn’t mean it’s easy. It might be easy if a system is stable. But unstable systems, those suffering from queueing delay (usually characterized by response time with a high standard deviation, a.k.a. “spikes” or “blips”), are much harder to optimize. The queries that are performing badly can no longer be assumed to be the source of the performance problem. Instead, they might be “good” queries that are the victim of something else happening.
Unstable systems suffer from a) dependencies between requests, and b) statistical variations in request arrival time, which causes queueing. The classic case is lock contention. Suppose someone goes to your OLTP database and runs an ad-hoc query against the table of invoice line items, and locks the table. Normally that table has specific, fast, well-indexed queries against it, but as soon as the ad-hoc query locks it, the queries instantly pile up and “look bad.” The system becomes an unstable train wreck. Alas, database servers such as MySQL typically don’t give the DBA enough information to blame the problem on a source.
Internal contention inside the database software itself is another potential cause of queueing. I can no longer remember the number of times I’ve disabled the query cache in MySQL and solved a system’s random freezes. Typically, the only way to prove that the query cache mutex is the source of the problem is to take a backtrace in GDB. A complex piece of software without good instrumentation is pretty difficult to troubleshoot in conditions like this.
And that brings me back to Method R. I can see that the queries are suffering from unstable performance, but I cannot see directly how to optimize the system because it is un-instrumented. Unfortunately, falling back to system load optimization is often the best that can be done, in terms of maximum optimization with minimal cost. An expert can do this with as much rigor as possible, and hopefully with good knowledge of the system’s internals can find the source of the problem quickly, but it’s still a much harder problem.
And this is why it is a crime to write un-instrumented software: because when an un-instrumented system starts to get overloaded, it is very hard to determine the source of performance problems.
This is easily one of the best books I’ve ever read on performance optimization. I’ve just finished reading it for the second-and-a-half time in two weeks, and I very rarely read a book more than once. I’ve been telling a lot of people about it.
Despite the title, it is actually not about Oracle performance. It is a book on how to optimize a) any system, including a MySQL-based application b) the process of optimization itself. It is very, very good, and I highly recommend it to all database users. My bet is that most people will learn more by reading this book than by spending thousands of dollars on conferences and training, especially since a lot of what you’ll learn from those sources is wrong and harmful. So not only will you save money and time on learning, you’ll reap great rewards thereafter.
The book is organized into three sections. The first section explains a performance optimization methodology called Method R (for response time) that is designed to deterministically advance from identifying what needs to be optimized, to collecting diagnostic data, and then choosing the correct activities to optimize. Simply put, it is probably the clearest and most logical process I’ve ever seen for focusing on what matters. It is quite similar to our process at Percona, which we call Goal-Driven Performance Optimization. But Cary does a really good job explaining it. Cary told me that he wrote the book while simultaneously developing Method R training classes, and did dozens of classes before the book went to press. You can tell!
The second and third parts of the book are about putting the method into practice.
Cary shows many typical mistakes, such as focusing on ratios, working on things that “look bad,” ignorance of Amdahl’s Law, trying to draw conclusions about specific activities by examining system-wide information, and so on. He brings the focus back to response time again and again.
Another typical “tuning mistake” is not knowing when to quit. Cary shows how to know when further performance improvements, even if they’re possible, will not be economically justified. At the point where the gains don’t exceed the cost, you’re done. The system is optimized. Maybe it’s not performing as well as it could, but if it costs too much to get any more performance, it’s still optimal.
This is a book that backs up every assertion with references to source material, or even with a proof or derivation from first principles. It is significantly more rigorous than most books of this type. There’s a very good chapter on kernel timings that explains a ton of stuff about how processes work, measurement intrusion effect, quantization error, and so on.
There is a very good section on queueing theory, with excellent examples of using it to prove that a desired improvement is mathematically impossible and/or impractically expensive. There are also examples of how to use queueing theory to predict when a hardware upgrade will worsen your performance problems. (And lots of other examples of when “optimizing” something will actually make it worse, as for example improving performance of some task other than the one you’re interested in, thereby making it use and contend for more resources — and hurting performance of the task you wanted to improve.)
What is it NOT about? It’s not about how to write more efficient SQL — there is practically nothing in the book about SQL. It’s not about how everything inside Oracle works, though it does have one or two chapters that are mostly about how to apply the principles to Oracle through extended SQL trace files, a discussion that will transfer well to any other system that emits trace data. (Read these, or you’ll miss learning about quantization error and such!)
Highly, highly recommended.