Forecasting Oracle Performance. By Craig Shallahamer, Apress 2007. Page count: about 250 pages. (Here’s a link to the publisher’s site). Short version: buy it and read it, but make sure you don’t rely on it alone; deepen your knowledge through other sources.
I bought and read this book because I’m interested in performance, performance forecasting, and capacity planning. I’m not interested in forecasting Oracle performance per se. However, I have noticed that there is a lot of good literature in the Oracle arena that can apply to other databases (*cough* MySQL), and even systems of any type. Oracle and its practitioners are at least a decade ahead of MySQL in terms of treating performance scientifically.
This book is a compendium of performance forecasting techniques. It begins with an introduction to performance forecasting with simple models, and gradually gets into the more advanced techniques such as queuing theory, which match the real world better. It ends with chapters on ratio modeling, linear regression modeling, and scalability.
The book is fairly straightforward and easy to read. Chapter summaries are well written, and the structure is clear and well thought through. It has frequent case studies to show the topics through examples. I appreciated this; I think it makes things pretty clear, although it is a bit wordy sometimes. Some of my colleagues did not like the case studies at all. There really are a lot of case studies, so maybe he just went too far for some people’s taste. Some of them seemed a bit magical, too: “given that the sky is blue and grass is green, then e-to-the-i-pi plus one equals zero, and we’ll see why that’s so later.”
Chapter 1 discusses several different types of models, including mathematical, benchmark, and simulation models. It introduces the challenges in forecasting performance. Chapter 2 begins with definitions of transactions, arrival rate, and other notions that are essential to understanding performance. It begins to discuss the familiar response time curve and queuing at this point. It shows the difference between CPU and I/O subsystems in terms of their queuing models. Later in the chapter, it introduces what it calls essential mathematics for performance forecasting. These are a handful of formulas that the author uses to model performance under changing circumstances. I have an issue with these formulas. All of the definitions and math that we have seen so far in the book make it seem as though we are talking about the formal queuing math that many of us are perhaps used to. However, the formulas shown here under the essential mathematics heading are not Erlang C formulas. They are approximations that are not accurate at all. The author does not disclose this, and a lazy reader such as myself might assume that he is simply skipping some of the more advanced aspects of queuing theory and presenting the formulas simplified down to their most important forms. Indeed, this is what I thought at first. The formulas looked a little bit funny to me, but I did not check the math; I thought he was skipping details (hence the word "essential"?), and I was confused. Readers need to beware that this chapter plays fast and loose with the response time mathematics. These formulas are not "of the essence" at all.
In chapter 3, the author introduces modeling gotchas, several forecasting models, and how to choose them. At this point it also begins to talk about more correct response time mathematics, such as the Erlang C formulas. There is a lot of discussion of the difference between these formulas and the so-called essential formulas presented earlier. I think he should have just stuck with Erlang C formulas and skipped this “essential” stuff, or at least presented it later as simplifications that are easier to work out by hand for back-of-envelope math, rather than making it seem like The Answer without qualification.
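To make the gap concrete, here is a sketch in Python comparing the exact Erlang C result for an M/M/m queue against the style of closed-form shortcut the book calls "essential." I'm using R ≈ S/(1 − ρ^m) as a stand-in for that kind of approximation (the inputs are hypothetical, not from the book):

```python
import math

def erlang_c_response_time(lam, s, m):
    """Mean response time for an M/M/m queue, using the Erlang C formula.
    lam: arrival rate, s: mean service time, m: number of servers."""
    a = lam * s                 # offered load in erlangs
    rho = a / m                 # per-server utilization; must be < 1
    # Erlang C: probability an arriving request has to queue
    top = a ** m / math.factorial(m)
    bottom = (1 - rho) * sum(a ** k / math.factorial(k) for k in range(m)) + top
    p_queue = top / bottom
    # Mean response time = service time + mean time spent queueing
    return s + p_queue * s / (m * (1 - rho))

def approx_response_time(lam, s, m):
    """A back-of-envelope shortcut: R ~= S / (1 - rho^m).
    Exact only for m = 1, where it reduces to the M/M/1 result."""
    rho = lam * s / m
    return s / (1 - rho ** m)

# At one server the two agree (both reduce to M/M/1)...
print(erlang_c_response_time(0.5, 1.0, 1), approx_response_time(0.5, 1.0, 1))
# ...but at 8 servers and 75% utilization they diverge noticeably.
print(erlang_c_response_time(6.0, 1.0, 8), approx_response_time(6.0, 1.0, 8))
```

The divergence grows with server count and utilization, which is exactly the regime where you care about forecasting accuracy.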
Chapter 4 continues with basic forecasting statistics, including definitions of samples and populations, skew, and other things that will be familiar to you if you’ve taken statistics or probability courses. Chapter 5 follows with an introduction to queuing theory. There is a good overview of Little’s Law and Kendall’s notation. There are lots of graphs in this chapter, showing how the response time curves change under different circumstances. The book also begins to use a spreadsheet, which is available from the author’s website, for showing how response time varies for particular examples. The spreadsheet shows a lot of output that the author never explains mathematically, such as standard deviation of response time. How does one forecast the standard deviation of response time given the input parameters? I am not sure. I wish the book had told me, so I could form an opinion on whether it is valid and useful. Another thing that I think this book glosses over is validating that the workload can be modeled accurately with queuing theory. The distribution of arrival rates and response times matters a lot, but it really was not mentioned prominently.
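Little's Law itself is simple enough to show in a few lines; the numbers here are hypothetical measurements, not taken from the book:

```python
# Little's Law: N = X * R
# (average requests in the system = throughput * mean response time)
throughput = 120.0      # requests per second (hypothetical measurement)
response_time = 0.05    # mean response time in seconds
concurrency = throughput * response_time
# With these numbers, about 6 requests are in flight on average.
```

The law holds regardless of arrival distribution, which is part of why it's such a reliable sanity check on measured workloads.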
I would consider chapter 6 to be something that most people will want to skip. It is a little bit promotional of the author's own method for his consulting practice, and I don't think it is concrete enough for most people to put into action. In fact, chapter 7, which is about characterizing the workload, is much the same way. After reading it, I was unclear on exactly how to apply it. Maybe I just needed to read it a few more times. It felt to me like he was quite insistent that "you must characterize your workload!" and then… we're all waiting… yes? Oh, here is the chapter summary. Letdown.
Chapter 8 introduces ratio modeling, which is essentially a set of rules of thumb that predict how a system might perform based on intuition and experience with similar systems. I am not sure how useful this is, because the ratios seem overly simplistic. However, I am willing to accept that because systems are so hard to model, ratios might be just as good as formal queueing math.
Chapter 9 is about linear regression modeling. There is a lot of good stuff in here about how to take a list of measurements and fit it to a curve. There are examples of residual analysis, how to get rid of statistical outliers, and how to understand the correlation strength.
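The core of that technique, fitting a line, inspecting residuals, and checking correlation strength, fits in a short sketch. The measurements below are made up for illustration:

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x.
    Returns intercept a, slope b, and the coefficient of determination r^2."""
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = ybar - b * xbar
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot
    return a, b, r2

# Made-up measurements: user calls per minute vs. CPU utilization (%)
calls = [100, 200, 300, 400, 500]
cpu = [12.0, 21.5, 31.0, 44.0, 52.5]
a, b, r2 = fit_line(calls, cpu)

# Residual analysis: unusually large residuals flag candidate outliers
residuals = [y - (a + b * x) for x, y in zip(calls, cpu)]
```

A high r² tells you the line explains most of the variance, but the residual plot is what tells you whether a linear model was appropriate in the first place.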
Chapter 10, Scalability, begins with a definition that I think most people get wrong. “A solid definition is that scalability is a function that represents the relationship between workload and throughput.” I agree with this definition, and I’m glad that he stated it so clearly (although it’s not the only useful definition of scalability). The chapter continues by defining effective CPUs, another relevant topic in the world of hyperthreading and virtualization. Then it introduces several scalability models: Amdahl, geometric, quadratic, and super-serial. Just as with the essential forecasting formulas shown earlier, some of these are clearly ridiculous and do not model real systems at all. The quadratic is a good example. I think readers can see this easily, so he doesn’t necessarily need to spell it out, but I think the amount of space devoted to this was not really warranted. I also think that he is too casual about Amdahl’s law. This last chapter will be familiar to readers of Neil J. Gunther’s work, although I value Gunther’s approach more highly.
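As a sketch of what two of those models look like (the parameter values are made up): Amdahl scaling with a serial fraction σ, and the super-serial model, which adds a coherency penalty on top of it, giving essentially the same shape as Gunther's Universal Scalability Law:

```python
def amdahl(n, sigma):
    """Amdahl scaling: n processors, sigma = serial fraction of the work."""
    return n / (1 + sigma * (n - 1))

def super_serial(n, sigma, gamma):
    """Super-serial model: Amdahl plus a coherency (crosstalk) term.
    With kappa = sigma * gamma this matches the shape of Gunther's
    Universal Scalability Law."""
    return n / (1 + sigma * ((n - 1) + gamma * n * (n - 1)))

# Made-up parameters: 5% serial fraction, small coherency penalty.
for n in (1, 4, 16, 64):
    print(n, round(amdahl(n, 0.05), 2), round(super_serial(n, 0.05, 0.02), 2))
```

The key behavioral difference: Amdahl throughput flattens asymptotically, while the super-serial curve peaks and then turns retrograde, which is what real contended systems often do.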
I am a bit skeptical about this book. There is too much rabbit-out-of-hat with the math, so much so that I ended up taking almost everything with a grain of salt and thinking "I'll make a mental note of that, and if I ever encounter a situation where it could be of use, I'll have to do the work and prove it, or research the proofs, myself." Too many of the foundational bits are swept under the rug, so you get a book that kind of says "This is hard stuff, but just trust me and my magic spreadsheet and you'll be all right." Also, in many places where the rubber meets the road, the book stops just short of really showing how to apply the material. It's kind of hard to explain what I mean, but I get the feeling he withholds a bit to promote his business and himself. In the end he doesn't really show what to do with the Scalability chapter; it isn't included in his Patented Method™ and so it seems like a waste, or a revelation that the stuff you've learned so far in the book is going to turn out to be an oversimplification after all (a feeling I got a lot in this book). There were too many "oh hey, so this invalidates the earlier stuff" surprises in the book for me. And some of the things that he insists are SO IMPORTANT are the parts he doesn't really cover properly or give you a good take-away for, in my view. Validating the precision of results is one of those.
In the end, despite my reservations, I think this book is worth buying because I haven’t yet seen a better book on performance forecasting. I have seen better books on capacity planning (check my list of essential books), but that’s not the same thing. Although not everything is explained fully and there is not enough mathematical rigor to satisfy me, the applications of the techniques are worth learning, provided you do not rely on this book alone.
To program is human, to instrument is divine. Complex systems that will support a heavy workload will eventually have to be tuned for it. There are two prerequisites for tuning: tunability, and measurability.
Tunability generally means configuration settings. Adding configuration settings is a sign of a humble and wise programmer. It means that the programmer acknowledges "I don't understand how this system will be used, what environment it will run in, or even what my code really does." Sometimes things are hard-coded. InnoDB is notorious for this, although don't take that to mean that I think Heikki Tuuri isn't humble and wise — nobody's perfect. Sometimes programmers set out to create systems that are self-tuning. I'm not aware of any success stories I can point to in this regard, but I can point to plenty of failures. Perhaps I can't think of any successes because a truly self-tuning system never gives you a reason to notice it.
Measurability (instrumentation) is the next sign of a wise and humble programmer. If your system must be tuned, then it needs to be measured to enable wise decisions. There are at least two important kinds of metrics — a subject for another blog post. Most large systems I’ve worked with (primarily database systems, but operating systems too) are seriously lacking in measurability. A programmer who makes the system measurable acknowledges “I might be wrong, and if I am, it’s a good thing to enable people to prove it,” and realizes that “you cannot improve what you cannot measure.”
Complex, high-load systems get micro-optimized, making them even more opaque. By the time an I/O operation in InnoDB reaches the disk, it’s often impossible to blame it on a specific query. Not just because of lack of instrumentation — even with perfect instrumentation, I/O operations wouldn’t be assignable one-to-one with user actions. Optimization does that, because a lot of optimizations are about deferring, anticipating, or combining work. That makes instrumentation even more important.
This weekend, I heard conflicting stories about instrumentation in Postgres. Someone claimed to have offered patches with a detailed set of instrumentation (I’d also heard this story from someone else at the same company, six months ago in a different place). He told me that the maintainers had declined it on the basis of the added overhead. Someone else told me that no such offer had been made, at least not in public where the decision could be taken to the mailing lists. I don’t know what’s true. I do know that stock Postgres is virtually un-instrumented in ways that matter a lot. The same can be said of MySQL, although interestingly the Venn diagram of the ways these two projects are instrumented doesn’t overlap all that much.
The performance and maintenance cost of adding instrumentation to an application pales in comparison to the benefits. There’s a famous quote from Oracle guru Tom Kyte, who when asked about the cost of Oracle’s performance instrumentation, estimated it at negative ten percent. That is, without the ability to measure Oracle and thus improve it, it’d be at least ten percent slower. I think ten percent is a modest estimate for most systems I work with.
One of MySQL's notable projects was splitting the product into two editions: Enterprise Edition and Community Edition. This move alienated many in the community, and failed to create meaningful differentiation on either side, even with a team of people beating the community bushes for "contributions." The net differentiation was ultimately Jeremy Cole's SHOW PROFILES functionality, which made Community better than Enterprise. Sun put less effort into making this split work, and eventually they abandoned it.
But that could change under Oracle’s stewardship. Oracle’s promises to maintain a GPL version don’t preclude it, and the fact that they thought it worth mentioning explicitly seems significant. Here’s a quote from the press release:
Oracle will not release any new, enhanced version of MySQL Enterprise Edition without contemporaneously releasing a new, also enhanced version of MySQL Community Edition licensed under the GPL. Oracle shall continue to make the source code of all versions of MySQL Community Edition publicly available at no charge.
This manages to sound generous, but a) the second sentence is simply what’s required by law as a consequence of the first sentence, and b) there has been no MySQL Enterprise/Community split for quite a while. So although this press release seems to say that Oracle would be maintaining the status quo, I am not sure that impression is supported by the facts.
I’ve always said that the split didn’t have to be a business failure. I think Oracle could be quite capable of making this work where MySQL couldn’t and Sun decided to stop trying.
A renewed commitment to the split could re-alienate many in the community. It might also result in a closed-source Enterprise Edition of MySQL, a tactic that MySQL themselves tried but abandoned.