Review of Pro Nagios 2.0 and Nagios System and Network Monitoring

Last week I read two books on Nagios. I found one easy to use and the other difficult.

Note: VividCortex is the startup I founded in 2012. It’s the easiest way to monitor what your servers are doing in production. VividCortex offers MySQL performance monitoring and PostgreSQL performance management among many other features.

The books

Cover of Nagios System and Network Monitoring

Nagios System and Network Monitoring (Wolfgang Barth, No Starch Press, 2006) is a delight. It explained Nagios briefly and clearly, showed me how to get it running, and continues to be a useful reference. I rate it 4 out of 5 stars.

Cover of Pro Nagios 2.0

Pro Nagios 2.0 (James Turnbull, Apress, 2006) is hard for me to read, understand and use. Mr. Turnbull is clearly an expert on Nagios, but the book doesn’t communicate his knowledge effectively. I give it two stars.

Despite the differences, the books are identical in many ways. Both have chapters on the same topics. They cover exactly the same subject matter in the same level of detail, have the same target audiences, were written about the same time, are about the same size, and so on. The differences are in organization and writing style.

Mr. Barth’s book lets me do a breadth-first search of the subject matter, and Mr. Turnbull’s makes me do a depth-first search. Breadth-first is a better way to learn a topic like Nagios.

Pro Nagios 2.0 (James Turnbull)

I read James Turnbull’s Pro Nagios 2.0 first. I brought it home for the weekend, and read past page 200 – more than half the book – but I did not learn much about Nagios. Here’s why I found it hard to use:

After a while I tried to skim through and read the book at a higher level, but I found it very difficult.

It’s not impossible to use, and I don’t mean to criticize too much. For example, once I was familiar with Nagios I found it useful for reference a few times. If you know what you want to find, it’s easy to navigate the book with the table of contents or the index (though the index doesn’t seem to be complete). I remembered seeing something about how to organize configuration files in various ways, and I referred back to that since Mr. Barth’s book didn’t mention it. This discussion, on pages 32-35, helped me think about how to arrange configuration files into subdirectories with the cfg_dir directive.

On the whole, I found the book a poor introduction to Nagios.

Nagios System and Network Monitoring (Wolfgang Barth)

Mr. Barth’s book succeeds through clarity, organization, detail, and ease of use. His first language is apparently German, so his English is not always elegant, but it is concise and communicates his knowledge of Nagios well.

The best thing about the book is the organization. The first part, “From Source Code to a Running Installation,” is three compact chapters on everything you need to get Nagios working. The first chapter is about compiling, installing, and basic web interface configuration. The second is an overview of general Nagios configuration – the stuff you need to actually monitor things. The third is about checking your configuration and starting Nagios.

By page 55 I understood more about Nagios than I was able to glean from Pro Nagios 2.0. When I finished the first three chapters, I had Nagios running and monitoring itself.

The second part is called “In More Detail,” and it was very helpful while I configured Nagios to monitor other servers. It was a useful reference when I looked up specific topics, and easy to skim for important instructions and pitfalls. The good introductory sentences make it easy to decide whether I need to read a section, and the uninterrupted text makes it easy to skip if I don’t. Large blocks of text make headings stand out, so there’s less need to use the table of contents. References have page numbers.

There are few footnotes, and no interruptions in the text. When there are configuration examples, the book shows the whole configuration definition for context, but makes the relevant parts bold, so they’re easy to see. There are lots of diagrams, which clearly show concepts that would require hundreds of words to explain. Captions are in the margins so they don’t interrupt the text.

I could go on, but suffice to say it’s a delight to read and use this book.

On the downside, I dislike the typeface the Barth book uses. Though I’m no expert on typefaces, I think it has a hard-on-the-eyes combination of shapes, heavy line-weights in odd places, and internal white-space. The typeface is sans-serif, so the letters are harder to recognize; for example, the “c” looks a lot like the “e.” It is difficult to speed-read. On the other hand, the text is fully justified and hyphenated (the publisher told me he believes it was typeset with LaTeX, which explains a lot – I love LaTeX). This makes it easier to read than Apress’s standard ragged-right typesetting.

The difference between the books

Let me show you the differences between the two books by comparing how they explain “flapping,” a Nagios feature to prevent a flood of notifications from services that are cycling between states.

The earliest index entry for flapping in the Turnbull book is on page 47. On pages 47 and 48, Turnbull explains the algorithm in detail:

So how does Nagios determine if a host or service is flapping? When you enable flap detection, Nagios keeps a record of the last 21 states of the host or service in an array. It then counts the number of times the states in the array have changed. This is calculated as a percentage. For example, with 21 states recorded we have a possible 20 state changes. For a normal-behaving host or service, it may be that the last recorded 21 states were OK. Hence, the percentage of states change is 0 percent. If, however, during the last 21 recorded states the host or service changed state 9 times, then the percentage of state change is 45 percent. But there is another added layer of complexity in this calculation. The states in this array are weighted – the newest state is considered 50 percent more important than the oldest state. This is because Nagios considers that the newer states are more indicative of the current behavior of the host or service than the older states.

Compare those 175 words to page 219 in the Barth book:

If a regular test shows that service or computer is changing its data continuously, this is called flapping in Nagios.

The Barth book says no more on the topic, but refers interested readers to the appendix on page 401, where it explains flapping with a nice diagram.

This illustrates the major differences between the books:

  1. The Barth book tells you where to learn more, if you want to know. The Turnbull book practically forces you to read through irrelevant details.
  2. The Barth book presents minor concepts like flapping later in the book (page 219 and 401). The Turnbull book puts the trivia in the beginning (page 47).
  3. The Barth book uses a dozen or so words, or a diagram. The Turnbull book uses hundreds of words instead.

Summary

I like Wolfgang Barth’s Nagios System and Network Monitoring better than James Turnbull’s Pro Nagios 2.0.


Comments