Xaprb

Stay curious!

Archive for the ‘Sys-Admin’ Category

Version 1.1.6 of Better Cacti Templates released

with one comment

I’ve released version 1.1.6 of the Better Cacti Templates project. This release includes a bunch of bug fixes (but not all of them!) and two new sets of graphs. One set is for disk I/O on GNU/Linux, and the other is a new set of templates for OpenVZ. I’m looking for feedback on both of those. This release also has a bunch of code-level features: much better test coverage (hooray!), and a refactored ss_get_by_ssh.php that makes it much easier to create new graphs and templates. The SSH-based templates also take advantage of the same caching as the MySQL templates, which makes them a lot more efficient.

There are upgrade instructions on the project wiki for this and all releases. There is also a comprehensive tutorial on how to create your own graphs and templates with this project. Use the project issue tracker to view and report issues, and use the project mailing list to discuss the templates and scripts.

The full changelog follows.

2010-01-10: version 1.1.6

  * Added OpenVZ graphs (--type openvz) (issue 95).
  * Added IO usage graphs (--type diskstats) (issue 97).
  * Added extra error-reporting (issue 110).
  * The $debug and $debug_log options couldn't be set in the .cnf file (issue 110).
  * Added a --use-ssh option to ss_get_by_ssh.php (issue 66).
  * Added a debugging log to ss_get_by_ssh.php (issue 54).
  * Enabled caching of results in ss_get_by_ssh.php (issue 46).
  * Added a test suite for ss_get_by_ssh.php (issue 110).
  * The 'free' stats suffered from PHP's issues with big numbers (issue 102).
  * There was ambiguity (but no error) in SHOW STATUS overrides (issue 106).
  * It was hard to debug failures caused by missing ext/mysql (issue 105).
  * Code to make ss_get_mysql_stats.php testable was broken (issue 108).

Written by Xaprb

January 10th, 2010 at 11:01 am

Posted in PHP, SQL, Sys-Admin

How Linux iostat computes its results

with one comment

iostat is one of the most important tools for measuring disk performance, which of course is very relevant for database administrators, whether your chosen database is Postgres, MySQL, Oracle, or anything else that runs on GNU/Linux. Have you ever wondered where statistics like await (average wait for the request to complete) come from? If you look at the disk statistics the Linux kernel makes available through files such as /proc/diskstats, you won’t see await there. How does iostat compute await? For that matter, how does it compute the average queue size, service time, and utilization? This blog post will show you how that’s computed.

First, let’s look at the fields in /proc/diskstats. The order and location vary between kernels, but the following applies to 2.6 kernels. For reads and writes, the file contains the number of operations, the number of operations merged because they were adjacent, the number of sectors, and the number of milliseconds spent. Those are available separately for reads and writes, although iostat groups them together in some cases. Additionally, you can find the number of operations in progress, the total number of milliseconds during which I/Os were in progress, and the weighted number of milliseconds spent doing I/Os. Those three are not available separately for reads and writes.
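To make the field layout concrete, here is a minimal Python sketch that splits one line of /proc/diskstats into named counters. It assumes the 14-column layout of 2.6 kernels described above (major, minor, device name, then eleven counters). The field names are labels chosen for this illustration, not kernel identifiers, and the sample line is made up.

```python
from collections import namedtuple

# Labels for the columns; these names are illustrative, not kernel identifiers.
DiskStats = namedtuple("DiskStats", [
    "major", "minor", "device",
    "reads", "reads_merged", "sectors_read", "ms_reading",
    "writes", "writes_merged", "sectors_written", "ms_writing",
    "ios_in_progress", "ms_doing_io", "weighted_ms_doing_io",
])


def parse_diskstats_line(line):
    """Parse one /proc/diskstats line, assuming the 2.6-kernel layout."""
    parts = line.split()
    if len(parts) < 14:
        # Some kernels print fewer columns for partitions; this sketch
        # does not try to handle that case.
        raise ValueError("expected at least 14 columns, got %d" % len(parts))
    major, minor, device = int(parts[0]), int(parts[1]), parts[2]
    counters = [int(p) for p in parts[3:14]]
    return DiskStats(major, minor, device, *counters)


# A made-up sample line, for illustration only.
sample = "   8       0 sda 1000 200 16000 5000 500 100 8000 7000 2 6000 15000"
stats = parse_diskstats_line(sample)
print(stats.device, stats.reads, stats.writes, stats.weighted_ms_doing_io)
```

On a real system you would read the lines from /proc/diskstats instead of using a hard-coded string.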

The last one is very important. The field showing the number of operations in progress is transient — it shows you the instantaneous value, and this “memoryless” property means you can’t use it to infer the number of I/O operations that are in progress on average. But the last field has memory, because it is defined as follows:

Field 11 -- weighted # of milliseconds spent doing I/Os

This field is incremented at each I/O start, I/O completion, I/O merge, or read of these stats by the number of I/Os in progress (field 9) times the number of milliseconds spent doing I/O since the last update of this field. This can provide an easy measure of both I/O completion time and the backlog that may be accumulating.

So the field indicates the total number of milliseconds that all requests have been in progress. If two requests have been waiting 100ms, then 200ms is added to the field. And thus it records what happened over the duration of the sampling interval, not just what’s happening at the instant you look at the file. We’ll come back to that later.
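The accounting is easy to imitate. The following Python sketch is a toy model, not kernel code: it takes a hypothetical list of (time in ms, +1 for a start or -1 for a completion) events and, at each event, adds the number of requests in flight multiplied by the time since the previous event.

```python
def weighted_ms(events):
    """Accumulate in-flight count times elapsed ms, as field 11 is described.

    events: (time_ms, delta) pairs sorted by time, where delta is +1 for an
    I/O start and -1 for an I/O completion. Toy model only.
    """
    in_flight = 0
    last_time = None
    weighted = 0
    for time_ms, delta in events:
        if last_time is not None:
            # Charge the interval since the last update to every request
            # that was in progress during it.
            weighted += in_flight * (time_ms - last_time)
        in_flight += delta
        last_time = time_ms
    return weighted


# Two requests that both start at 0 ms and both finish at 100 ms.
both_full = weighted_ms([(0, +1), (0, +1), (100, -1), (100, -1)])

# One request from 0 to 100 ms, a second one from 50 to 100 ms.
staggered = weighted_ms([(0, +1), (50, +1), (100, -1), (100, -1)])

print(both_full, both_full / 100.0)
print(staggered, staggered / 100.0)
```

In the first case the counter grows by 200ms, matching the example above. Dividing the growth by the 100ms that elapsed gives the average number of requests in flight: 2.0 in the first case and 1.5 in the second.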

Now, given two samples of I/O statistics and the time elapsed between them, we can easily compute everything iostat outputs in -dx mode. I’ll take them slightly out of order to reflect how the computations are done internally.

  • rrqm/s is merely the incremental merges divided by the number of seconds elapsed.
  • wrqm/s is similarly simple, and r/s, w/s, rsec/s, and wsec/s are too.
  • avgrq-sz is the number of sectors divided by the number of I/O operations.
  • avgqu-sz is computed from the last field in the file (the one that has “memory”), divided by the milliseconds elapsed. Hence the units cancel out and you just get the average number of operations in progress during the time period. The name (short for “average queue size”) is a little bit ambiguous. This value doesn’t show how many operations were queued but not yet being serviced; it shows how many were either in the queue waiting, or being serviced. The kernel documentation describes the in-progress counter that feeds this field as being incremented “…as requests are given to appropriate struct request_queue and decremented as they finish.”
  • %util is the total time spent doing I/Os, divided by the sampling interval. This tells you how much of the time the device was busy, but it doesn’t really tell you whether it’s reaching its limit of throughput, because the device could be backed by many disks and hence capable of multiple I/O operations simultaneously.
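  • await is the total time for all I/O operations summed (the milliseconds spent reading plus the milliseconds spent writing during the interval), divided by the number of I/O operations completed in that interval.
  • svctm is the most complex to derive. It is the utilization divided by the throughput. You saw utilization above; the throughput is the number of I/O operations per second over the time interval. The result is therefore the busy time per operation, in milliseconds.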
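The list above can be sketched in a few lines of Python. This is an illustration of the formulas as described in this post, not a port of iostat's source. The Sample type, its field names, and the two samples are assumptions made up for the example.

```python
from collections import namedtuple

# Cumulative counters from /proc/diskstats; the names are illustrative labels.
Sample = namedtuple("Sample", [
    "reads", "reads_merged", "sectors_read", "ms_reading",
    "writes", "writes_merged", "sectors_written", "ms_writing",
    "ms_doing_io", "weighted_ms_doing_io",
])


def iostat_metrics(prev, curr, elapsed_s):
    """Derive iostat -dx style numbers from two samples taken elapsed_s apart."""
    d = Sample(*(c - p for p, c in zip(prev, curr)))  # per-interval deltas
    elapsed_ms = elapsed_s * 1000.0
    n_ios = d.reads + d.writes
    util = d.ms_doing_io / elapsed_ms  # fraction of the interval spent busy
    return {
        "rrqm/s": d.reads_merged / elapsed_s,
        "wrqm/s": d.writes_merged / elapsed_s,
        "r/s": d.reads / elapsed_s,
        "w/s": d.writes / elapsed_s,
        "rsec/s": d.sectors_read / elapsed_s,
        "wsec/s": d.sectors_written / elapsed_s,
        "avgrq-sz": (d.sectors_read + d.sectors_written) / n_ios if n_ios else 0.0,
        "avgqu-sz": d.weighted_ms_doing_io / elapsed_ms,
        "await": (d.ms_reading + d.ms_writing) / n_ios if n_ios else 0.0,
        # Utilization (ms busy per second) divided by throughput (I/Os per second).
        "svctm": (util * 1000.0) / (n_ios / elapsed_s) if n_ios else 0.0,
        "%util": 100.0 * util,
    }


# Two made-up samples taken one second apart.
prev = Sample(1000, 200, 16000, 5000, 500, 100, 8000, 7000, 6000, 15000)
curr = Sample(1100, 220, 17600, 5400, 600, 130, 9600, 8200, 6500, 16600)
metrics = iostat_metrics(prev, curr, 1.0)
for name, value in metrics.items():
    print("%-9s %.2f" % (name, value))
```

With these invented numbers the device completes 200 operations in the second, is busy half the time (%util of 50), and shows an await of 8ms against a svctm of 2.5ms. The average queue size of 1.6 equals the throughput multiplied by await, which is one of the relationships between these numbers mentioned below.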

Although the computations and their results seem both simple and cryptic, it turns out that you can derive a lot of information from the relationship between these various numbers. This is one of those tools where a few lines of code have a surprising amount of meaning, which is left for the reader to understand. I’ll get more into that in the future.

Written by Xaprb

January 9th, 2010 at 9:53 pm

Posted in GNU/Linux, PostgreSQL, SQL, Sys-Admin, Tools

A review of Cacti 0.8 Network Monitoring by Dinangkur Kundu and S. M. Ibrahim Lavlu

without comments

Cacti Network Monitoring

Cacti 0.8 Network Monitoring, Dinangkur Kundu and S. M. Ibrahim Lavlu, Packt, 2009. Page count: 110 pages. (Here’s a link to the publisher’s site.)

This is quite a short book that covers some of the breadth but very little of the depth of Cacti. For example, it focuses on Cacti as an SNMP tool for graphing network data, but SNMP is only one of the many ways Cacti can collect data, and of course it graphs anything, not just networks. Each chapter takes the reader through the most important topics, with some code listings and screenshots. On the plus side, this makes it very easy to read quickly, because it doesn’t go off on many tangents about special cases and errors.

I don’t want to criticize too much, but I think I should give a summary of the major shortcomings. First, the book is just too small, especially for the price. It is also not very well edited; it seems to have been edited by non-English speakers. Finally, it constantly refers to Cacti as a monitoring tool, even talking about the need to find out about crashed equipment and so on — but it doesn’t clearly say that Cacti is only for performance graphing, not for monitoring and alerting. I wish they had not flung the word “monitoring” around so casually.

In terms of topics, it has an overview, installation, creating graphs, creating templates, managing users, SNMP, data queries, and basic administration. The strongest point is the explanation of SNMP. The other chapters have a lot of needless information and screenshots. The installation chapter, for example, goes through installing prerequisites from APT — which APT can do itself.

In the end it’s light reading that shouldn’t take you long to finish — an overview in case you don’t know much about Cacti.

Written by Xaprb

January 9th, 2010 at 6:25 pm

Posted in Review, Sys-Admin
